Protobuf Packed vs Unpacked Encoding for Repeated Fields: A Complete Guide
Repeated scalar fields in Protocol Buffers can be encoded either as individual tag-value pairs (unpacked) or as a single length-delimited block (packed), with packed encoding reducing message size by eliminating per-element tags and improving parsing performance.
Protocol Buffers (protobuf) offers two distinct wire formats for encoding repeated scalar fields. Understanding the differences between protobuf packed vs unpacked encoding for repeated fields is essential for optimizing message size and deserialization performance in high-throughput systems. This guide examines the implementation details in the protocolbuffers/protobuf repository, covering wire format internals, compatibility guarantees, and practical C++ examples.
What Packed and Unpacked Encoding Mean
In protobuf wire format, a repeated field serializes each element sequentially. The encoding method determines whether each element carries its own tag or shares a single tag as a block.
Wire Format Differences
| Encoding | Wire Type | Layout |
|---|---|---|
| Unpacked | VARINT, FIXED32, FIXED64, or LEN-DELIMITED (depends on scalar type) | Each element is written as a full tag + value pair. |
| Packed | LEN-DELIMITED | All elements are concatenated into a single length-delimited block; the tag is written once, followed by the total byte count and the raw values. |
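The packed form reduces message size and improves parsing speed for large repeated numeric fields because it writes the tag once instead of once per element. A tag occupies 1 byte for field numbers 1 to 15 and 2 bytes for field numbers 16 to 2047, so the saving per element grows with the field number.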
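To make the two layouts concrete, here is a minimal hand-rolled sketch that encodes a repeated varint field both ways without the protobuf library. The helper names (AppendVarint, AppendTag, EncodeUnpacked, EncodePacked) are hypothetical and exist only for this illustration; the sketch handles unsigned 32-bit values and is not how the library itself is structured.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends `value` as a base-128 varint: 7 payload bits per byte, low bits
// first, with the high bit set on every byte except the last.
void AppendVarint(uint64_t value, std::vector<uint8_t>* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<uint8_t>(value));
}

// Appends the tag for `field_number` with the given 3-bit wire type.
void AppendTag(uint32_t field_number, uint32_t wire_type,
               std::vector<uint8_t>* out) {
  assert(field_number >= 1 && wire_type <= 5);
  AppendVarint((static_cast<uint64_t>(field_number) << 3) | wire_type, out);
}

// Unpacked: a VARINT tag (wire type 0) in front of every element.
std::vector<uint8_t> EncodeUnpacked(uint32_t field_number,
                                    const std::vector<uint32_t>& values) {
  std::vector<uint8_t> out;
  for (uint32_t v : values) {
    AppendTag(field_number, 0, &out);
    AppendVarint(v, &out);
  }
  return out;
}

// Packed: one LEN-DELIMITED tag (wire type 2), the payload size in bytes,
// then the elements back to back. An empty field writes nothing.
std::vector<uint8_t> EncodePacked(uint32_t field_number,
                                  const std::vector<uint32_t>& values) {
  std::vector<uint8_t> out;
  if (values.empty()) return out;
  std::vector<uint8_t> payload;
  for (uint32_t v : values) AppendVarint(v, &payload);
  AppendTag(field_number, 2, &out);
  AppendVarint(payload.size(), &out);
  out.insert(out.end(), payload.begin(), payload.end());
  return out;
}
```

With these helpers, EncodeUnpacked(1, {10, 20}) yields the bytes 08 0a 08 14, while EncodePacked(2, {10, 20}) yields 12 02 0a 14: one tag, a length of 2, and the two one-byte varints.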
Which Types Support Packing
Packed encoding is only allowed for numeric scalar types: int32, int64, uint32, uint64, sint32, sint64, bool, enum, fixed32, fixed64, sfixed32, sfixed64, float, and double.
Repeated string, bytes, and message fields can never be packed: their elements are themselves length-delimited, so each one needs its own tag and length prefix to mark where it ends.
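The rule can be restated as a small predicate: a repeated field is packable only if a single element of it is not written with the length-delimited wire type. The sketch below assumes two hypothetical enums, FieldKind and WireType, defined purely for illustration; they are not the protobuf library's own types.

```cpp
#include <cstdint>

// Hypothetical enums for illustration; not the protobuf library's types.
enum class FieldKind {
  kInt32, kInt64, kUint32, kUint64, kSint32, kSint64, kBool, kEnum,
  kFixed32, kSfixed32, kFloat,
  kFixed64, kSfixed64, kDouble,
  kString, kBytes, kMessage
};

// Numeric values match the wire-type numbers used on the wire.
enum class WireType : uint8_t {
  kVarint = 0,
  kFixed64 = 1,
  kLengthDelimited = 2,
  kFixed32 = 5
};

// Wire type used when one element of the given kind is written on its own.
constexpr WireType ElementWireType(FieldKind kind) {
  return (kind == FieldKind::kFixed32 || kind == FieldKind::kSfixed32 ||
          kind == FieldKind::kFloat)
             ? WireType::kFixed32
         : (kind == FieldKind::kFixed64 || kind == FieldKind::kSfixed64 ||
            kind == FieldKind::kDouble)
             ? WireType::kFixed64
         : (kind == FieldKind::kString || kind == FieldKind::kBytes ||
            kind == FieldKind::kMessage)
             ? WireType::kLengthDelimited
             : WireType::kVarint;
}

// Packable only when the elements are not themselves length-delimited.
constexpr bool IsPackable(FieldKind kind) {
  return ElementWireType(kind) != WireType::kLengthDelimited;
}

static_assert(IsPackable(FieldKind::kSint64), "varint types are packable");
static_assert(!IsPackable(FieldKind::kBytes), "bytes cannot be packed");
```

Framing the check in terms of the element wire type keeps the list of packable types and the reason behind it in one place.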
How the Protobuf Library Chooses the Encoding
The decision between packed and unpacked happens at three layers: descriptor inspection, serialization, and deserialization.
Descriptor Level
The FieldDescriptor::is_packed() accessor in src/google/protobuf/descriptor.cc (around line 4248) tells the runtime whether the field was declared with the packed=true option or the proto3 default.
// Conceptual representation from descriptor.cc
bool FieldDescriptor::is_packed() const {
  // Returns true if packed option is set or proto3 default applies
  return internal::cpp::IsFieldPacked(this);
}
Serialization Logic
When WireFormat::SerializeWithCachedSizes (or the low-level WireFormatLite) processes a repeated field, it checks field->is_packed() to select the code path. The implementation in src/google/protobuf/wire_format.cc (lines 1269-1295) handles the packed case:
if (field->is_packed()) {
  // Write a length-delimited block with all values
  target = stream->Write##TYPE_METHOD##Packed(...);
}
For packed fields, the serializer concatenates all primitive values into a contiguous byte array, prefixes it with the field tag and total length, and writes it as a single LEN-DELIMITED record.
Deserialization Logic
During parsing, the generic decoder in WireFormat examines the wire-type. If it encounters a length-delimited field where a packed field is expected, it forwards the payload to the packed-reader helpers in src/google/protobuf/wire_format_lite.h (e.g., ReadPackedPrimitive around lines 306-311):
template <typename CType, enum FieldType DeclaredType>
inline bool WireFormatLite::ReadPackedPrimitive(
    io::CodedInputStream* input, RepeatedField<CType>* values) {
  // Reads length-delimited block and parses each element
}
The parser also tolerates the unpacked representation for packed fields, enabling forward- and backward-compatible parsing across different protobuf versions.
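The dual acceptance described above can be sketched as a decoder that branches on the wire type of each tag. This is an illustrative stand-alone loop, not the library's parser: ReadVarint and DecodeRepeatedVarint are hypothetical names, and to stay short the function rejects any other field number instead of skipping it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads one varint starting at *pos. Returns false on truncated or overlong
// input.
bool ReadVarint(const std::vector<uint8_t>& buf, size_t* pos,
                uint64_t* value) {
  uint64_t result = 0;
  for (int shift = 0; shift < 64 && *pos < buf.size(); shift += 7) {
    const uint8_t byte = buf[(*pos)++];
    result |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) {
      *value = result;
      return true;
    }
  }
  return false;
}

// Decodes a buffer holding only occurrences of one repeated varint field.
// Wire type 0 contributes one element; wire type 2 contributes a packed
// block. Both may appear in the same buffer.
bool DecodeRepeatedVarint(const std::vector<uint8_t>& buf,
                          uint32_t field_number,
                          std::vector<uint64_t>* values) {
  assert(values != nullptr);
  size_t pos = 0;
  while (pos < buf.size()) {
    uint64_t tag = 0;
    if (!ReadVarint(buf, &pos, &tag)) return false;
    if ((tag >> 3) != field_number) return false;

    uint64_t v = 0;
    switch (tag & 0x7) {
      case 0:  // VARINT: a single unpacked element
        if (!ReadVarint(buf, &pos, &v)) return false;
        values->push_back(v);
        break;
      case 2: {  // LEN-DELIMITED: a packed block of elements
        uint64_t length = 0;
        if (!ReadVarint(buf, &pos, &length)) return false;
        if (length > buf.size() - pos) return false;  // block is truncated
        const size_t end = pos + static_cast<size_t>(length);
        while (pos < end) {
          if (!ReadVarint(buf, &pos, &v)) return false;
          values->push_back(v);
        }
        if (pos != end) return false;  // last element ran past the block
        break;
      }
      default:
        return false;  // wire type not valid for a varint field
    }
  }
  return true;
}
```

Because the branch is taken per tag, the same loop also handles a stream that mixes unpacked elements and packed blocks for one field, appending the elements in wire order.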
Wire Compatibility Between Packed and Unpacked
Protobuf guarantees that packed and unpacked encodings are mutually compatible. You can upgrade a field from unpacked to packed (or vice versa) without breaking wire compatibility.
Packed to Unpacked Parsing
A message serialized with a packed field can be parsed by a decoder expecting unpacked fields. The parser detects the length-delimited payload, enters the packed-reading loop, and extracts each element individually. This behavior is exercised in the unit test ParseUnpackedFromPacked in src/google/protobuf/wire_format_unittest.h (lines 1429-1442), whose name reads as "parse an unpacked-declared message from packed data".
Unpacked to Packed Parsing
Conversely, a decoder expecting a packed field will accept the unpacked representation. The parser reads one tag/value pair at a time and appends each element to the repeated field. This is tested by ParsePackedFromUnpacked in src/google/protobuf/wire_format_unittest.h (lines 1314-1327).
Thus, both encodings are wire-compatible; the only difference is the on-the-wire size and parsing efficiency.
When to Use Packed Encoding
Proto3 Defaults
In proto3, repeated scalar fields default to packed encoding automatically. You do not need to specify any options to get the space-saving benefits.
Proto2 Explicit Configuration
In proto2, repeated fields default to unpacked encoding. You must explicitly add the [packed = true] option to enable the packed format:
repeated int32 values = 1 [packed = true];
Use packed encoding for any repeated scalar field containing numeric types, especially when the field typically contains many elements.
Performance Impact
Message Size Reduction
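A packed field stores each value as its raw binary encoding without per-element tags. The saving is one tag per element (1 byte for field numbers 1 to 15, 2 bytes for field numbers 16 to 2047), minus the single tag and length prefix written once for the whole block. For a repeated field containing 1,000 integers, this saves just under 1 KB per message when the field number is below 16, and close to 2 KB for field numbers 16 to 2047.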
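The arithmetic can be checked with a few constexpr helpers. This is a back-of-the-envelope sketch under one simplifying assumption: every element occupies the same number of encoded bytes (bytes_per_value). The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes `value` occupies as a base-128 varint.
constexpr size_t VarintSize(uint64_t value) {
  return value < 0x80 ? 1 : 1 + VarintSize(value >> 7);
}

// Tag size depends on the field number: 1 byte for 1-15, 2 bytes for 16-2047.
constexpr size_t TagSize(uint32_t field_number) {
  return VarintSize(static_cast<uint64_t>(field_number) << 3);
}

// Unpacked: every element carries its own tag.
constexpr size_t UnpackedSize(uint32_t field_number, size_t count,
                              size_t bytes_per_value) {
  return count * (TagSize(field_number) + bytes_per_value);
}

// Packed: one tag, one varint length prefix, then the raw values.
// An empty packed field is not written at all.
constexpr size_t PackedSize(uint32_t field_number, size_t count,
                            size_t bytes_per_value) {
  return count == 0 ? 0
                    : TagSize(field_number) +
                          VarintSize(count * bytes_per_value) +
                          count * bytes_per_value;
}

// 1,000 one-byte varints in field 1: 2000 bytes unpacked, 1003 packed.
static_assert(UnpackedSize(1, 1000, 1) == 2000, "unpacked, field 1");
static_assert(PackedSize(1, 1000, 1) == 1003, "packed, field 1");
```

For the same 1,000 values in field 16, the two-byte tag makes the unpacked form 3,000 bytes against 1,004 bytes packed, a saving of 1,996 bytes.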
Parsing Speed
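For fixed-width types (fixed32, fixed64, sfixed32, sfixed64, float, double) on little-endian hosts, the parser can copy the packed payload into the RepeatedField buffer in bulk. Varint-encoded types still need per-element decoding, but the packed loop skips the per-element tag read and field dispatch, so it typically outperforms the unpacked loop for large collections.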
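The bulk-copy idea for fixed-width elements can be sketched as follows. DecodePackedFixed32 and HostIsLittleEndian are hypothetical helpers written for this illustration; the function takes only the payload of a packed block (the bytes after the tag and length prefix) and falls back to a per-element loop on big-endian hosts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// True when the host stores the least significant byte first, which matches
// the little-endian layout used on the wire for fixed32 values.
bool HostIsLittleEndian() {
  const uint16_t probe = 1;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

// Decodes the payload of a packed fixed32 block into `out`, replacing its
// previous contents. Returns false if the size is not a multiple of 4.
bool DecodePackedFixed32(const uint8_t* payload, size_t size,
                         std::vector<uint32_t>* out) {
  assert(out != nullptr);
  if (size % 4 != 0) return false;  // every element is exactly 4 bytes
  out->resize(size / 4);
  if (size == 0) return true;

  if (HostIsLittleEndian()) {
    // Wire layout equals memory layout: one bulk copy, no per-element work.
    std::memcpy(out->data(), payload, size);
  } else {
    // Portable fallback: assemble each value from its four bytes.
    for (size_t i = 0; i < out->size(); ++i) {
      const uint8_t* p = payload + 4 * i;
      (*out)[i] = static_cast<uint32_t>(p[0]) |
                  (static_cast<uint32_t>(p[1]) << 8) |
                  (static_cast<uint32_t>(p[2]) << 16) |
                  (static_cast<uint32_t>(p[3]) << 24);
    }
  }
  return true;
}
```

No such shortcut exists for the unpacked layout, because a tag sits between every pair of values and the elements are never contiguous in the input.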
Code Examples
Proto Definition
syntax = "proto3";
message Sample {
// Unpacked (explicitly disabled)
repeated int32 values_unpacked = 1 [packed = false];
// Packed (default in proto3, explicit in proto2)
repeated int32 values_packed = 2 [packed = true];
}
C++ Serialization and Wire Inspection
#include <iomanip>
#include <iostream>
#include <string>

#include "sample.pb.h"

int main() {
  Sample msg;
  msg.add_values_unpacked(10);
  msg.add_values_unpacked(20);
  msg.add_values_packed(10);
  msg.add_values_packed(20);

  std::string data;
  if (!msg.SerializeToString(&data)) {
    std::cerr << "serialization failed\n";
    return 1;
  }

  // Hex-dump the raw bytes
  for (unsigned char c : data) {
    std::cout << std::hex << std::setw(2) << std::setfill('0')
              << static_cast<int>(c) << ' ';
  }
  std::cout << std::dec << '\n';
  return 0;
}
Output (hex, shown one field per line with annotations; the program prints all eight bytes on a single line)
08 0a 08 14 // field 1 (unpacked): tag 0x08, value 10; tag 0x08, value 20
12 02 0a 14 // field 2 (packed): tag 0x12, length 0x02, values 0a 14
Explanation
- 0x08 = (field 1 << 3) | VARINT → unpacked tag, repeated once per element.
- 0x12 = (field 2 << 3) | LEN-DELIMITED → packed block of two varints (0a 14).
- The packed block size (0x02) is the total byte count of the two encoded varints, one byte each.
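The tag arithmetic used in this explanation fits in three one-line functions. The names MakeTag, TagFieldNumber, and TagWireType are hypothetical, chosen for this sketch; the checks are evaluated at compile time.

```cpp
#include <cstdint>

// A tag is the field number shifted left by 3, OR-ed with the wire type
// (0 = VARINT, 2 = LEN-DELIMITED).
constexpr uint32_t MakeTag(uint32_t field_number, uint32_t wire_type) {
  return (field_number << 3) | wire_type;
}

// The inverse operations: split a tag back into its two parts.
constexpr uint32_t TagFieldNumber(uint32_t tag) { return tag >> 3; }
constexpr uint32_t TagWireType(uint32_t tag) { return tag & 0x7; }

static_assert(MakeTag(1, 0) == 0x08, "field 1, VARINT");
static_assert(MakeTag(2, 2) == 0x12, "field 2, LEN-DELIMITED");
static_assert(MakeTag(1, 2) == 0x0a, "field 1, LEN-DELIMITED");
static_assert(MakeTag(2, 0) == 0x10, "field 2, VARINT");
```

The last two checks give the tags a sender would write if it encoded field 1 as packed or field 2 as unpacked, which is what a cross-representation parse receives.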
C++ Parsing Both Forms
Sample decoded;
// ---- Parse packed data into the *unpacked* field (field 1) ----
{
  // tag 0x0a = (1 << 3) | LEN-DELIMITED, length 2, values 10 and 20
  std::string packed_data = "\x0a\x02\x0a\x14";
  decoded.ParseFromString(packed_data);
  // decoded.values_unpacked() will contain {10, 20}
}
// ---- Parse unpacked data into the *packed* field (field 2) ----
{
  // tag 0x10 = (2 << 3) | VARINT, written before each value
  std::string unpacked_data = "\x10\x0a\x10\x14";
  decoded.ParseFromString(unpacked_data);  // clears `decoded` before parsing
  // decoded.values_packed() will contain {10, 20}
}
Both calls succeed because the parser accepts the cross-representation shown in the unit tests.
Summary
- Protobuf packed vs unpacked encoding for repeated fields determines whether each scalar element carries its own tag (unpacked) or shares a single length-delimited block (packed).
- Packed encoding is only available for numeric scalar types (integers, floats, booleans, enums) and reduces message size by roughly one byte per element.
- Proto3 defaults to packed for repeated scalars, while proto2 defaults to unpacked unless explicitly configured.
- The FieldDescriptor::is_packed() method in src/google/protobuf/descriptor.cc drives the encoding decision, while src/google/protobuf/wire_format.cc and src/google/protobuf/wire_format_lite.h handle the actual serialization and parsing logic.
- Wire compatibility is guaranteed: parsers accept both packed and unpacked data regardless of the field's declared packing status, as verified by the ParsePackedFromUnpacked and ParseUnpackedFromPacked unit tests.
Frequently Asked Questions
What is the difference between packed and unpacked repeated fields in Protobuf?
Unpacked encoding writes each element as a separate tag-value pair on the wire, while packed encoding concatenates all elements into a single length-delimited block with one shared tag. Packed encoding eliminates the per-element tag overhead (roughly 1 byte per element) and allows the parser to read the entire array as a contiguous buffer, improving both message size and parsing speed for numeric scalar types.
Does proto3 use packed encoding by default for repeated fields?
Yes. In proto3, repeated scalar fields automatically use packed encoding as the default behavior. You do not need to specify any options to obtain the space-saving benefits. In contrast, proto2 defaults to unpacked encoding for repeated fields, requiring you to explicitly set [packed = true] on the field definition to enable the packed format.
Are packed and unpacked repeated fields wire-compatible?
Yes, packed and unpacked encodings are fully wire-compatible. A parser expecting packed data can successfully parse unpacked data (reading individual tag-value pairs), and a parser expecting unpacked data can parse packed data (reading the length-delimited block and iterating through its contents). This cross-compatibility is enforced by unit tests ParsePackedFromUnpacked and ParseUnpackedFromPacked in src/google/protobuf/wire_format_unittest.h, allowing you to change the packing option without breaking existing deployments.
When should I avoid using packed encoding for repeated fields?
Avoid packed encoding when you are using proto2 and need compatibility with protobuf implementations older than version 2.1.0 (released in 2009), which do not recognize the packed wire format. Additionally, packed encoding cannot be applied to repeated string, bytes, or message fields, as packing is only supported for numeric scalar types (integers, floats, booleans, and enums). For all other repeated scalar fields, especially those with many elements, packed encoding is recommended.