Protobuf Packed vs Unpacked Encoding for Repeated Fields: A Complete Guide

March 2, 2026 protocolbuffers/protobuf ↗

Repeated scalar fields in Protocol Buffers can be encoded either as individual tag-value pairs (unpacked) or as a single length-delimited block (packed), with packed encoding reducing message size by eliminating per-element tags and improving parsing performance.

Protocol Buffers (protobuf) offers two wire formats for encoding repeated scalar fields. Understanding the difference between packed and unpacked encoding is essential for optimizing message size and deserialization performance in high-throughput systems. This guide examines the implementation details in the protocolbuffers/protobuf repository, covering wire format internals, compatibility guarantees, and practical C++ examples.

What Packed and Unpacked Encoding Mean

In protobuf wire format, a repeated field serializes each element sequentially. The encoding method determines whether each element carries its own tag or shares a single tag as a block.

Wire Format Differences

Encoding	Wire Type	Layout
Unpacked	`VARINT`, `FIXED32`, or `FIXED64` for packable scalars (`LEN-DELIMITED` applies only to string, bytes, and message elements, which cannot be packed)	Each element is written as a full tag + value pair.
Packed	`LEN-DELIMITED`	All elements are concatenated into a single length-delimited block; the tag is written once, followed by the total byte count and the raw values.

The packed form reduces message size and improves parsing speed for large repeated numeric fields because it eliminates the per-element tag, which costs 1 byte for field numbers 1 through 15 and 2 or more bytes for larger field numbers.
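As a rough illustration of the two layouts in the table, the following self-contained sketch encodes a repeated varint field both ways. The helper names (AppendVarint, AppendTag, EncodeUnpacked, EncodePacked) are hypothetical and are not part of the protobuf API, and the sketch assumes varint-typed values only.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration; not part of the protobuf API.
constexpr std::uint32_t kVarint = 0;  // wire type for varint-encoded scalars
constexpr std::uint32_t kLen = 2;     // wire type for length-delimited payloads

// Appends v as a base-128 varint: 7 payload bits per byte, high bit = "more".
void AppendVarint(std::uint64_t v, std::vector<std::uint8_t>& out) {
  while (v >= 0x80) {
    out.push_back(static_cast<std::uint8_t>(v | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(v));
}

// A tag is itself a varint: (field_number << 3) | wire_type.
void AppendTag(std::uint32_t field, std::uint32_t wire_type,
               std::vector<std::uint8_t>& out) {
  AppendVarint((static_cast<std::uint64_t>(field) << 3) | wire_type, out);
}

// Unpacked: one tag + value record per element.
std::vector<std::uint8_t> EncodeUnpacked(
    std::uint32_t field, const std::vector<std::uint64_t>& values) {
  std::vector<std::uint8_t> out;
  for (std::uint64_t v : values) {
    AppendTag(field, kVarint, out);
    AppendVarint(v, out);
  }
  return out;
}

// Packed: one tag, the payload byte count, then the bare values.
std::vector<std::uint8_t> EncodePacked(
    std::uint32_t field, const std::vector<std::uint64_t>& values) {
  std::vector<std::uint8_t> payload;
  for (std::uint64_t v : values) AppendVarint(v, payload);
  std::vector<std::uint8_t> out;
  if (payload.empty()) return out;  // an empty repeated field emits nothing
  AppendTag(field, kLen, out);
  AppendVarint(payload.size(), out);
  out.insert(out.end(), payload.begin(), payload.end());
  return out;
}
```

The packed encoder builds the payload first because the length prefix must be known before the values are written; the unpacked encoder needs no such buffering.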

Which Types Support Packing

Packed encoding is only allowed for numeric scalar types: int32, int64, uint32, uint64, sint32, sint64, bool, enum, fixed32, fixed64, sfixed32, sfixed64, float, and double.

Repeated string, bytes, and message fields can never be packed. Each of these elements is already length-delimited, so every element is written with its own tag and length prefix.

How the Protobuf Library Chooses the Encoding

The decision between packed and unpacked happens at three layers: descriptor inspection, serialization, and deserialization.

Descriptor Level

The FieldDescriptor::is_packed() accessor in src/google/protobuf/descriptor.cc (around line 4248) tells the runtime whether the field was declared with the packed=true option or the proto3 default.

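// Conceptual sketch of the logic in descriptor.cc, not a verbatim copy
bool FieldDescriptor::is_packed() const {
  // Only repeated numeric scalar fields can ever be packed
  if (!is_packable()) return false;
  // True when [packed = true] is set or the proto3 default applies
  return features().repeated_field_encoding() == FeatureSet::PACKED;
}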

Serialization Logic

When WireFormat::SerializeWithCachedSizes (or the low-level WireFormatLite) processes a repeated field, it checks field->is_packed() to select the code path. The implementation in src/google/protobuf/wire_format.cc (lines 1269-1295) handles the packed case:

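if (field->is_packed()) {
  // Write a length-delimited block with all values.
  // TYPE_METHOD is a macro parameter; ## pastes it into the method name
  // (for example, WriteInt32Packed). Arguments are elided here.
  target = stream->Write##TYPE_METHOD##Packed(...);
}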

For packed fields, the serializer concatenates all primitive values into a contiguous byte array, prefixes it with the field tag and total length, and writes it as a single LEN-DELIMITED record.

Deserialization Logic

During parsing, the generic decoder in WireFormat examines the wire-type. If it encounters a length-delimited field where a packed field is expected, it forwards the payload to the packed-reader helpers in src/google/protobuf/wire_format_lite.h (e.g., ReadPackedPrimitive around lines 306-311):

template <typename CType, enum FieldType DeclaredType>
inline bool WireFormatLite::ReadPackedPrimitive(
    io::CodedInputStream* input, RepeatedField<CType>* values) {
  // Body elided: reads the block length, then parses elements until the
  // block is exhausted, returning false on malformed input.
}

The parser also tolerates the unpacked representation for packed fields, enabling forward- and backward-compatible parsing across different protobuf versions.
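The tolerance described above can be sketched as a small standalone decoder that collects the values of one repeated varint field whether they arrive as individual records, as a packed block, or as a mix of both. This is an illustrative sketch and not the library's parser: ReadVarint and DecodeRepeatedVarint are hypothetical names, and only the VARINT and LEN wire types are handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads one varint at pos and advances pos. Returns false on truncated input.
bool ReadVarint(const std::vector<std::uint8_t>& buf, std::size_t& pos,
                std::uint64_t& value) {
  value = 0;
  for (int shift = 0; shift < 64 && pos < buf.size(); shift += 7) {
    const std::uint8_t byte = buf[pos++];
    value |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) return true;
  }
  return false;
}

// Appends every value of `field` to `values`, accepting either representation.
bool DecodeRepeatedVarint(const std::vector<std::uint8_t>& buf,
                          std::uint32_t field,
                          std::vector<std::uint64_t>& values) {
  std::size_t pos = 0;
  while (pos < buf.size()) {
    std::uint64_t tag = 0;
    if (!ReadVarint(buf, pos, tag)) return false;
    const std::uint32_t number = static_cast<std::uint32_t>(tag >> 3);
    const std::uint32_t wire_type = static_cast<std::uint32_t>(tag & 7);
    if (wire_type == 0) {  // VARINT record: one unpacked element
      std::uint64_t v = 0;
      if (!ReadVarint(buf, pos, v)) return false;
      if (number == field) values.push_back(v);
    } else if (wire_type == 2) {  // LEN record: possibly a packed block
      std::uint64_t len = 0;
      if (!ReadVarint(buf, pos, len)) return false;
      if (len > buf.size() - pos) return false;  // length overruns the input
      const std::size_t end = pos + static_cast<std::size_t>(len);
      if (number != field) {
        pos = end;  // skip unrelated length-delimited fields
        continue;
      }
      while (pos < end) {
        std::uint64_t v = 0;
        if (!ReadVarint(buf, pos, v) || pos > end) return false;
        values.push_back(v);
      }
    } else {
      return false;  // fixed-width and group records are out of scope here
    }
  }
  return true;
}
```

Dispatching on the wire type of each record, rather than on the declared packing of the field, is what lets one loop accept both forms.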

Wire Compatibility Between Packed and Unpacked

Protobuf guarantees that packed and unpacked encodings are mutually compatible. You can upgrade a field from unpacked to packed (or vice versa) without breaking wire compatibility.

Packed to Unpacked Parsing

A message serialized with a packed field can be parsed by a decoder expecting unpacked fields. The parser detects the length-delimited payload, enters the packed-reading loop, and extracts each element individually. This behavior is exercised in the unit test ParseUnpackedFromPacked in src/google/protobuf/wire_format_unittest.h (lines 1429-1442); the test name reads as "parse the unpacked-declared message from packed data".

Unpacked to Packed Parsing

Conversely, a decoder expecting a packed field will accept the unpacked representation. The parser reads one tag/value pair at a time and appends each element to the repeated field. This is tested by ParsePackedFromUnpacked in src/google/protobuf/wire_format_unittest.h (lines 1314-1327).

Thus, both encodings are wire-compatible; the only difference is the on-the-wire size and parsing efficiency.

When to Use Packed Encoding

Proto3 Defaults

In proto3, repeated scalar fields default to packed encoding automatically. You do not need to specify any options to get the space-saving benefits.

Proto2 Explicit Configuration

In proto2, repeated fields default to unpacked encoding. You must explicitly add the [packed = true] option to enable the packed format:

repeated int32 values = 1 [packed = true];

Use packed encoding for any repeated scalar field containing numeric types, especially when the field typically contains many elements.

Performance Impact

Message Size Reduction

A packed field stores each value as its raw binary encoding without per-element tags. Each element saves one tag (1 byte for field numbers 1 through 15, more for larger field numbers), while the block as a whole adds back a single tag and a length prefix. For a repeated field of 1,000 integers with a 1-byte tag, this saves roughly 1 KB per message.
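The size arithmetic can be made concrete with a short calculation. The functions below are hypothetical helpers, not protobuf API, and they take the total encoded size of the values as an input so that the result does not depend on the scalar type.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed to encode v as a varint.
std::size_t VarintSize(std::uint64_t v) {
  std::size_t n = 1;
  while (v >= 0x80) {
    v >>= 7;
    ++n;
  }
  return n;
}

// Tag size for a field number; the wire type occupies the low 3 bits.
std::size_t TagSize(std::uint32_t field) {
  return VarintSize(static_cast<std::uint64_t>(field) << 3);
}

// payload_bytes is the total encoded size of the values alone.
std::size_t UnpackedSize(std::uint32_t field, std::size_t count,
                         std::size_t payload_bytes) {
  return count * TagSize(field) + payload_bytes;  // one tag per element
}

std::size_t PackedSize(std::uint32_t field, std::size_t count,
                       std::size_t payload_bytes) {
  if (count == 0) return 0;  // an empty field emits nothing
  return TagSize(field) + VarintSize(payload_bytes) + payload_bytes;
}
```

Under these assumptions, 1,000 one-byte values in field 1 take 2,000 bytes unpacked and 1,003 bytes packed, a saving of 997 bytes. With field number 16 the tag grows to 2 bytes and the saving becomes 1,996 bytes.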

Parsing Speed

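For fixed-width types (fixed32, fixed64, sfixed32, sfixed64, float, double), the payload of a packed block is a contiguous array of values that, on little-endian hosts, can be copied almost directly into the RepeatedField buffer. Varint-encoded types still have to be decoded one value at a time, but the parser avoids reading and dispatching on a tag before every element. Both effects make the packed path faster than the unpacked loop for large collections.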
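To show why fixed-width packed data is cheap to read, here is a minimal sketch that decodes the payload of a packed fixed32 block with a single bulk copy when the host byte order matches the wire format. HostIsLittleEndian and DecodePackedFixed32 are hypothetical names, and whether the library takes an equivalent path for a given type and platform is an assumption rather than something the text above confirms.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// True when the host stores the least significant byte first.
bool HostIsLittleEndian() {
  const std::uint32_t probe = 1;
  std::uint8_t first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Decodes the payload of a packed fixed32 block (tag and length already
// consumed). Returns false if the payload is not a multiple of 4 bytes.
bool DecodePackedFixed32(const std::vector<std::uint8_t>& payload,
                         std::vector<std::uint32_t>& values) {
  if (payload.size() % 4 != 0) return false;
  const std::size_t count = payload.size() / 4;
  if (count == 0) return true;
  const std::size_t old_size = values.size();
  values.resize(old_size + count);
  if (HostIsLittleEndian()) {
    // Wire order matches memory order: one bulk copy, no per-element work.
    std::memcpy(values.data() + old_size, payload.data(), payload.size());
  } else {
    // Portable fallback: assemble each little-endian value byte by byte.
    for (std::size_t i = 0; i < count; ++i) {
      const std::uint8_t* p = payload.data() + 4 * i;
      values[old_size + i] = std::uint32_t{p[0]} | (std::uint32_t{p[1]} << 8) |
                             (std::uint32_t{p[2]} << 16) |
                             (std::uint32_t{p[3]} << 24);
    }
  }
  return true;
}
```

The unpacked form cannot use this shortcut because a tag sits between every pair of values, so the values are never contiguous in the input.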

Code Examples

Proto Definition

syntax = "proto3";

message Sample {
  // Unpacked (explicitly disabled)
  repeated int32 values_unpacked = 1 [packed = false];

  // Packed (default in proto3, explicit in proto2)
  repeated int32 values_packed = 2 [packed = true];
}

C++ Serialization and Wire Inspection

#include "sample.pb.h"
#include <iomanip>
#include <iostream>
#include <string>

int main() {
  Sample msg;
  msg.add_values_unpacked(10);
  msg.add_values_unpacked(20);
  msg.add_values_packed(10);
  msg.add_values_packed(20);

  std::string data;
  msg.SerializeToString(&data);

  // Hex-dump the raw bytes
  for (unsigned char c : data) {
    std::cout << std::hex << std::setw(2) << std::setfill('0')
              << static_cast<int>(c) << ' ';
  }
  std::cout << std::dec << '\n';
}

Output (hex)


08 0a 08 14   // field 1 (unpacked): tag 0x08, value 10; tag 0x08, value 20
12 02 0a 14   // field 2 (packed): tag 0x12, length 0x02, values 0a 14

The program prints these eight bytes on a single line; they are split and annotated here for readability.

Explanation

0x08 = (field 1 << 3) | VARINT → unpacked tag repeated twice.
0x12 = (field 2 << 3) | LEN-DELIMITED → packed block of two varints (0a 14).

The packed block size (0x02) is the total byte count of the two encoded varints, each of which occupies one byte.
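The tag arithmetic in the explanation can be checked at compile time with a few helper functions. The names below are hypothetical stand-ins written for this example, not the library's own helpers.

```cpp
#include <cstdint>

// Wire type values used in this article's examples.
constexpr std::uint32_t kWireVarint = 0;
constexpr std::uint32_t kWireLengthDelimited = 2;

// Hypothetical helpers: a tag packs the field number above a 3-bit wire type.
constexpr std::uint32_t MakeTag(std::uint32_t field_number,
                                std::uint32_t wire_type) {
  return (field_number << 3) | wire_type;
}
constexpr std::uint32_t TagFieldNumber(std::uint32_t tag) { return tag >> 3; }
constexpr std::uint32_t TagWireType(std::uint32_t tag) { return tag & 0x7; }

static_assert(MakeTag(1, kWireVarint) == 0x08, "field 1, VARINT");
static_assert(MakeTag(2, kWireLengthDelimited) == 0x12, "field 2, LEN");
static_assert(TagFieldNumber(0x12) == 2, "0x12 belongs to field 2");
static_assert(TagWireType(0x12) == kWireLengthDelimited, "0x12 is LEN");
```

Because the wire type lives in the low three bits, a reader can tell a packed block from an unpacked element by looking at the tag alone, before reading any payload.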

C++ Parsing Both Forms

Sample decoded;

// ---- Parse packed data into an *unpacked* field ----
{
  // Field 1 written as a packed block: tag 0x0a = (1 << 3) | LEN, length 2
  std::string packed_data = "\x0a\x02\x0a\x14";
  bool ok = decoded.ParseFromString(packed_data);
  // ok is true; decoded.values_unpacked() contains {10, 20}
}

// ---- Parse unpacked data into a *packed* field ----
{
  // Field 2 written as two records: tag 0x10 = (2 << 3) | VARINT
  std::string unpacked_data = "\x10\x0a\x10\x14";
  bool ok = decoded.ParseFromString(unpacked_data);  // clears decoded first
  // ok is true; decoded.values_packed() contains {10, 20}
}

Both calls succeed because the parser accepts the cross-representation shown in the unit tests.

Summary

Protobuf packed vs unpacked encoding for repeated fields determines whether each scalar element carries its own tag (unpacked) or shares a single length-delimited block (packed).
Packed encoding is only available for numeric scalar types (integers, floats, booleans, enums) and reduces message size by one tag per element, which is 1 byte for field numbers 1 through 15.
Proto3 defaults to packed for repeated scalars, while proto2 defaults to unpacked unless explicitly configured.
The FieldDescriptor::is_packed() method in src/google/protobuf/descriptor.cc drives the encoding decision, while src/google/protobuf/wire_format.cc and src/google/protobuf/wire_format_lite.h handle the actual serialization and parsing logic.
Wire compatibility is guaranteed: parsers accept both packed and unpacked data regardless of the field's declared packing status, as verified by ParsePackedFromUnpacked and ParseUnpackedFromPacked unit tests.

Frequently Asked Questions

What is the difference between packed and unpacked repeated fields in Protobuf?

Unpacked encoding writes each element as a separate tag-value pair on the wire, while packed encoding concatenates all elements into a single length-delimited block with one shared tag. Packed encoding eliminates the per-element tag overhead (roughly 1 byte per element) and allows the parser to read the entire array as a contiguous buffer, improving both message size and parsing speed for numeric scalar types.

Does proto3 use packed encoding by default for repeated fields?

Yes. In proto3, repeated scalar fields automatically use packed encoding as the default behavior. You do not need to specify any options to obtain the space-saving benefits. In contrast, proto2 defaults to unpacked encoding for repeated fields, requiring you to explicitly set [packed = true] on the field definition to enable the packed format.

Are packed and unpacked repeated fields wire-compatible?

Yes, packed and unpacked encodings are fully wire-compatible. A parser expecting packed data can successfully parse unpacked data (reading individual tag-value pairs), and a parser expecting unpacked data can parse packed data (reading the length-delimited block and iterating through its contents). This cross-compatibility is enforced by unit tests ParsePackedFromUnpacked and ParseUnpackedFromPacked in src/google/protobuf/wire_format_unittest.h, allowing you to change the packing option without breaking existing deployments.

When should I avoid using packed encoding for repeated fields?

Avoid packed encoding when you are using proto2 and need compatibility with protobuf implementations prior to version 2.1.0 (released in 2009), which do not recognize the packed wire format. Additionally, packed encoding cannot be used for repeated fields containing strings, bytes, or message types (sub-messages), as packing is only supported for numeric scalar types (integers, floats, booleans, and enums). For all other repeated scalar fields, especially those with many elements, packed encoding is recommended.
