Using Protobuf Descriptors for Schema Introspection: A Complete Guide

Protocol Buffers descriptors provide a runtime reflection API that enables programs to inspect message schemas, enumerate fields, and manipulate data dynamically without requiring access to the original .proto source files.

The protocolbuffers/protobuf repository implements a comprehensive descriptor system that embeds schema metadata directly into generated code. This system powers schema introspection capabilities ranging from simple field enumeration to constructing fully dynamic messages at runtime using the DescriptorPool and DynamicMessageFactory APIs.

What Are Protobuf Descriptors?

Descriptors are runtime representations of Protocol Buffers schema definitions. When you compile a .proto file using protoc, the generated code embeds a serialized FileDescriptorProto describing every message, field, enum, and service in that file. At program initialization, static initializers register this metadata with the global generated DescriptorPool by calling DescriptorPool::InternalAddGeneratedFile (implemented in src/google/protobuf/descriptor.cc).

Unlike statically generated accessors, descriptors let applications discover schema layouts dynamically. You can enumerate fields, look them up by name or number, inspect type information, check field labels (optional versus repeated), and construct new message instances whose types were unknown at compile time.

Core Components of the Descriptor System

The introspection architecture centers on several key classes defined in src/google/protobuf/descriptor.h:

DescriptorPool

A DescriptorPool is a registry of FileDescriptor objects. It resolves cross-file references, rejects conflicting definitions, and returns the existing descriptor when an identical file is built twice. A process can hold several pools; the global pool containing compiled-in descriptors is available via DescriptorPool::generated_pool().

FileDescriptor

Represents a single .proto file, including its package declaration, syntax version, dependencies, and the top-level messages and enums it defines. File descriptors form the building blocks that the pool assembles into a complete schema graph.

Descriptor (MessageDescriptor)

Describes a concrete message type, providing its fully qualified name, list of fields, nested types, and oneof definitions. Obtain a message descriptor by calling FindMessageTypeByName() on a descriptor pool.

FieldDescriptor

Describes individual fields with complete metadata: name, field number, type (as a FieldDescriptor::Type enum), label (optional, required, or repeated), default values, and custom options. Access field descriptors via Descriptor::field(index) or Descriptor::FindFieldByName().

DynamicMessageFactory

Located in src/google/protobuf/dynamic_message.h, this factory creates concrete Message subclass instances at runtime from a Descriptor. This enables parsing and serialization of messages whose schemas are only known via descriptors, not static code.

Reflection API

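Every generated message that derives from Message (that is, one not built with optimize_for = LITE_RUNTIME) implements the GetReflection() method declared in src/google/protobuf/message.h. The returned Reflection object provides typed accessors such as GetInt32() and SetString() that take a FieldDescriptor, so fields are addressed through descriptors rather than generated accessors. To address a field by name or number, first resolve it with Descriptor::FindFieldByName() or Descriptor::FindFieldByNumber(). This enables generic algorithms that operate on any message type.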

Runtime Schema Introspection with DescriptorPool

At program startup, generated code automatically registers its file descriptors with the global DescriptorPool::generated_pool(). Applications can query this pool to retrieve schema information without parsing .proto files manually.

To find a message type dynamically:

const google::protobuf::Descriptor* desc =
    google::protobuf::DescriptorPool::generated_pool()
        ->FindMessageTypeByName("my.package.Person");

Once you have a Descriptor, you can enumerate its structure:

for (int i = 0; i < desc->field_count(); ++i) {
  const google::protobuf::FieldDescriptor* field = desc->field(i);
  // Access field->name(), field->number(), field->type_name()
}

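For schemas not compiled into the binary, load a FileDescriptorSet (written by protoc --descriptor_set_out, ideally with --include_imports so the set is self-contained) and build a temporary pool. BuildFile expects every import of a file to be in the pool already, so add files in dependency order; it returns nullptr when a file is rejected: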

google::protobuf::FileDescriptorSet fd_set;
// ... parse from file ...
google::protobuf::DescriptorPool pool;
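for (int i = 0; i < fd_set.file_size(); ++i) {
  // BuildFile returns nullptr if validation fails or an import is missing.
  if (pool.BuildFile(fd_set.file(i)) == nullptr) {
    // ... report the error and stop ...
  }
}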

Practical Code Examples

Inspecting Message Schemas at Runtime

This example demonstrates enumerating fields and manipulating messages via reflection using the core descriptor API:

#include <iostream>
#include <memory>
#include <string>
#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>
#include <google/protobuf/dynamic_message.h>

int main() {
  // Query the generated pool for a specific message type
  const google::protobuf::Descriptor* desc =
      google::protobuf::DescriptorPool::generated_pool()
          ->FindMessageTypeByName("my.package.Person");

  if (!desc) {
    std::cerr << "Message type not found.\n";
    return 1;
  }

  // Iterate over all fields in the message
  std::cout << "Fields of " << desc->full_name() << ":\n";
  for (int i = 0; i < desc->field_count(); ++i) {
    const auto* field = desc->field(i);
    std::cout << "  " << field->number() << ": " << field->name()
              << " (" << field->type_name() << ")\n";
  }

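  // Create a dynamic instance using the descriptor. The factory owns the
  // prototype and must outlive every message created from it.
  google::protobuf::DynamicMessageFactory factory;
  const google::protobuf::Message* prototype = factory.GetPrototype(desc);
  std::unique_ptr<google::protobuf::Message> msg(prototype->New());

  // Modify fields via reflection. SetString must not be called with a null
  // or non-string field, so verify the lookup first.
  const google::protobuf::FieldDescriptor* name_field =
      desc->FindFieldByName("name");
  if (!name_field || name_field->is_repeated() ||
      name_field->cpp_type() !=
          google::protobuf::FieldDescriptor::CPPTYPE_STRING) {
    std::cerr << "Expected a singular string field named 'name'.\n";
    return 1;
  }
  const google::protobuf::Reflection* refl = msg->GetReflection();
  refl->SetString(msg.get(), name_field, "Alice");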

  // Serialize to verify structure
  std::string binary;
  msg->SerializeToString(&binary);
  std::cout << "Serialized size: " << binary.size() << " bytes\n";

  return 0;
}

Loading External Schema Definitions

For plugin development or schema registry integration, load descriptors from serialized wire format using FileDescriptorProto (defined in src/google/protobuf/descriptor.pb.h):

#include <fstream>
#include <memory>
#include <google/protobuf/descriptor.pb.h>
#include <google/protobuf/descriptor.h>
#include <google/protobuf/dynamic_message.h>

int main() {
  // Read descriptor set produced by protoc --descriptor_set_out
  std::ifstream in("my_schema.pb", std::ios::binary);
  google::protobuf::FileDescriptorSet fd_set;
  if (!in || !fd_set.ParseFromIstream(&in)) return 1;

  // Build temporary descriptor pool. Files must be added in dependency
  // order; BuildFile returns nullptr if a file is rejected.
  google::protobuf::DescriptorPool pool;
  for (int i = 0; i < fd_set.file_size(); ++i) {
    if (!pool.BuildFile(fd_set.file(i))) return 1;
  }

  // Use the pool like the generated one
  const auto* msg_desc = pool.FindMessageTypeByName("my.package.Event");
  if (!msg_desc) return 1;

  // Create dynamic message from external schema
  google::protobuf::DynamicMessageFactory factory(&pool);
  std::unique_ptr<google::protobuf::Message> msg(
      factory.GetPrototype(msg_desc)->New());

  // Message is now ready for parsing binary protobuf data
}

Converting to Upb MiniTables

The lightweight upb library supports descriptor-based introspection through conversion helpers. Use upb_generator::AddFile to populate upb definitions from C++ descriptors:

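#include "upb_generator/common/cpp_to_upb_def.h"
#include "upb/mem/arena.hpp"
#include "upb/reflection/def.hpp"
#include "upb/util/def_to_proto.h"
#include <google/protobuf/descriptor.h>

void ConvertToUpb(const google::protobuf::FileDescriptor* file) {
  upb::DefPool pool;
  // Convert FileDescriptor to upb definitions (confirm AddFile's namespace
  // against cpp_to_upb_def.h in your checkout)
  upb_generator::AddFile(file, &pool);

  // Look up the message definition, then fetch its MiniTable
  const upb_MessageDef* upb_msg = upb_DefPool_FindMessageByName(
      pool.ptr(), "my.package.Person");
  if (!upb_msg) return;
  const upb_MiniTable* mini_table = upb_MessageDef_MiniTable(upb_msg);
  (void)mini_table;

  // Convert back to protobuf descriptor proto for debugging; the result is
  // allocated on the arena and freed together with it
  upb::Arena arena;
  google_protobuf_DescriptorProto* proto =
      upb_MessageDef_ToProto(upb_msg, arena.ptr());
  (void)proto;
}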

This conversion path (defined in upb_generator/common/cpp_to_upb_def.h and upb/util/def_to_proto.h) bridges the full protobuf runtime with upb's minimal memory footprint, enabling tools that leverage rich descriptor information without the full C++ runtime overhead.

How Descriptors Power the Ecosystem

The descriptor system enables critical tooling throughout the Protocol Buffers ecosystem. Protoc plugins receive a CodeGeneratorRequest over stdin; it carries a FileDescriptorProto for every file to generate and for everything those files import. A plugin turns these protos into descriptors and walks them to emit code, which is how generators that are not built into protoc (such as the Go generator) obtain schema information. The upb_generator/plugin.cc file demonstrates this pattern, parsing descriptors to generate upb MiniTables.

Schema registries serialize FileDescriptorProto messages (defined in src/google/protobuf/descriptor.pb.h) to transmit schemas across network boundaries. The wire-format representation allows services to validate messages against schemas stored remotely, enabling dynamic type checking in data pipelines.

Summary

  • Descriptors embed complete schema metadata into protobuf binaries, enabling runtime introspection without .proto files.
  • DescriptorPool manages descriptor lifecycle and provides lookup methods like FindMessageTypeByName().
  • DynamicMessageFactory (in src/google/protobuf/dynamic_message.h) creates message instances from descriptors at runtime, supporting generic parsing and serialization.
  • Reflection API (via Message::GetReflection()) allows reading and writing fields using descriptor metadata rather than generated accessors.
  • FileDescriptorSet provides a wire-format container for transmitting schemas between processes, which schema registries and other descriptor-driven tools rely on.
  • Upb integration (via upb_generator/common/cpp_to_upb_def.h) allows converting C++ descriptors to lightweight upb MiniTables for resource-constrained environments.

Frequently Asked Questions

How do I access protobuf descriptors without the original .proto files?

Use google::protobuf::DescriptorPool::generated_pool()->FindMessageTypeByName() to retrieve message descriptors from code generated by protoc. The generated code embeds binary descriptors that register automatically at program startup via DescriptorPool::InternalAddGeneratedFile in src/google/protobuf/descriptor.cc. No .proto files are needed at runtime.

What is the difference between Descriptor and DescriptorProto?

Descriptor (and FileDescriptor, FieldDescriptor, etc.) are C++ classes in src/google/protobuf/descriptor.h that provide an object-oriented API for introspection. DescriptorProto (and FileDescriptorProto) are generated protobuf message types defined in src/google/protobuf/descriptor.pb.h that represent the wire-format serialization of descriptors. You can convert in both directions: Descriptor::CopyTo() and FileDescriptor::CopyTo() fill in the corresponding proto, and DescriptorPool::BuildFile() turns a FileDescriptorProto back into descriptors. The descriptor classes serve in-process lookup, while the protos serve transmission and storage.

Can I modify protobuf messages dynamically using descriptors?

Yes. Create a DynamicMessageFactory and call GetPrototype(descriptor)->New() to instantiate a message type known only at runtime. Use Message::GetReflection() to obtain a Reflection object, then call methods like SetString(), SetInt32(), or MutableMessage() to modify fields using their FieldDescriptor objects. This pattern supports generic transformation pipelines that operate on arbitrary message types.

How do protoc plugins use descriptors?

Plugins receive a CodeGeneratorRequest (declared in src/google/protobuf/compiler/plugin.proto) via stdin from the compiler. Its proto_file field carries the FileDescriptorProto of every file to generate and of everything those files import, ordered so that each file precedes the files that import it. A C++ plugin parses the request, builds a DescriptorPool by calling BuildFile() on each entry, inspects the resulting descriptors to generate code, and writes a CodeGeneratorResponse to stdout. This architecture allows plugins to access complete type information without parsing .proto files themselves, as implemented in upb_generator/plugin.cc.
