Using Protobuf Descriptors for Schema Introspection: A Complete Guide
Protocol Buffers descriptors provide a runtime reflection API that enables programs to inspect message schemas, enumerate fields, and manipulate data dynamically without requiring access to the original .proto source files.
The protocolbuffers/protobuf repository implements a comprehensive descriptor system that embeds schema metadata directly into generated code. This system powers schema introspection capabilities ranging from simple field enumeration to constructing fully dynamic messages at runtime using the DescriptorPool and DynamicMessageFactory APIs.
What Are Protobuf Descriptors?
Descriptors are runtime representations of Protocol Buffers schema definitions. When you compile a .proto file with protoc, the generated C++ code embeds a serialized FileDescriptorProto describing every message, field, enum, and service in that file. Static initializers in the generated code register this data with the global generated DescriptorPool through DescriptorPool::InternalAddGeneratedFile (implemented in src/google/protobuf/descriptor.cc). The pool then builds the corresponding descriptor objects lazily, the first time they are looked up.
Unlike generated accessors, which fix a message's layout at compile time, descriptors let applications discover schema layouts at runtime. You can enumerate fields and look them up by name or number, inspect their types, check whether a field is repeated, and construct instances of message types that were unknown at compile time.
Core Components of the Descriptor System
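The introspection architecture centers on a few key classes. The descriptor classes themselves are declared in src/google/protobuf/descriptor.h; DynamicMessageFactory and Reflection live in the headers named in their own sections.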
DescriptorPool
A DescriptorPool owns a set of FileDescriptor objects, validates them, and resolves references between files (for example, a field whose type is a message imported from another file). A process can hold many pools; DescriptorPool::generated_pool() returns the global pool containing the descriptors compiled into the binary.
FileDescriptor
Represents a single .proto file, including its package declaration, syntax version, dependencies, and the top-level messages and enums it defines. File descriptors form the building blocks that the pool assembles into a complete schema graph.
Descriptor (MessageDescriptor)
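Describes a message type: its fully qualified name, fields, nested types, and oneof definitions. The C++ class is simply named Descriptor; "message descriptor" is the informal name. Obtain one by calling FindMessageTypeByName() on a descriptor pool, from a message instance via Message::GetDescriptor(), or from a generated class via its static descriptor() method.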
FieldDescriptor
Describes individual fields with complete metadata: name, field number, type (as a FieldDescriptor::Type enum), label (optional, required, or repeated), default values, and custom options. Access field descriptors via Descriptor::field(index), Descriptor::FindFieldByName(), or Descriptor::FindFieldByNumber().
DynamicMessageFactory
Located in src/google/protobuf/dynamic_message.h, this factory builds Message implementations at runtime from a Descriptor. GetPrototype() returns an immutable default instance, and calling New() on it yields a mutable message. This enables parsing and serialization of messages whose schemas are known only through descriptors, not static code.
Reflection API
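Every full (non-lite) message, whether generated or dynamic, implements GetReflection() (declared in src/google/protobuf/message.h). The returned Reflection object reads and writes fields identified by FieldDescriptor pointers, which you look up by name or number, enabling generic algorithms that operate on any message type. Type mismatches, such as calling SetString() on an int32 field, are caught by runtime checks rather than by the compiler.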
Runtime Schema Introspection with DescriptorPool
At program startup, generated code automatically registers its file descriptors with the global DescriptorPool::generated_pool(). Applications can query this pool to retrieve schema information without parsing .proto files manually.
To find a message type dynamically:
```cpp
// Returns nullptr if no linked-in generated code defines this type
const google::protobuf::Descriptor* desc =
    google::protobuf::DescriptorPool::generated_pool()
        ->FindMessageTypeByName("my.package.Person");
```
Once you have a Descriptor, you can enumerate its structure:
```cpp
for (int i = 0; i < desc->field_count(); ++i) {
  const google::protobuf::FieldDescriptor* field = desc->field(i);
  // Access field->name(), field->number(), field->type_name()
}
```
For schemas that are not compiled into the binary, load a FileDescriptorSet written by protoc --descriptor_set_out (add --include_imports so the set also contains every imported file) and build its files into a separate pool:
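```cpp
google::protobuf::FileDescriptorSet fd_set;
// ... parse from file ...
google::protobuf::DescriptorPool pool;
for (const auto& file : fd_set.file()) {
  // nullptr means the file is invalid or one of its imports is not built yet
  if (pool.BuildFile(file) == nullptr) {
    // handle the error
  }
}
```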
Practical Code Examples
Inspecting Message Schemas at Runtime
This example demonstrates enumerating fields and manipulating messages via reflection using the core descriptor API:
```cpp
#include <iostream>
#include <memory>
#include <string>

#include <google/protobuf/descriptor.h>
#include <google/protobuf/dynamic_message.h>
#include <google/protobuf/message.h>

int main() {
  // Query the generated pool for a specific message type. This succeeds only
  // if code generated from the .proto defining my.package.Person is linked in.
  const google::protobuf::Descriptor* desc =
      google::protobuf::DescriptorPool::generated_pool()
          ->FindMessageTypeByName("my.package.Person");
  if (!desc) {
    std::cerr << "Message type not found.\n";
    return 1;
  }

  // Iterate over all fields in the message
  std::cout << "Fields of " << desc->full_name() << ":\n";
  for (int i = 0; i < desc->field_count(); ++i) {
    const auto* field = desc->field(i);
    std::cout << "  " << field->number() << ": " << field->name()
              << " (" << field->type_name() << ")\n";
  }

  // Create a dynamic instance using the descriptor. The factory must outlive
  // every message it creates, so it is declared before msg.
  google::protobuf::DynamicMessageFactory factory;
  const google::protobuf::Message* prototype = factory.GetPrototype(desc);
  std::unique_ptr<google::protobuf::Message> msg(prototype->New());

  // Modify fields via reflection. Check first that "name" is a singular
  // string field, because reflection setters abort on a type mismatch.
  const google::protobuf::FieldDescriptor* name_field =
      desc->FindFieldByName("name");
  if (name_field == nullptr || name_field->is_repeated() ||
      name_field->cpp_type() != google::protobuf::FieldDescriptor::CPPTYPE_STRING) {
    std::cerr << "No singular string field named 'name'.\n";
    return 1;
  }
  const google::protobuf::Reflection* refl = msg->GetReflection();
  refl->SetString(msg.get(), name_field, "Alice");

  // Serialize to verify structure
  std::string binary;
  if (!msg->SerializeToString(&binary)) {
    std::cerr << "Serialization failed.\n";
    return 1;
  }
  std::cout << "Serialized size: " << binary.size() << " bytes\n";
  return 0;
}
```
Loading External Schema Definitions
For tooling or schema registry integration, load descriptors from their serialized wire format. A FileDescriptorSet is a list of FileDescriptorProto messages (defined in src/google/protobuf/descriptor.proto and generated into src/google/protobuf/descriptor.pb.h):
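```cpp
#include <fstream>
#include <iostream>
#include <memory>

#include <google/protobuf/descriptor.h>
#include <google/protobuf/descriptor.pb.h>
#include <google/protobuf/dynamic_message.h>
#include <google/protobuf/message.h>

int main() {
  // Read a descriptor set produced by protoc --descriptor_set_out
  // (pass --include_imports so every dependency is in the set)
  std::ifstream in("my_schema.pb", std::ios::binary);
  google::protobuf::FileDescriptorSet fd_set;
  if (!in || !fd_set.ParseFromIstream(&in)) {
    std::cerr << "Could not read my_schema.pb\n";
    return 1;
  }
  // Build a separate descriptor pool; imports must be built first
  google::protobuf::DescriptorPool pool;
  for (const auto& file : fd_set.file()) {
    if (pool.BuildFile(file) == nullptr) {
      std::cerr << "Failed to build " << file.name() << "\n";
      return 1;
    }
  }
  // Use the pool like the generated one
  const auto* msg_desc = pool.FindMessageTypeByName("my.package.Event");
  if (!msg_desc) return 1;
  // Create a dynamic message from the external schema. It is declared after
  // pool and factory, so it is destroyed before both of them.
  google::protobuf::DynamicMessageFactory factory(&pool);
  std::unique_ptr<google::protobuf::Message> msg(
      factory.GetPrototype(msg_desc)->New());
  // The message is now ready for parsing binary protobuf data, for example
  // with msg->ParseFromString(bytes).
  return 0;
}
```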
Converting to Upb MiniTables
The lightweight upb library supports descriptor-based introspection through conversion helpers. Use upb_generator::AddFile to populate upb definitions from C++ descriptors:
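```cpp
#include "upb_generator/common/cpp_to_upb_def.h"
#include "upb/mem/arena.hpp"
#include "upb/reflection/def.hpp"
#include "upb/util/def_to_proto.h"
#include <google/protobuf/descriptor.h>

void ConvertToUpb(const google::protobuf::FileDescriptor* file) {
  upb::DefPool pool;
  // Convert the C++ FileDescriptor into upb definitions
  upb_generator::AddFile(file, &pool);
  // Look up the upb message definition by its fully qualified name
  const upb_MessageDef* upb_msg =
      upb_DefPool_FindMessageByName(pool.ptr(), "my.package.Person");
  if (upb_msg == nullptr) return;
  // Each upb_MessageDef exposes the MiniTable describing its layout
  const upb_MiniTable* mini_table = upb_MessageDef_MiniTable(upb_msg);
  (void)mini_table;
  // Convert back to a DescriptorProto for debugging; the result is
  // allocated in (and owned by) the arena
  upb::Arena arena;
  google_protobuf_DescriptorProto* proto =
      upb_MessageDef_ToProto(upb_msg, arena.ptr());
  (void)proto;
}
```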
This conversion path (declared in upb_generator/common/cpp_to_upb_def.h, with the proto round trip in upb/util/def_to_proto.h) lets code that already holds C++ descriptors, such as a code generator, derive the equivalent upb definitions and MiniTables. Those MiniTables are what upb-based runtimes use to parse and serialize messages with a small memory footprint.
How Descriptors Power the Ecosystem
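The descriptor system underpins tooling throughout the Protocol Buffers ecosystem. protoc plugins (protoc-gen-go, for example) read a CodeGeneratorRequest, defined in src/google/protobuf/compiler/plugin.proto, from stdin. Its proto_file field carries FileDescriptorProto messages for the requested files and everything they import. A plugin builds those into a DescriptorPool, walks the resulting descriptors to generate code, and writes a CodeGeneratorResponse to stdout. The upb_generator/plugin.cc file demonstrates this pattern, parsing descriptors to generate upb MiniTables.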
Schema registries store and transmit serialized FileDescriptorProto messages (defined in src/google/protobuf/descriptor.proto). Because the schema travels in wire format, a service can fetch it remotely, build it into a DescriptorPool, and check incoming messages against it, enabling dynamic type checking in data pipelines.
Summary
- Generated code embeds complete schema metadata in protobuf binaries, enabling runtime introspection without .proto files.
- DescriptorPool manages descriptor lifecycle and provides lookup methods like FindMessageTypeByName().
- DynamicMessageFactory (in src/google/protobuf/dynamic_message.h) creates message instances from descriptors at runtime, supporting generic parsing and serialization.
- The Reflection API (via Message::GetReflection()) reads and writes fields using descriptor metadata rather than generated accessors.
- FileDescriptorSet and FileDescriptorProto are wire-format containers for moving schemas between processes: schema registries exchange them directly, and protoc plugins receive FileDescriptorProto messages inside a CodeGeneratorRequest.
- Upb integration (via upb_generator/common/cpp_to_upb_def.h) converts C++ descriptors into upb definitions and their lightweight MiniTables.
Frequently Asked Questions
How do I access protobuf descriptors without the original .proto files?
Use google::protobuf::DescriptorPool::generated_pool()->FindMessageTypeByName() to retrieve descriptors for any message type whose protoc-generated code is linked into the binary. The generated code embeds serialized descriptors that register automatically at program startup via DescriptorPool::InternalAddGeneratedFile (implemented in src/google/protobuf/descriptor.cc). No .proto files are needed at runtime.
What is the difference between Descriptor and DescriptorProto?
Descriptor (and FileDescriptor, FieldDescriptor, etc.) are C++ classes in src/google/protobuf/descriptor.h that provide a validated, cross-linked, object-oriented API for introspection. DescriptorProto (and FileDescriptorProto) are ordinary protobuf message types defined in src/google/protobuf/descriptor.proto (generated into descriptor.pb.h) that represent descriptors as plain data. You can convert between them: FileDescriptor::CopyTo() fills a FileDescriptorProto, and DescriptorPool::BuildFile() turns a FileDescriptorProto back into a FileDescriptor. Use descriptors for lookup and reflection at runtime, and the protos for storage and transmission.
Can I modify protobuf messages dynamically using descriptors?
Yes. Create a DynamicMessageFactory and call GetPrototype(descriptor)->New() to instantiate a message type known only at runtime. Use Message::GetReflection() to obtain a Reflection object, then call methods like SetString(), SetInt32(), or MutableMessage() to modify fields using their FieldDescriptor objects. This pattern supports generic transformation pipelines that operate on arbitrary message types.
How do protoc plugins use descriptors?
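Plugins receive a CodeGeneratorRequest from protoc via stdin. Its proto_file field is a list of FileDescriptorProto messages covering the files to generate and all of their imports, in dependency order. A plugin parses the request, builds each FileDescriptorProto into a DescriptorPool via BuildFile(), uses the resulting descriptors to inspect schemas and generate code, and writes a CodeGeneratorResponse to stdout. C++ plugins usually delegate this loop to google::protobuf::compiler::PluginMain() with a CodeGenerator subclass. This architecture lets plugins access complete type information without parsing .proto files themselves, as implemented in upb_generator/plugin.cc.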