Integrating OpenTelemetry for Custom Tracing in Agent-Lightning

Agent-Lightning abstracts OpenTelemetry instrumentation into a high-level wrapper at agentlightning/utils/otel.py, letting you create custom spans, tags, and links without managing SDK providers or attribute sanitization manually.

Agent-Lightning ships with a self-contained tracing module that streamlines observability for AI agent workflows. Rather than instantiating OpenTelemetry objects directly, you call helper functions that handle tracer lifecycle, attribute flattening, and OTLP export. This guide shows how to add custom tracing using the APIs from the microsoft/agent-lightning repository.

Initializing the Tracer Provider

Before creating spans, fetch the configured tracer through the framework's lazy provider mechanism. In agentlightning/utils/otel.py, the get_tracer_provider() function instantiates the tracer provider on first call (line 58), while get_tracer() retrieves the active tracer instance (line 145).

from agentlightning.utils.otel import get_tracer

tracer = get_tracer()  # Uses active span processor by default

# Or bypass the active processor:
tracer = get_tracer(use_active_span_processor=False)

The module maintains internal state through get_span_processors() (line 126), which inspects the current processor chain without exposing low-level SDK details to your application code.
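
The create-on-first-call behavior described for get_tracer_provider() follows a common lazy-initialization pattern. The sketch below is not the framework's code: DemoProvider and get_provider() are hypothetical names, and only the standard library is used to show how a provider can be built once and then reused.

```python
import threading

_provider = None                   # hypothetical module-level cache
_provider_lock = threading.Lock()  # guards first-time construction


class DemoProvider:
    """Stand-in for an SDK tracer provider; counts how often it is built."""

    instances = 0

    def __init__(self) -> None:
        DemoProvider.instances += 1


def get_provider() -> DemoProvider:
    """Create the provider on the first call, then return the cached one."""
    global _provider
    if _provider is None:
        with _provider_lock:
            if _provider is None:  # re-check once the lock is held
                _provider = DemoProvider()
    return _provider


first = get_provider()
second = get_provider()
print(first is second, DemoProvider.instances)
```

Callers never see the construction step, which is why application code can ask for a tracer at any point without worrying about setup order.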

Creating Spans with Custom Tags

To annotate spans with searchable metadata, use the make_tag_attributes() helper (line 190) to convert string lists into OTEL-compatible attribute dictionaries.

from agentlightning.utils.otel import get_tracer, make_tag_attributes

tracer = get_tracer()
user_id = "user-123"  # placeholder value for illustration

with tracer.start_as_current_span(
    "process_user_request",
    attributes=make_tag_attributes(["agent:customer_support", "priority:high"]),
) as span:
    # Business logic here
    span.set_attribute("user_id", user_id)

This approach ensures tags conform to the framework's attribute schema defined in agentlightning/types/tracer.py, preventing type mismatches during export.

Linking Distributed Traces

For workflows spanning multiple services or asynchronous boundaries, make_link_attributes() (line 212) serializes correlation contexts into transport-friendly maps. The companion extract_links_from_attributes() reconstructs these links on the consumer side.

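from agentlightning.utils.otel import make_link_attributes

# Inside your span context: read the IDs to correlate on from the active span
ctx = span.get_span_context()
correlation_context = {"parent_span_id": format(ctx.span_id, "016x"), "trace_id": format(ctx.trace_id, "032x")}
span.set_attributes(make_link_attributes(correlation_context))  # the helper returns a map, so set it in bulk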

The link model is defined in agentlightning/types/resources.py as a Pydantic model, ensuring type safety across the serialization boundary.

Sanitizing and Flattening Attributes

Arbitrary Python objects often fail OTEL export validation. The wrapper provides sanitize_attributes() (line 462) and sanitize_attribute_value() to recursively convert complex types into primitive, export-safe values.

from agentlightning.utils.otel import sanitize_attributes

# model_config and live_metrics stand in for your own objects
nested_data = {"config": model_config, "metrics": live_metrics}
span.set_attributes(sanitize_attributes(nested_data))
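
To make the idea of sanitization concrete, here is a minimal standard-library sketch. It is an assumption about how such a helper might behave, not a copy of sanitize_attributes(): the names sanitize() and sanitize_value() are hypothetical, and the rules (keep primitives, keep same-typed lists of primitives, JSON-encode everything else, drop None) reflect the value types OpenTelemetry attributes accept.

```python
import json
from typing import Any, Dict

PRIMITIVES = (str, bool, int, float)


def sanitize_value(value: Any) -> Any:
    """Return a primitive, a same-typed list of primitives, or a string."""
    if isinstance(value, PRIMITIVES):
        return value
    if isinstance(value, (list, tuple)) and value:
        same_type = len({type(item) for item in value}) == 1
        if same_type and isinstance(value[0], PRIMITIVES):
            return list(value)
    try:
        # Dictionaries and mixed sequences become a JSON string
        return json.dumps(value, sort_keys=True)
    except (TypeError, ValueError):
        # Last resort for objects JSON cannot encode
        return repr(value)


def sanitize(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Sanitize every value and drop None entries."""
    return {
        key: sanitize_value(value)
        for key, value in attributes.items()
        if value is not None
    }


clean = sanitize(
    {
        "model": "gpt",
        "temperature": 0.2,
        "stops": ["a", "b"],
        "config": {"depth": 2},
        "missing": None,
    }
)
print(clean)
```

The real helper may make different choices, for example about which values are filtered out rather than stringified, so check its behavior before relying on a particular encoding.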

For deeply nested dictionaries that must traverse network boundaries, flatten_attributes() (line 327) converts hierarchical structures into dot-notation keys, with unflatten_attributes() available for reconstruction.
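
The dot-notation round trip can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: flatten() and unflatten() are hypothetical names, keys are assumed not to contain dots, and lists are treated as leaf values.

```python
from typing import Any, Dict


def flatten(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Collapse nested dictionaries into a single level of dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value  # leaf value (or an empty dict)
    return flat


def unflatten(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the nested structure from dotted keys."""
    nested: Dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


data = {"config": {"model": "gpt", "sampling": {"temperature": 0.2}}, "retries": 3}
flat = flatten(data)
print(flat)
print(unflatten(flat) == data)
```

Flat keys are what span attributes require, since attribute values cannot themselves be dictionaries; the inverse function is what lets a consumer recover the original shape.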

Configuring OTLP Export

Framework-level export logic resides in agentlightning/utils/otlp.py. The OtelOTLPExporter class manages endpoint configuration and optional filtering through should_bypass().

from agentlightning.utils.otlp import OtelOTLPExporter

exporter = OtelOTLPExporter(endpoint="http://localhost:4317")
exporter.enable_store_otlp(
    endpoint="http://localhost:4317",
    rollout_id="experiment-42",
    attempt_id="run-001"
)

The handle_otlp_export() function (line 56) handles the low-level protobuf conversion and retry logic, while enable_store_otlp() configures the exporter with rollout-specific metadata for experiment tracking.

Key Implementation Files

Unit tests demonstrating these APIs live in tests/tracer/test_otel.py, while end-to-end integration examples are available in tests/tracer/test_integration.py.

Summary

  • Use get_tracer() from agentlightning/utils/otel.py to obtain a preconfigured tracer without instantiating the SDK directly.
  • Tag spans with make_tag_attributes() and link distributed traces via make_link_attributes() to maintain correlation across service boundaries.
  • Sanitize arbitrary data through sanitize_attributes() before attaching to spans, ensuring OTEL-compatible primitive types.
  • Export traces using OtelOTLPExporter in agentlightning/utils/otlp.py, which supports runtime enablement and experiment-scoped metadata.
  • Reference tests in tests/tracer/ for working examples of custom instrumentation patterns.

Frequently Asked Questions

How do I test custom traces without exporting to a live collector?

Configure the tracer provider with an in-memory span processor during test setup. The tests/tracer/test_otel.py file demonstrates how to capture spans locally using the framework's testing utilities, allowing you to assert on span attributes and tags without network calls.

What happens if I pass unsupported data types to span attributes?

The sanitize_attributes() function in agentlightning/utils/otel.py recursively converts complex objects (dictionaries, lists, custom classes) into JSON-serializable primitives. Values that cannot be serialized are converted to strings or filtered out, which prevents export failures while preserving diagnostic context.

Can I use a custom OTLP endpoint per experiment?

Yes. The OtelOTLPExporter.enable_store_otlp() method accepts per-call endpoint, rollout_id, and attempt_id parameters. This design allows you to route traces from different experiments to distinct collectors or tag them with specific metadata for A/B testing analysis.

Does Agent-Lightning support baggage propagation across async boundaries?

While the wrapper provides make_link_attributes() for explicit span linking, standard OpenTelemetry baggage propagation works through the underlying SDK context. For async workflows, ensure you attach the appropriate context carriers when crossing thread or process boundaries, then extract links using extract_links_from_attributes() on the consumer side.
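
OpenTelemetry's Python context is backed by contextvars by default, so the thread-boundary caveat can be shown with the standard library alone. In this sketch the current_rollout variable and read_rollout() function are hypothetical stand-ins for trace context; the point is that a worker thread only sees the caller's context when it is handed over explicitly.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical context variable standing in for trace context or baggage
current_rollout = contextvars.ContextVar("current_rollout", default="unset")


def read_rollout() -> str:
    """Return whatever value is visible in the calling context."""
    return current_rollout.get()


current_rollout.set("experiment-42")

with ThreadPoolExecutor(max_workers=1) as pool:
    # Submitted as-is, the task runs in the worker thread's own context
    without_ctx = pool.submit(read_rollout).result()

    # Copy the caller's context and run the task inside that copy
    ctx = contextvars.copy_context()
    with_ctx = pool.submit(ctx.run, read_rollout).result()

print(without_ctx, with_ctx)
```

On a standard CPython build the first call typically reports the default value, while the second sees the value set by the caller. Process boundaries need a serialized carrier instead, which is where explicit link attributes come in.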
