# Integrating OpenTelemetry for Custom Tracing in Agent-Lightning

> Integrate OpenTelemetry for custom tracing in agent-lightning. The library simplifies span, tag, and link creation without manual SDK management.

- Repository: [Microsoft/agent-lightning](https://github.com/microsoft/agent-lightning)
- Tags: how-to-guide
- Published: 2026-04-01

---

**Agent-Lightning abstracts OpenTelemetry instrumentation into a high-level wrapper at [`agentlightning/utils/otel.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/utils/otel.py), letting you create custom spans, tags, and links without managing SDK providers or attribute sanitization manually.**

Agent-Lightning ships with a self-contained tracing module that streamlines observability for AI agent workflows. Rather than instantiating OpenTelemetry objects directly, developers leverage helper functions that handle tracer lifecycle, attribute flattening, and OTLP export automatically. This guide demonstrates how to integrate OpenTelemetry for custom tracing in agent-lightning using the actual APIs from the `microsoft/agent-lightning` repository.

## Initializing the Tracer Provider

Before creating spans, fetch the configured tracer through the framework's lazy provider mechanism. In [`agentlightning/utils/otel.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/utils/otel.py), the `get_tracer_provider()` function instantiates the tracer provider on first call (line 58), while `get_tracer()` retrieves the active tracer instance (line 145).

```python
from agentlightning.utils.otel import get_tracer

tracer = get_tracer()  # Uses the active span processor by default

# Or bypass the active processor:
tracer = get_tracer(use_active_span_processor=False)
```

The module maintains internal state through `get_span_processors()` (line 126), which inspects the current processor chain without exposing low-level SDK details to your application code.
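The "instantiate on first call" behaviour described above is a common lazy-initialization pattern. The following is a minimal, standalone sketch of that pattern only; it is not the code in `otel.py`, and `LazySingleton` and `FakeProvider` are hypothetical names invented for illustration.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazySingleton(Generic[T]):
    """Hypothetical helper: builds a value on first access, then reuses it."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Double-checked locking: only take the lock while the value is missing.
        if self._value is None:
            with self._lock:
                if self._value is None:
                    self._value = self._factory()
        return self._value


class FakeProvider:
    """Stand-in for a tracer provider; counts how often it is constructed."""

    instances = 0

    def __init__(self) -> None:
        FakeProvider.instances += 1


provider_holder = LazySingleton(FakeProvider)
first = provider_holder.get()   # Constructs the provider
second = provider_holder.get()  # Reuses the same object
```

The benefit of this shape is that importing the module has no side effects: nothing is configured until a caller actually asks for a tracer.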

## Creating Spans with Custom Tags

To annotate spans with searchable metadata, use the `make_tag_attributes()` helper (line 190) to convert string lists into OTEL-compatible attribute dictionaries.

```python
from agentlightning.utils.otel import get_tracer, make_tag_attributes

tracer = get_tracer()
user_id = "user-123"  # Placeholder value for illustration

with tracer.start_as_current_span(
    "process_user_request",
    attributes=make_tag_attributes(["agent:customer_support", "priority:high"]),
) as span:
    # Business logic here
    span.set_attribute("user_id", user_id)
```

This approach ensures tags conform to the framework's attribute schema defined in [`agentlightning/types/tracer.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/tracer.py), preventing type mismatches during export.

## Linking Distributed Traces

For workflows spanning multiple services or asynchronous boundaries, `make_link_attributes()` (line 212) serializes correlation contexts into transport-friendly maps. The companion `extract_links_from_attributes()` reconstructs these links on the consumer side.

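```python
from agentlightning.utils.otel import make_link_attributes

# Inside your span context; span_id and trace_id identify the span to link to
correlation_context = {"parent_span_id": span_id, "trace_id": trace_id}

# A single attribute value cannot be a map, so attach the returned map of
# attributes with set_attributes() (assuming the helper returns a mapping,
# as described above)
span.set_attributes(make_link_attributes(correlation_context))
```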

The link model is defined in [`agentlightning/types/resources.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/resources.py) as a Pydantic model, ensuring type safety across the serialization boundary.

## Sanitizing and Flattening Attributes

Arbitrary Python objects often fail OTEL export validation. The wrapper provides `sanitize_attributes()` (line 462) and `sanitize_attribute_value()` to recursively convert complex types into primitive, export-safe values.

```python
from agentlightning.utils.otel import sanitize_attributes

nested_data = {"config": model_config, "metrics": live_metrics}
span.set_attributes(sanitize_attributes(nested_data))

```
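To make the idea of recursive sanitization concrete, here is a minimal standalone sketch using only the standard library. It is an illustration under assumptions, not the implementation of `sanitize_attributes()`: the function name `sanitize_value_sketch` is hypothetical, and the choices to drop `None` values and to stringify unknown objects are assumptions about what "export-safe" might mean.

```python
import json
from datetime import date
from typing import Any


def sanitize_value_sketch(value: Any) -> Any:
    """Recursively reduce a value to JSON-serializable primitives."""
    if isinstance(value, (str, bool, int, float)):
        return value
    if isinstance(value, dict):
        # Stringify keys and drop entries whose value is None.
        return {
            str(key): sanitize_value_sketch(item)
            for key, item in value.items()
            if item is not None
        }
    if isinstance(value, (list, tuple)):
        # Sequences become lists; None items are dropped.
        return [sanitize_value_sketch(item) for item in value if item is not None]
    # Fallback for custom classes, dates, and anything else: keep a string form.
    return str(value)


raw = {
    "config": {"temperature": 0.2, "stop": ("\n",)},
    "started": date(2026, 4, 1),
    "notes": None,
}
clean = sanitize_value_sketch(raw)
print(json.dumps(clean, sort_keys=True))
```

The result is still nested, so a span cannot accept it directly; a flattening step is needed before the values can be set as attributes.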

For deeply nested dictionaries that must traverse network boundaries, `flatten_attributes()` (line 327) converts hierarchical structures into dot-notation keys, with `unflatten_attributes()` available for reconstruction.
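The dot-notation round trip can be sketched in a few lines of plain Python. This is an illustrative approximation rather than the library's code: `flatten_sketch` and `unflatten_sketch` are hypothetical names, the sample data is made up, and the sketch assumes that keys never contain a literal dot.

```python
from typing import Any, Dict


def flatten_sketch(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Collapse nested dictionaries into a single level of dot-notation keys."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty dictionaries, extending the key path.
            flat.update(flatten_sketch(value, path))
        else:
            flat[path] = value
    return flat


def unflatten_sketch(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the nested structure from dot-notation keys."""
    nested: Dict[str, Any] = {}
    for path, value in flat.items():
        parts = path.split(".")
        node = nested
        for part in parts[:-1]:
            # Create intermediate dictionaries on demand.
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


original = {
    "model": {"name": "example-model", "params": {"temperature": 0.2}},
    "retries": 3,
}
flat = flatten_sketch(original)
restored = unflatten_sketch(flat)
```

Here `flat` has the keys `model.name`, `model.params.temperature`, and `retries`, and `restored` equals `original`. A key that itself contains a dot would break the round trip in this sketch, which is one reason a real implementation needs a defined key schema.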

## Configuring OTLP Export

Framework-level export logic resides in [`agentlightning/utils/otlp.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/utils/otlp.py). The `OtelOTLPExporter` class manages endpoint configuration and optional filtering through `should_bypass()`.

```python
from agentlightning.utils.otlp import OtelOTLPExporter

exporter = OtelOTLPExporter(endpoint="http://localhost:4317")
exporter.enable_store_otlp(
    endpoint="http://localhost:4317",
    rollout_id="experiment-42",
    attempt_id="run-001",
)
```

The `handle_otlp_export()` function (line 56) performs the low-level protobuf conversion and retry logic, while `enable_store_otlp()` configures the exporter with rollout-specific metadata for experiment tracking.

## Key Implementation Files

- **[`agentlightning/utils/otel.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/utils/otel.py)** – High-level wrapper containing `get_tracer()`, attribute utilities, sanitization, and flattening logic.
- **[`agentlightning/utils/otlp.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/utils/otlp.py)** – OTLP exporter implementation with `handle_otlp_export()` and endpoint management.
- **[`agentlightning/types/tracer.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/tracer.py)** – Typed span and resource definitions.
- **[`agentlightning/types/resources.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/resources.py)** – Link Pydantic models for cross-span references.

Unit tests demonstrating these APIs live in [`tests/tracer/test_otel.py`](https://github.com/microsoft/agent-lightning/blob/main/tests/tracer/test_otel.py), while end-to-end integration examples are available in [`tests/tracer/test_integration.py`](https://github.com/microsoft/agent-lightning/blob/main/tests/tracer/test_integration.py).

## Summary

- **Use `get_tracer()`** from [`agentlightning/utils/otel.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/utils/otel.py) to obtain a preconfigured tracer without instantiating the SDK directly.
- **Tag spans** with `make_tag_attributes()` and link distributed traces via `make_link_attributes()` to maintain correlation across service boundaries.
- **Sanitize arbitrary data** through `sanitize_attributes()` before attaching to spans, ensuring OTEL-compatible primitive types.
- **Export traces** using `OtelOTLPExporter` in [`agentlightning/utils/otlp.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/utils/otlp.py), which supports runtime enablement and experiment-scoped metadata.
- **Reference tests** in `tests/tracer/` for working examples of custom instrumentation patterns.

## Frequently Asked Questions

### How do I test custom traces without exporting to a live collector?

Configure the tracer provider with an in-memory span processor during test setup. The [`tests/tracer/test_otel.py`](https://github.com/microsoft/agent-lightning/blob/main/tests/tracer/test_otel.py) file demonstrates how to capture spans locally using the framework's testing utilities, allowing you to assert on span attributes and tags without network calls.

### What happens if I pass unsupported data types to span attributes?

The `sanitize_attributes()` function in [`agentlightning/utils/otel.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/utils/otel.py) recursively converts complex objects (such as dictionaries, lists, or custom classes) into JSON-serializable primitives. Values that cannot be serialized are converted to strings or filtered out, which prevents export failures while preserving diagnostic context.

### Can I use a custom OTLP endpoint per experiment?

Yes. The `OtelOTLPExporter.enable_store_otlp()` method accepts per-call `endpoint`, `rollout_id`, and `attempt_id` parameters. This design allows you to route traces from different experiments to distinct collectors or tag them with specific metadata for A/B testing analysis.

### Does Agent-Lightning support baggage propagation across async boundaries?

The wrapper focuses on explicit span linking through `make_link_attributes()`; baggage propagation itself is handled by the standard OpenTelemetry context in the underlying SDK. For async workflows, attach the appropriate context carriers when crossing thread or process boundaries, then extract links with `extract_links_from_attributes()` on the consumer side.