# How to Implement a Custom Tracer for Proprietary Frameworks in agent-lightning

> Learn to implement a custom tracer for proprietary frameworks in agent-lightning. Extend the base Tracer class and integrate with LitAgentRunner for distributed tracing.

- Repository: [Microsoft/agent-lightning](https://github.com/microsoft/agent-lightning)
- Tags: how-to-guide
- Published: 2026-04-01

---

**Implement a custom tracer by subclassing the abstract `Tracer` class defined in [`agentlightning/tracer/base.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/tracer/base.py), implementing its five core abstract methods (`trace_context`, `_trace_context_sync`, `create_span`, `operation_context`, and `get_last_trace`) plus the optional `get_langchain_handler`, and passing the instance to `LitAgentRunner` so that distributed traces flow into your proprietary observability framework.**

The `microsoft/agent-lightning` library provides a **backend-agnostic tracing API** that decouples instrumentation code from specific observability vendors. A custom tracer lets you route training metrics, agent steps, and span data to an internal monitoring system while remaining compatible with the library's runners, adapters, and downstream analytics.

## Core Abstraction in agentlightning/tracer/base.py

The tracing contract lives in **[`agentlightning/tracer/base.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/tracer/base.py)**, where the abstract `Tracer` class defines the interface between the framework and your observability backend.

### Abstract Methods You Must Implement

Your custom implementation must override these five core methods:

- **`trace_context`** – An async context manager, decorated with **`@with_active_tracer_context`**, that starts a root span and yields a handle. The decorator calls **`set_active_tracer`** on entry and **`clear_active_tracer`** on exit, so the tracer is globally available for the duration of the `async with` block.
- **`_trace_context_sync`** – A synchronous counterpart used by legacy callers. Decorate it with `@contextlib.contextmanager`; because the active-tracer decorator does not wrap it, it must call `set_active_tracer` and `clear_active_tracer` itself.
- **`create_span`** – Fire-and-forget creation of a single span that returns **`SpanCoreFields`** (name, attributes, timestamps, status).
- **`operation_context`** – A context manager for child spans that returns a **`SpanRecordingContext`** supporting `record_exception`, `record_attributes`, and `record_status`.
- **`get_last_trace`** – Returns a `List[Span]` containing all spans captured during the most recent trace.
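To see what "must override" means in practice, here is a deliberately simplified stand-in for the contract written with Python's `abc` module. `ToyTracerBase` and its trimmed signatures are illustrative assumptions, not the real `Tracer` from `base.py`; check that file for the exact parameters. Python refuses to instantiate a subclass that leaves any abstract method unimplemented, which is the error you will hit if you forget one.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class ToyTracerBase(ABC):
    """Hypothetical, simplified mirror of the five-method tracer contract."""

    @abstractmethod
    def trace_context(self, name: Optional[str] = None) -> Any:
        """Async context manager that opens a root span."""

    @abstractmethod
    def _trace_context_sync(self, name: Optional[str] = None) -> Any:
        """Synchronous counterpart of trace_context."""

    @abstractmethod
    def create_span(self, name: str, attributes: Optional[Dict[str, Any]] = None) -> Any:
        """Create and immediately end a single span."""

    @abstractmethod
    def operation_context(self, name: str) -> Any:
        """Context manager that wraps a child span."""

    @abstractmethod
    def get_last_trace(self) -> List[Any]:
        """Return the spans captured by the most recent trace."""


class Broken(ToyTracerBase):
    """Implements four methods but forgets get_last_trace."""

    def trace_context(self, name: Optional[str] = None) -> Any:
        return None

    def _trace_context_sync(self, name: Optional[str] = None) -> Any:
        return None

    def create_span(self, name: str, attributes: Optional[Dict[str, Any]] = None) -> Any:
        return {"name": name, "attributes": dict(attributes or {})}

    def operation_context(self, name: str) -> Any:
        return None


def missing_methods(cls: type) -> List[str]:
    """Names of abstract methods a subclass still has to implement."""
    return sorted(getattr(cls, "__abstractmethods__", frozenset()))


def can_instantiate(cls: type) -> bool:
    """True if Python lets you create an instance of cls."""
    try:
        cls()
    except TypeError:
        return False
    return True


print(missing_methods(Broken), can_instantiate(Broken))  # ['get_last_trace'] False
```

A check like `missing_methods(MyTracer) == []` is a cheap first assertion in a unit test for your real tracer.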

### Active Tracer Management

The base class provides utilities to manage tracer visibility:

- **`set_active_tracer`** / **`clear_active_tracer`** – Make a tracer instance globally available during execution.
- **`@with_active_tracer_context`** – Defined at line 277 of [`base.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/tracer/base.py), this decorator wraps your async context manager and handles activation and cleanup automatically.
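The base class handles this for you, but it helps to know roughly what such a decorator does. The self-contained sketch below uses `contextvars`; the function names mirror those in `base.py`, but the bodies are assumptions for illustration, not the library's code. The decorator wraps a method that returns an async context manager, activates `self` on entry, and restores the previous value on exit, even when the block raises.

```python
import asyncio
import contextvars
import functools
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Callable, Optional

# Hypothetical global slot; contextvars keeps concurrent asyncio tasks isolated.
_active_tracer: contextvars.ContextVar[Optional[Any]] = contextvars.ContextVar(
    "active_tracer", default=None
)


def get_active_tracer() -> Optional[Any]:
    """Return the tracer activated in the current context, if any."""
    return _active_tracer.get()


def with_active_tracer_context(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a method that returns an async context manager so `self` is active inside it."""

    @functools.wraps(func)
    @asynccontextmanager
    async def wrapper(self: Any, *args: Any, **kwargs: Any) -> AsyncIterator[Any]:
        token = _active_tracer.set(self)  # activate this tracer
        try:
            async with func(self, *args, **kwargs) as handle:
                yield handle
        finally:
            _active_tracer.reset(token)  # restore the previous value, even on error

    return wrapper


class DemoTracer:
    @with_active_tracer_context
    @asynccontextmanager
    async def trace_context(self, name: str = "root") -> AsyncIterator[str]:
        yield f"handle:{name}"


async def demo() -> tuple:
    tracer = DemoTracer()
    async with tracer.trace_context("rollout") as handle:
        inside = get_active_tracer() is tracer
    return handle, inside, get_active_tracer()


print(asyncio.run(demo()))  # ('handle:rollout', True, None)
```

Using a `ContextVar` token rather than a plain module global means two rollouts running as separate asyncio tasks each see their own active tracer.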

## Implementing Your Custom Tracer Class

### Subclassing Tracer

Begin by importing the abstract base and related types from `agentlightning.tracer.base` and `agentlightning.types`:

```python
from agentlightning.tracer.base import Tracer, with_active_tracer_context, set_active_tracer, clear_active_tracer
from agentlightning.types import Attributes, Span, SpanCoreFields, SpanRecordingContext, StatusCode, TraceStatus
from agentlightning.store.base import LightningStore
```

Create a class that inherits from `Tracer` and implement the required methods. You must also create a custom **`SpanRecordingContext`** subclass that wraps your proprietary SDK's native span objects.

### Initializing Per-Worker Resources

Override **`init_worker`** to receive the `worker_id` and optional `LightningStore`. This is where you instantiate your proprietary SDK's client:

```python
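from typing import Optional

import mytrace  # placeholder for your proprietary SDK


class MyTracer(Tracer):
    def init_worker(self, worker_id: int, store: Optional[LightningStore] = None) -> None:
        super().init_worker(worker_id, store)
        self._client = mytrace.Tracer(worker_id=worker_id)  # Proprietary SDK initialization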
```

### Span Recording Context Implementation

Your context manager must return an object implementing the recording interface:

```python
from typing import Optional


class MySpanRecordingContext(SpanRecordingContext):
    """Forwards recording calls to the proprietary SDK's native span."""

    def __init__(self, span) -> None:
        self._span = span

    def record_exception(self, exc: BaseException) -> None:
        self._span.record_exception(exc)

    def record_attributes(self, attrs: Attributes) -> None:
        self._span.set_attributes(attrs)

    def record_status(self, status_code: StatusCode, description: Optional[str] = None) -> None:
        self._span.set_status(status_code, description)
```
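If the vendor SDK is not installed in CI, or you want fast unit tests, a small in-memory stand-in for the `mytrace` surface these examples call (`start_span`, `set_attributes`, `set_status`, `record_exception`, `end`) lets you exercise the recording context in isolation. Everything below is hypothetical: `FakeClient`, `FakeSpan`, and `VendorStatus` are placeholders, and the `"UNSET"`/`"OK"`/`"ERROR"` keys assume that agent-lightning's `StatusCode` uses those strings; confirm against `agentlightning.types` before relying on the mapping.

```python
import enum
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


class VendorStatus(enum.Enum):
    """Placeholder for the vendor SDK's own status enum."""

    UNSET = 0
    OK = 1
    ERROR = 2


# Assumption: agent-lightning status codes are the strings "UNSET", "OK", "ERROR".
STATUS_MAP: Dict[str, VendorStatus] = {
    "UNSET": VendorStatus.UNSET,
    "OK": VendorStatus.OK,
    "ERROR": VendorStatus.ERROR,
}


@dataclass
class FakeSpan:
    """In-memory stand-in for a vendor span object."""

    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    start_time_ns: Optional[int] = None
    end_time_ns: Optional[int] = None
    status: VendorStatus = VendorStatus.UNSET
    description: Optional[str] = None
    exceptions: List[BaseException] = field(default_factory=list)

    def set_attributes(self, attrs: Dict[str, Any]) -> None:
        self.attributes.update(attrs)

    def set_status(self, status: VendorStatus, description: Optional[str] = None) -> None:
        self.status, self.description = status, description

    def record_exception(self, exc: BaseException) -> None:
        self.exceptions.append(exc)

    def end(self, end_time_ns: Optional[int] = None) -> None:
        self.end_time_ns = end_time_ns if end_time_ns is not None else time.time_ns()


class FakeClient:
    """Stand-in for `mytrace.Tracer`; keeps every span it starts."""

    def __init__(self) -> None:
        self.spans: List[FakeSpan] = []

    def start_span(
        self, name: str, attributes: Optional[Dict[str, Any]] = None, start_time: Optional[int] = None
    ) -> FakeSpan:
        start = start_time if start_time is not None else time.time_ns()
        span = FakeSpan(name, dict(attributes or {}), start)
        self.spans.append(span)
        return span


class RecordingContext:
    """Same shape as MySpanRecordingContext, translating status strings to the vendor enum."""

    def __init__(self, span: FakeSpan) -> None:
        self._span = span

    def record_exception(self, exc: BaseException) -> None:
        self._span.record_exception(exc)
        self.record_status("ERROR", str(exc))

    def record_attributes(self, attrs: Dict[str, Any]) -> None:
        self._span.set_attributes(attrs)

    def record_status(self, status_code: str, description: Optional[str] = None) -> None:
        self._span.set_status(STATUS_MAP[status_code], description)


client = FakeClient()
ctx = RecordingContext(client.start_span("tool_call"))
ctx.record_attributes({"tool": "search"})
ctx.record_exception(ValueError("bad query"))
span = client.spans[0]
print(span.attributes, span.status, span.description)
# {'tool': 'search'} VendorStatus.ERROR bad query
```

An explicit mapping table also fails loudly with a `KeyError` if a status value you did not anticipate ever reaches the SDK.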

## Wiring the Tracer into LitAgentRunner

Once implemented, instantiate your tracer and pass it to the runner constructor. The runner automatically invokes `trace_context` during execution:

```python
import asyncio

from agentlightning.runner import LitAgentRunner
from agentlightning.tracer.my_tracer import MyTracer


async def main() -> None:
    tracer = MyTracer()
    runner = LitAgentRunner(tracer=tracer, poll_interval=0.01)
    await runner.run()


asyncio.run(main())
```

Because `@with_active_tracer_context` manages the active tracer state, any code inside the runner that calls **`get_active_tracer()`** (including adapters and emitters) will receive your custom instance.

## Complete Custom Tracer Code Example

Below is a complete implementation built around a fictitious proprietary SDK, `mytrace`; substitute your vendor's calls wherever it is used. It mirrors the structure of `OtelTracer` and `AgentOpsTracer`:

```python
# agentlightning/tracer/my_tracer.py

from __future__ import annotations

import logging
import time
from contextlib import asynccontextmanager, contextmanager
from typing import AsyncGenerator, Iterator, List, Optional

import mytrace  # Replace with your proprietary SDK

from mytrace import Span as MySpan, Tracer as MyTracerHandle

from agentlightning.types import (
    Attributes, Span, SpanCoreFields, SpanRecordingContext, 
    StatusCode, TraceStatus
)
from agentlightning.store.base import LightningStore
from agentlightning.tracer.base import (
    Tracer, with_active_tracer_context, set_active_tracer, clear_active_tracer
)

log = logging.getLogger(__name__)


class MySpanRecordingContext(SpanRecordingContext):
    """Wraps the proprietary SDK span and forwards recording calls."""
    
    def __init__(self, span: MySpan) -> None:
        self._span = span

    def record_exception(self, exc: BaseException) -> None:
        self._span.record_exception(exc)
        self.record_status("ERROR", str(exc))

    def record_attributes(self, attrs: Attributes) -> None:
        self._span.set_attributes(attrs)

    def record_status(self, status_code: StatusCode, description: Optional[str] = None) -> None:
        self._span.set_status(status_code, description)


class MyTracer(Tracer):
    """Routes all spans to the proprietary mytrace SDK."""
    
    def __init__(self) -> None:
        super().__init__()
        self._client: Optional[MyTracerHandle] = None
        self._collected_spans: List[Span] = []

    def init_worker(self, worker_id: int, store: Optional[LightningStore] = None) -> None:
        """Initialize per-worker client."""
        super().init_worker(worker_id, store)
        self._client = mytrace.Tracer(worker_id=worker_id)
        log.info("[Worker %s] MyTracer initialized", worker_id)

    @with_active_tracer_context
    @asynccontextmanager
    async def trace_context(
        self,
        name: Optional[str] = None,
        *,
        store: Optional[LightningStore] = None,
        rollout_id: Optional[str] = None,
        attempt_id: Optional[str] = None,
    ) -> AsyncGenerator[MyTracerHandle, None]:
        """Start a root span; the decorator activates and clears this tracer."""
        if self._client is None:
            raise RuntimeError("Tracer not initialized; call init_worker first")

        self._collected_spans = []  # get_last_trace should only see this trace
        root_span = self._client.start_span(name or "root")
        try:
            yield self._client
        finally:
            root_span.end()

    @contextmanager
    def _trace_context_sync(
        self,
        name: Optional[str] = None,
        *,
        store: Optional[LightningStore] = None,
        rollout_id: Optional[str] = None,
        attempt_id: Optional[str] = None,
    ) -> Iterator[MyTracerHandle]:
        """Synchronous version for legacy callers; not decorated, so it activates the tracer itself."""
        if self._client is None:
            raise RuntimeError("Tracer not initialized; call init_worker first")

        self._collected_spans = []  # get_last_trace should only see this trace
        root_span = self._client.start_span(name or "root")
        try:
            set_active_tracer(self)
            yield self._client
        finally:
            root_span.end()
            clear_active_tracer()

    def create_span(
        self,
        name: str,
        attributes: Optional[Attributes] = None,
        timestamp: Optional[float] = None,
        status: Optional[TraceStatus] = None,
    ) -> SpanCoreFields:
        """Fire-and-forget span creation."""
        if self._client is None:
            raise RuntimeError("Tracer not initialized; call init_worker first")

        ts = timestamp if timestamp is not None else time.time()  # read the clock once
        attrs = attributes or {}
        span = self._client.start_span(name, attributes=attrs, start_time=int(ts * 1e9))

        if status is not None:
            span.set_status(status.status_code, status.description)
        span.end(int(ts * 1e9))  # zero-duration span: starts and ends at ts

        core = SpanCoreFields(
            name=name,
            attributes=attrs,
            start_time=ts,
            end_time=ts,
            status=status or TraceStatus(status_code="OK"),
        )
        self._collected_spans.append(Span.from_core_fields(core))
        return core

    @contextmanager
    def operation_context(
        self,
        name: str,
        attributes: Optional[Attributes] = None,
        start_time: Optional[float] = None,
        end_time: Optional[float] = None,
    ) -> Iterator[MySpanRecordingContext]:
        """Create child span with recording capabilities."""
        if not self._client:
            raise RuntimeError("Tracer not initialized")
            
        start = int((start_time or time.time()) * 1e9)
        span = self._client.start_span(name, attributes=attributes or {}, start_time=start)
        ctx = MySpanRecordingContext(span)
        
        try:
            yield ctx
        except Exception as exc:
            ctx.record_exception(exc)
            raise
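        finally:
            span.end(int((end_time or time.time()) * 1e9))
            # Child spans are not added to _collected_spans here; append them too
            # if downstream adapters need them from get_last_trace.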

    def get_last_trace(self) -> List[Span]:
        """Return a copy of the spans captured for the most recent trace."""
        return list(self._collected_spans)

    def get_langchain_handler(self, tags: List[str] | None = None):
        """Optional: return a LangChain callback handler, or None if unsupported."""
        return None
```
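The example converts float seconds to integer nanoseconds with `int(t * 1e9)` in several places. A small helper keeps that conversion in one spot; the sketch below is a suggested refactor, not part of agent-lightning. When no timestamp is given it uses `time.time_ns()`, which avoids the precision loss of multiplying the current time (a large float) by `1e9`, and it tests `is None` rather than using `or`, so an explicit timestamp of `0.0` is not silently replaced by the current time.

```python
import time
from typing import Optional


def to_ns(seconds: Optional[float]) -> int:
    """Convert a Unix timestamp in seconds to integer nanoseconds.

    None means "now"; time.time_ns() reads the clock directly in nanoseconds.
    """
    if seconds is None:
        return time.time_ns()
    return round(seconds * 1_000_000_000)


print(to_ns(1.5))  # 1500000000
print(to_ns(0.0))  # 0, whereas (0.0 or time.time()) would give the current time
```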

## Reference Implementations to Study

Study these concrete implementations in the repository to understand different integration patterns:

- **[`agentlightning/tracer/otel.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/tracer/otel.py)** – Full OpenTelemetry integration with OTLP export and `LightningSpanProcessor` for store submission.
- **[`agentlightning/tracer/dummy.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/tracer/dummy.py)** – Minimal no-op implementation useful as a testing template.
- **[`agentlightning/tracer/agentops.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/tracer/agentops.py)** – Third-party integration showing LangChain handler implementation and external client management.
- **[`agentlightning/tracer/weave.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/tracer/weave.py)** – Weave SDK integration demonstrating attribute mapping and span conversion.

## Production Deployment Checklist

Follow these steps to ensure your custom tracer is production-ready:

1. **Implement all abstract methods** – Ensure `trace_context`, `_trace_context_sync`, `create_span`, `operation_context`, and `get_last_trace` are fully functional.
2. **Initialize per-worker resources** – Create SDK clients in `init_worker` using the provided `worker_id` and `LightningStore`.
3. **Use the active tracer decorator** – Apply `@with_active_tracer_context` to `trace_context` to ensure proper global state management.
4. **Implement SpanRecordingContext** – Your `operation_context` must return an object that forwards `record_exception`, `record_attributes`, and `record_status` to your SDK.
5. **Buffer or export spans** – Keep spans in an in-memory buffer (such as `_collected_spans` in the example above, reset at the start of each trace) or push them to a `LightningStore`, so that `get_last_trace` returns data for downstream adapters like `TracerTraceToTriplet`.
6. **Add LangChain support** – Implement `get_langchain_handler` if your framework supports LangChain callbacks.
7. **Write unit tests** – Use the test fixtures in `tests/tracer/` to validate against the abstract contract using `DummyTracer` as a reference.
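Item 5 is where in-memory tracers most often go wrong: if the buffer is never reset, `get_last_trace` returns spans from every trace the worker has run, not just the latest one. One way to keep the semantics tight is a buffer that is cleared when a trace starts and hands out copies. `TraceBuffer` is a hypothetical helper, and the lock is included on the assumption that spans may be recorded from more than one thread.

```python
import threading
from typing import Generic, List, TypeVar

T = TypeVar("T")


class TraceBuffer(Generic[T]):
    """Collects spans for the current trace only."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: List[T] = []

    def start_trace(self) -> None:
        """Drop spans from the previous trace; call when a root span opens."""
        with self._lock:
            self._items = []

    def append(self, item: T) -> None:
        with self._lock:
            self._items.append(item)

    def snapshot(self) -> List[T]:
        """Return a copy so callers cannot mutate the live buffer."""
        with self._lock:
            return list(self._items)


buf: TraceBuffer[str] = TraceBuffer()
buf.start_trace()
buf.append("a")
buf.append("b")
buf.start_trace()  # a new trace begins; "a" and "b" are discarded
buf.append("c")
print(buf.snapshot())  # ['c']
```

In `MyTracer`, both trace context methods would call `start_trace()`, `create_span` would call `append()`, and `get_last_trace` would return `snapshot()`.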

## Summary

- **Subclass `Tracer`** from [`agentlightning/tracer/base.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/tracer/base.py) to create a backend-agnostic integration.
- **Implement five core methods** including async/sync context managers and span recording contexts.
- **Use `@with_active_tracer_context`** to automatically manage global tracer state during execution.
- **Pass the instance to `LitAgentRunner`** to capture all training and inference traces.
- **Study [`otel.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/tracer/otel.py) and [`agentops.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/tracer/agentops.py)** for production patterns including store integration and LangChain support.

## Frequently Asked Questions

### Do I need to implement both async and sync trace_context methods?

Yes. You must implement **`trace_context`** (async) and **`_trace_context_sync`** (sync) because different callers in the codebase may use either pattern. The async version is the primary path used by `LitAgentRunner`, while the sync version supports legacy instrumentation. The async version gets activation and cleanup from `@with_active_tracer_context`; the sync version is not decorated, so it must call `set_active_tracer` and `clear_active_tracer` itself.
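Implementing both does not have to mean writing the setup and teardown twice. One option, sketched below with hypothetical names and no agent-lightning imports, is to put the logic in the synchronous context manager and let the async one delegate to it. This only makes sense when opening and closing the root span does not block on I/O; if your SDK makes slow network calls there, keep a genuinely async path.

```python
import asyncio
from contextlib import asynccontextmanager, contextmanager
from typing import AsyncIterator, Iterator, List


class SharedLogicTracer:
    """Toy tracer whose async context manager reuses the sync one."""

    def __init__(self) -> None:
        self.events: List[str] = []

    @contextmanager
    def _trace_context_sync(self, name: str = "root") -> Iterator[str]:
        self.events.append(f"start {name}")
        try:
            yield name
        finally:
            self.events.append(f"end {name}")  # runs on success and on error

    @asynccontextmanager
    async def trace_context(self, name: str = "root") -> AsyncIterator[str]:
        # Same setup and teardown, no duplication; assumes both are non-blocking.
        with self._trace_context_sync(name) as handle:
            yield handle


async def demo() -> List[str]:
    tracer = SharedLogicTracer()
    async with tracer.trace_context("async-rollout"):
        tracer.events.append("work")
    with tracer._trace_context_sync("sync-rollout"):
        tracer.events.append("work")
    return tracer.events


print(asyncio.run(demo()))
# ['start async-rollout', 'work', 'end async-rollout', 'start sync-rollout', 'work', 'end sync-rollout']
```

If you apply this to `MyTracer`, note that the sync body calls `set_active_tracer` itself while the async path is already activated by `@with_active_tracer_context`, so activation would run twice; move those calls out of the shared body or confirm that double activation is harmless with your version of `base.py`.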

### How do I make my tracer active during agent execution?

Decorate your async `trace_context` method with **`@with_active_tracer_context`**. This decorator (defined at line 277 of [`base.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/tracer/base.py)) automatically calls `set_active_tracer(self)` when entering the context and `clear_active_tracer()` when exiting. Any code calling `get_active_tracer()` inside the context block will receive your instance.

### Can I integrate LangChain callbacks with my custom tracer?

Yes. Implement the optional **`get_langchain_handler`** method to return a LangChain callback handler that forwards events to your proprietary SDK. See **[`agentlightning/tracer/agentops.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/tracer/agentops.py)** for a concrete example that wraps the AgentOps client in a LangChain-compatible callback, or return `None` if you do not need this integration.

### How should I handle span storage for get_last_trace?

You have two options: maintain an in-memory list (as shown in the `MyTracer` example above) or push spans to a **`LightningStore`** (see [`agentlightning/store/base.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/base.py)). The `OtelTracer` implementation demonstrates the store-based approach using `LightningSpanProcessor`, which is preferable for distributed scenarios where workers need to persist traces centrally.
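For the store-based option, a common pattern is to batch spans and flush them to the central sink rather than making one round trip per span. The sketch below is backend-neutral: `flush_fn` is a placeholder for whatever call submits spans to your `LightningStore` or proprietary collector (the real store method and its signature are not shown here, so check `agentlightning/store/base.py`), and the batch size is an arbitrary example value.

```python
from typing import Any, Callable, List, Sequence


class BatchingExporter:
    """Buffers spans and hands them to flush_fn in fixed-size batches."""

    def __init__(self, flush_fn: Callable[[Sequence[Any]], None], batch_size: int = 64) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be >= 1")
        self._flush_fn = flush_fn
        self._batch_size = batch_size
        self._pending: List[Any] = []

    def export(self, span: Any) -> None:
        self._pending.append(span)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is pending; call this when a trace ends so nothing is left behind."""
        if self._pending:
            # Swap the list out first; if flush_fn raises, this batch is dropped.
            batch, self._pending = self._pending, []
            self._flush_fn(batch)


sent: List[Sequence[Any]] = []
exporter = BatchingExporter(sent.append, batch_size=2)
for span_name in ["plan", "tool", "answer"]:
    exporter.export(span_name)
exporter.flush()
print(sent)  # [['plan', 'tool'], ['answer']]
```

Because a failure inside `flush_fn` drops that batch, wrap the call with retries if losing spans is not acceptable in your deployment.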