How to Implement a Custom Tracer for Proprietary Frameworks in agent-lightning

Implement a custom tracer by subclassing the abstract Tracer class defined in agentlightning/tracer/base.py, implementing its five core methods (trace_context, _trace_context_sync, create_span, operation_context, and get_last_trace) plus the optional get_langchain_handler, and passing the instance to LitAgentRunner so that traces are routed to your proprietary observability framework.

The microsoft/agent-lightning library provides a backend-agnostic tracing API that decouples instrumentation code from specific observability vendors. By implementing a custom tracer for proprietary frameworks in agent-lightning, you can route training metrics, agent steps, and span data to internal monitoring systems while maintaining full compatibility with the library's runners, adapters, and downstream analytics.

Core Abstraction in agentlightning/tracer/base.py

The tracing contract lives in agentlightning/tracer/base.py, where the abstract Tracer class defines the interface between the framework and your observability backend.

Abstract Methods You Must Implement

Your custom implementation must override these five core methods:

  • trace_context – An async context manager (decorated with @with_active_tracer_context) that starts a root span and yields a handle. The decorator calls set_active_tracer so that the tracer is globally available for the duration of the async with block.
  • _trace_context_sync – A synchronous counterpart used by legacy callers; implement as a standard contextmanager.
  • create_span – Fire-and-forget creation of a single span that returns SpanCoreFields (name, attributes, timestamps, status).
  • operation_context – A context manager for child spans that returns a SpanRecordingContext supporting record_exception, record_attributes, and record_status.
  • get_last_trace – Returns a List[Span] containing all spans captured during the most recent trace.

Active Tracer Management

The base class provides utilities to manage tracer visibility:

  • set_active_tracer / clear_active_tracer – Make a tracer instance globally available during execution.
  • @with_active_tracer_context – Defined at line 277 of base.py, this decorator automatically wraps your async context manager to handle activation and cleanup.
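
To make the activation mechanics concrete, here is a minimal, self-contained sketch of how a decorator of this kind could work. It is a stand-in written for illustration, not the code in base.py: the ContextVar registry, the helper bodies, and the DemoTracer class are all assumptions, and only the names set_active_tracer, clear_active_tracer, get_active_tracer, and with_active_tracer_context are taken from this article.

```python
import asyncio
import functools
from contextlib import asynccontextmanager
from contextvars import ContextVar
from typing import Any, AsyncIterator, Callable, Optional

# Hypothetical stand-in for the library's active-tracer registry.
_active: ContextVar[Optional[Any]] = ContextVar("active_tracer", default=None)


def set_active_tracer(tracer: Any) -> None:
    _active.set(tracer)


def clear_active_tracer() -> None:
    _active.set(None)


def get_active_tracer() -> Optional[Any]:
    return _active.get()


def with_active_tracer_context(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap an async context manager method so `self` is active inside it."""

    @functools.wraps(func)
    @asynccontextmanager
    async def wrapper(self: Any, *args: Any, **kwargs: Any) -> AsyncIterator[Any]:
        set_active_tracer(self)  # activate before the wrapped context starts
        try:
            async with func(self, *args, **kwargs) as handle:
                yield handle
        finally:
            clear_active_tracer()  # always clean up, even on error

    return wrapper


class DemoTracer:
    """Illustrative tracer; a real one would subclass Tracer."""

    @with_active_tracer_context
    @asynccontextmanager
    async def trace_context(self, name: str = "root") -> AsyncIterator[str]:
        yield name


async def demo() -> tuple:
    tracer = DemoTracer()
    before = get_active_tracer()
    async with tracer.trace_context("rollout") as handle:
        inside = get_active_tracer()
    after = get_active_tracer()
    return before, inside is tracer, handle, after


result = asyncio.run(demo())
```

In this sketch the tracer is visible through get_active_tracer() only while the async with block is open, which is the behavior the decorator is described as providing.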

Implementing Your Custom Tracer Class

Subclassing Tracer

Begin by importing the abstract base and related types from agentlightning.tracer.base and agentlightning.types:

from agentlightning.tracer.base import Tracer, with_active_tracer_context, set_active_tracer, clear_active_tracer
from agentlightning.types import Attributes, Span, SpanCoreFields, SpanRecordingContext, StatusCode, TraceStatus
from agentlightning.store.base import LightningStore

Create a class that inherits from Tracer and implement the required methods. You must also create a custom SpanRecordingContext subclass that wraps your proprietary SDK's native span objects.

Initializing Per-Worker Resources

Override init_worker to receive the worker_id and optional LightningStore. This is where you instantiate your proprietary SDK's client:

def init_worker(self, worker_id: int, store: Optional[LightningStore] = None) -> None:
    super().init_worker(worker_id, store)
    self._client = mytrace.Tracer(worker_id=worker_id)  # Proprietary SDK initialization

Span Recording Context Implementation

Your operation_context context manager must yield an object that implements the recording interface:

class MySpanRecordingContext(SpanRecordingContext):
    def __init__(self, span): self._span = span
    
    def record_exception(self, exc: BaseException) -> None:
        self._span.record_exception(exc)
        
    def record_attributes(self, attrs: Attributes) -> None:
        self._span.set_attributes(attrs)
        
    def record_status(self, status_code: StatusCode, description: Optional[str] = None) -> None:
        self._span.set_status(status_code, description)
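
Because the recording context only forwards calls, its behavior can be checked without the real SDK. The sketch below is a rough illustration under stated assumptions: FakeSpan is a hypothetical in-memory double for a proprietary span, and RecordingContext repeats the forwarding logic without subclassing SpanRecordingContext so that it runs with no agent-lightning import. It also marks the span as "ERROR" when an exception is recorded, as the complete example later in this article does.

```python
from typing import Any, Dict, List, Optional, Tuple


class FakeSpan:
    """Hypothetical in-memory double for a proprietary SDK span."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {}
        self.exceptions: List[BaseException] = []
        self.status: Optional[Tuple[str, Optional[str]]] = None

    def record_exception(self, exc: BaseException) -> None:
        self.exceptions.append(exc)

    def set_attributes(self, attrs: Dict[str, Any]) -> None:
        self.attributes.update(attrs)

    def set_status(self, code: str, description: Optional[str] = None) -> None:
        self.status = (code, description)


class RecordingContext:
    """Same forwarding logic as MySpanRecordingContext, minus the base class."""

    def __init__(self, span: FakeSpan) -> None:
        self._span = span

    def record_exception(self, exc: BaseException) -> None:
        self._span.record_exception(exc)
        self.record_status("ERROR", str(exc))  # surface the failure as status

    def record_attributes(self, attrs: Dict[str, Any]) -> None:
        self._span.set_attributes(attrs)

    def record_status(self, status_code: str, description: Optional[str] = None) -> None:
        self._span.set_status(status_code, description)


span = FakeSpan()
ctx = RecordingContext(span)
ctx.record_attributes({"step": 3})
ctx.record_exception(ValueError("bad reward"))
```

Swapping FakeSpan for your SDK's span type should leave the recording context unchanged, provided the SDK exposes equivalent methods.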

Wiring the Tracer into LitAgentRunner

Once implemented, instantiate your tracer and pass it to the runner constructor. The runner automatically invokes trace_context during execution:

import asyncio

from agentlightning.tracer.my_tracer import MyTracer
from agentlightning.runner import LitAgentRunner

async def main() -> None:
    tracer = MyTracer()
    runner = LitAgentRunner(tracer=tracer, poll_interval=0.01)
    await runner.run()  # must be awaited inside a coroutine

asyncio.run(main())

Because @with_active_tracer_context manages the active tracer state, any code inside the runner that calls get_active_tracer() (including adapters and emitters) will receive your custom instance.

Complete Custom Tracer Code Example

Below is a reference sketch that integrates a fictitious proprietary SDK, mytrace; substitute your own SDK's calls wherever mytrace appears. It mirrors the architecture used in OtelTracer and AgentOpsTracer:


# agentlightning/tracer/my_tracer.py

from __future__ import annotations

import logging
import time
from contextlib import asynccontextmanager, contextmanager
from typing import AsyncGenerator, Iterator, List, Optional

import mytrace  # Replace with your proprietary SDK

from mytrace import Span as MySpan, Tracer as MyTracerHandle

from agentlightning.types import (
    Attributes, Span, SpanCoreFields, SpanRecordingContext, 
    StatusCode, TraceStatus
)
from agentlightning.store.base import LightningStore
from agentlightning.tracer.base import (
    Tracer, with_active_tracer_context, set_active_tracer, clear_active_tracer
)

log = logging.getLogger(__name__)


class MySpanRecordingContext(SpanRecordingContext):
    """Wraps the proprietary SDK span and forwards recording calls."""
    
    def __init__(self, span: MySpan) -> None:
        self._span = span

    def record_exception(self, exc: BaseException) -> None:
        self._span.record_exception(exc)
        self.record_status("ERROR", str(exc))

    def record_attributes(self, attrs: Attributes) -> None:
        self._span.set_attributes(attrs)

    def record_status(self, status_code: StatusCode, description: Optional[str] = None) -> None:
        self._span.set_status(status_code, description)


class MyTracer(Tracer):
    """Routes all spans to the proprietary mytrace SDK."""
    
    def __init__(self) -> None:
        super().__init__()
        self._client: Optional[MyTracerHandle] = None
        self._collected_spans: List[Span] = []

    def init_worker(self, worker_id: int, store: Optional[LightningStore] = None) -> None:
        """Initialize per-worker client."""
        super().init_worker(worker_id, store)
        self._client = mytrace.Tracer(worker_id=worker_id)
        log.info("[Worker %s] MyTracer initialized", worker_id)

    @with_active_tracer_context
    @asynccontextmanager
    async def trace_context(
        self,
        name: Optional[str] = None,
        *,
        store: Optional[LightningStore] = None,
        rollout_id: Optional[str] = None,
        attempt_id: Optional[str] = None,
    ) -> AsyncGenerator[MyTracerHandle, None]:
        """Start root span and activate tracer."""
        if not self._client:
            raise RuntimeError("Tracer not initialized; call init_worker first")
            
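        # Start a fresh buffer so get_last_trace only reports this trace.
        self._collected_spans = []
        root_span = self._client.start_span(name or "root")
        try:
            # @with_active_tracer_context activates and clears this tracer
            # around the block, so no manual set/clear is needed here.
            yield self._client
        finally:
            root_span.end()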

    @contextmanager
    def _trace_context_sync(
        self,
        name: Optional[str] = None,
        *,
        store: Optional[LightningStore] = None,
        rollout_id: Optional[str] = None,
        attempt_id: Optional[str] = None,
    ) -> Iterator[MyTracerHandle]:
        """Synchronous version for legacy callers."""
        if not self._client:
            raise RuntimeError("Tracer not initialized")
            
        self._collected_spans = []  # start a fresh buffer for this trace
        root_span = self._client.start_span(name or "root")
        try:
            set_active_tracer(self)
            yield self._client
        finally:
            root_span.end()
            clear_active_tracer()

    def create_span(
        self,
        name: str,
        attributes: Optional[Attributes] = None,
        timestamp: Optional[float] = None,
        status: Optional[TraceStatus] = None,
    ) -> SpanCoreFields:
        """Fire-and-forget span creation."""
        if not self._client:
            raise RuntimeError("Tracer not initialized")
            
        # Resolve the timestamp once so the SDK span and the returned fields agree.
        ts = timestamp if timestamp is not None else time.time()
        start = int(ts * 1e9)  # seconds to nanoseconds
        span = self._client.start_span(name, attributes=attributes or {}, start_time=start)
        
        if status:
            span.set_status(status.status_code, status.description)
        span.end()
        
        core = SpanCoreFields(
            name=name,
            attributes=attributes or {},
            start_time=ts,
            end_time=ts,
            status=status or TraceStatus(status_code="OK"),
        )
        self._collected_spans.append(Span.from_core_fields(core))
        return core

    @contextmanager
    def operation_context(
        self,
        name: str,
        attributes: Optional[Attributes] = None,
        start_time: Optional[float] = None,
        end_time: Optional[float] = None,
    ) -> Iterator[MySpanRecordingContext]:
        """Create child span with recording capabilities."""
        if not self._client:
            raise RuntimeError("Tracer not initialized")
            
        start = int((start_time or time.time()) * 1e9)
        span = self._client.start_span(name, attributes=attributes or {}, start_time=start)
        ctx = MySpanRecordingContext(span)
        
        try:
            yield ctx
        except Exception as exc:
            ctx.record_exception(exc)
            raise
        finally:
            span.end(int((end_time or time.time()) * 1e9))

    def get_last_trace(self) -> List[Span]:
        """Return captured spans for the most recent trace."""
        return self._collected_spans

    def get_langchain_handler(self, tags: List[str] | None = None):
        """Optional: Return LangChain callback for proprietary SDK."""
        raise NotImplementedError("LangChain integration not implemented")
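
The try/except/finally shape of operation_context above determines the order in which a failure is recorded and the span is closed. As a minimal sketch of that lifecycle, assuming a hypothetical FakeSpan double in place of the SDK span and a module-level function in place of the tracer method:

```python
from contextlib import contextmanager
from typing import Iterator, List


class FakeSpan:
    """Hypothetical SDK span double that logs what happens to it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.events: List[str] = []

    def record_exception(self, exc: BaseException) -> None:
        self.events.append(f"exception:{exc}")

    def end(self) -> None:
        self.events.append("end")


@contextmanager
def operation_context(name: str) -> Iterator[FakeSpan]:
    span = FakeSpan(name)
    try:
        yield span
    except Exception as exc:
        span.record_exception(exc)  # record first...
        raise                       # ...then let the caller see the error
    finally:
        span.end()                  # the span is closed on every path


captured = None
try:
    with operation_context("tool_call") as span:
        captured = span
        raise RuntimeError("tool failed")
except RuntimeError:
    pass  # the exception still propagates out of the context manager
```

The exception is recorded before the span ends, and it is re-raised rather than swallowed, so callers keep their normal error handling.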

Reference Implementations to Study

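Study these concrete implementations in the repository to understand different integration patterns:

  • otel.py (OtelTracer) – Store-based integration that persists spans through LightningSpanProcessor.
  • agentops.py (AgentOpsTracer) – Wraps the AgentOps client and exposes a LangChain-compatible callback handler.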

Production Deployment Checklist

Follow these steps to ensure your custom tracer is production-ready:

  1. Implement all abstract methods – Ensure trace_context, _trace_context_sync, create_span, operation_context, and get_last_trace are fully functional.
  2. Initialize per-worker resources – Create SDK clients in init_worker using the provided worker_id and LightningStore.
  3. Use the active tracer decorator – Apply @with_active_tracer_context to trace_context to ensure proper global state management.
  4. Implement SpanRecordingContext – Your operation_context must return an object that forwards record_exception, record_attributes, and record_status to your SDK.
  5. Buffer or export spans – Store spans in _collected_spans (or push to a LightningStore) so get_last_trace returns data for downstream adapters like TracerTraceToTriplet.
  6. Add LangChain support – Implement get_langchain_handler if your framework supports LangChain callbacks.
  7. Write unit tests – Use the test fixtures in tests/tracer/ to validate against the abstract contract using DummyTracer as a reference.

Summary

  • Subclass Tracer from agentlightning/tracer/base.py to create a backend-agnostic integration.
  • Implement five core methods including async/sync context managers and span recording contexts.
  • Use @with_active_tracer_context to automatically manage global tracer state during execution.
  • Pass the instance to LitAgentRunner to capture all training and inference traces.
  • Study otel.py and agentops.py for production patterns including store integration and LangChain support.

Frequently Asked Questions

Do I need to implement both async and sync trace_context methods?

Yes. You must implement trace_context (async) and _trace_context_sync (sync) because different callers in the codebase may use either pattern. The async version is the primary path used by LitAgentRunner, while the sync version supports legacy instrumentation. The async version gets activation and cleanup from @with_active_tracer_context; the sync version is not decorated, so it should call set_active_tracer and clear_active_tracer itself.

How do I make my tracer active during agent execution?

Decorate your async trace_context method with @with_active_tracer_context. This decorator (defined at line 277 of base.py) automatically calls set_active_tracer(self) when entering the context and clear_active_tracer() when exiting. Any code calling get_active_tracer() inside the context block will receive your instance.

Can I integrate LangChain callbacks with my custom tracer?

Yes. Implement the optional get_langchain_handler method to return a LangChain callback handler that forwards events to your proprietary SDK. See agentlightning/tracer/agentops.py for a concrete example that wraps the AgentOps client in a LangChain-compatible callback, or return None if you do not need this integration.

How should I handle span storage for get_last_trace?

You have two options: maintain an in-memory list (as shown in the MyTracer example above) or push spans to a LightningStore (see agentlightning/store/base.py). The OtelTracer implementation demonstrates the store-based approach using LightningSpanProcessor, which is preferable for distributed scenarios where workers need to persist traces centrally.
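
The in-memory option can be sketched as a small buffer that is reset at the start of each trace, which keeps get_last_trace scoped to the most recent trace rather than to everything the worker has ever recorded. The names TraceBuffer, BufferedSpan, start_trace, add, and last_trace are hypothetical and chosen for illustration; the real Span type carries more fields.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class BufferedSpan:
    """Hypothetical minimal span record used only in this sketch."""

    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)


class TraceBuffer:
    """Holds the spans of the current (most recent) trace only."""

    def __init__(self) -> None:
        self._spans: List[BufferedSpan] = []

    def start_trace(self) -> None:
        # Drop spans from the previous trace.
        self._spans = []

    def add(self, name: str, attributes: Optional[Dict[str, Any]] = None) -> BufferedSpan:
        span = BufferedSpan(name, dict(attributes or {}))
        self._spans.append(span)
        return span

    def last_trace(self) -> List[BufferedSpan]:
        # Return a copy so callers cannot mutate the buffer.
        return list(self._spans)


buffer = TraceBuffer()
buffer.start_trace()
buffer.add("llm_call", {"tokens": 12})
buffer.start_trace()          # a second trace begins
buffer.add("tool_call")
names = [s.name for s in buffer.last_trace()]
```

A tracer could call start_trace when entering trace_context and add from create_span; the store-based approach replaces the list with writes to a LightningStore.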
