How to Implement a Custom Tracer for Proprietary Frameworks in agent-lightning
Implement a custom tracer by subclassing the abstract Tracer class defined in agentlightning/tracer/base.py, implementing the five core abstract methods (trace_context, _trace_context_sync, create_span, operation_context, and get_last_trace), optionally adding get_langchain_handler, and wiring the instance into LitAgentRunner to capture distributed traces from any proprietary observability framework.
The microsoft/agent-lightning library provides a backend-agnostic tracing API that decouples instrumentation code from specific observability vendors. By implementing a custom tracer for proprietary frameworks in agent-lightning, you can route training metrics, agent steps, and span data to internal monitoring systems while maintaining full compatibility with the library's runners, adapters, and downstream analytics.
Core Abstraction in agentlightning/tracer/base.py
The tracing contract lives in agentlightning/tracer/base.py, where the abstract Tracer class defines the interface between the framework and your observability backend.
Abstract Methods You Must Implement
Your custom implementation must override these five core methods:
- trace_context – An async context manager (decorated with @with_active_tracer_context) that starts a root span and yields a handle. This method calls set_active_tracer to make the tracer globally available during the async with block.
- _trace_context_sync – A synchronous counterpart used by legacy callers; implement as a standard contextmanager.
- create_span – Fire-and-forget creation of a single span that returns SpanCoreFields (name, attributes, timestamps, status).
- operation_context – A context manager for child spans that returns a SpanRecordingContext supporting record_exception, record_attributes, and record_status.
- get_last_trace – Returns a List[Span] containing all spans captured during the most recent trace.
Active Tracer Management
The base class provides utilities to manage tracer visibility:
- set_active_tracer / clear_active_tracer – Make a tracer instance globally available during execution.
- @with_active_tracer_context – Defined at line 277 of base.py, this decorator automatically wraps your async context manager to handle activation and cleanup.
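The activation pattern described above can be sketched in isolation. The functions below reuse the library's names but are simplified stand-ins written for illustration, not the code in base.py; DemoTracer and demo are hypothetical. The sketch only shows the idea that the decorator sets the active tracer on entry and clears it on exit, even if the body raises.

```python
import asyncio
import functools
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Callable, Optional

# Simplified stand-ins for illustration only; not the library's source.
_active_tracer: Optional[Any] = None


def set_active_tracer(tracer: Any) -> None:
    global _active_tracer
    _active_tracer = tracer


def clear_active_tracer() -> None:
    global _active_tracer
    _active_tracer = None


def get_active_tracer() -> Optional[Any]:
    return _active_tracer


def with_active_tracer_context(func: Callable[..., Any]) -> Callable[..., Any]:
    """Keep `self` active for the lifetime of the wrapped async context manager."""

    @functools.wraps(func)
    @asynccontextmanager
    async def wrapper(self: Any, *args: Any, **kwargs: Any) -> AsyncIterator[Any]:
        set_active_tracer(self)
        try:
            async with func(self, *args, **kwargs) as handle:
                yield handle
        finally:
            # Runs on normal exit and on exceptions alike.
            clear_active_tracer()

    return wrapper


class DemoTracer:
    """Hypothetical tracer whose trace_context just yields the span name."""

    @with_active_tracer_context
    @asynccontextmanager
    async def trace_context(self, name: str = "root") -> AsyncIterator[str]:
        yield name


async def demo() -> tuple:
    tracer = DemoTracer()
    before = get_active_tracer()
    async with tracer.trace_context("rollout") as handle:
        inside_is_tracer = get_active_tracer() is tracer
    after = get_active_tracer()
    return before, inside_is_tracer, handle, after


result = asyncio.run(demo())
```

With this layout the tracer body never has to touch global state itself, which keeps activation and cleanup in one place.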
Implementing Your Custom Tracer Class
Subclassing Tracer
Begin by importing the abstract base and related types from agentlightning.tracer.base and agentlightning.types:
from agentlightning.tracer.base import Tracer, with_active_tracer_context, set_active_tracer, clear_active_tracer
from agentlightning.types import Attributes, Span, SpanCoreFields, SpanRecordingContext, StatusCode, TraceStatus
from agentlightning.store.base import LightningStore
Create a class that inherits from Tracer and implement the required methods. You must also create a custom SpanRecordingContext subclass that wraps your proprietary SDK's native span objects.
Initializing Per-Worker Resources
Override init_worker to receive the worker_id and optional LightningStore. This is where you instantiate your proprietary SDK's client:
def init_worker(self, worker_id: int, store: Optional[LightningStore] = None) -> None:
    super().init_worker(worker_id, store)
    self._client = mytrace.Tracer(worker_id=worker_id)  # Proprietary SDK initialization
Span Recording Context Implementation
Your context manager must return an object implementing the recording interface:
class MySpanRecordingContext(SpanRecordingContext):
    def __init__(self, span):
        self._span = span

    def record_exception(self, exc: BaseException) -> None:
        self._span.record_exception(exc)

    def record_attributes(self, attrs: Attributes) -> None:
        self._span.set_attributes(attrs)

    def record_status(self, status_code: StatusCode, description: Optional[str] = None) -> None:
        self._span.set_status(status_code, description)
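To check that a recording context forwards calls correctly before connecting a real SDK, it can help to exercise it against a fake span. The sketch below is self-contained: FakeSpan is a hypothetical stand-in for a proprietary SDK span, and RecordingContext repeats the three forwarding methods without inheriting from the agentlightning base class so that it runs without the library installed.

```python
from typing import Any, Dict, List, Optional, Tuple


class FakeSpan:
    """Hypothetical stand-in for a proprietary SDK span; records what it receives."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {}
        self.exceptions: List[BaseException] = []
        self.status: Optional[Tuple[str, Optional[str]]] = None

    def record_exception(self, exc: BaseException) -> None:
        self.exceptions.append(exc)

    def set_attributes(self, attrs: Dict[str, Any]) -> None:
        self.attributes.update(attrs)

    def set_status(self, code: str, description: Optional[str] = None) -> None:
        self.status = (code, description)


class RecordingContext:
    """Forwards the three recording calls to the wrapped span."""

    def __init__(self, span: FakeSpan) -> None:
        self._span = span

    def record_exception(self, exc: BaseException) -> None:
        self._span.record_exception(exc)

    def record_attributes(self, attrs: Dict[str, Any]) -> None:
        self._span.set_attributes(attrs)

    def record_status(self, status_code: str, description: Optional[str] = None) -> None:
        self._span.set_status(status_code, description)


span = FakeSpan()
ctx = RecordingContext(span)
ctx.record_attributes({"step": 1})
try:
    raise ValueError("boom")
except ValueError as exc:
    ctx.record_exception(exc)
    ctx.record_status("ERROR", str(exc))
```

After this runs, the fake span holds the attribute, one exception, and the status pair, which is what a unit test for the real wrapper would assert.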
Wiring the Tracer into LitAgentRunner
Once implemented, instantiate your tracer and pass it to the runner constructor. The runner automatically invokes trace_context during execution:
from agentlightning.tracer.my_tracer import MyTracer
from agentlightning.runner import LitAgentRunner
tracer = MyTracer()
runner = LitAgentRunner(tracer=tracer, poll_interval=0.01)
await runner.run()
Because @with_active_tracer_context manages the active tracer state, any code inside the runner that calls get_active_tracer() (including adapters and emitters) will receive your custom instance.
Complete Custom Tracer Code Example
Below is a complete reference implementation that integrates a fictitious proprietary SDK, mytrace; replace the mytrace calls with your own SDK's equivalents. It mirrors the reference architecture used in OtelTracer and AgentOpsTracer:
# agentlightning/tracer/my_tracer.py
from __future__ import annotations

import logging
import time
from contextlib import asynccontextmanager, contextmanager
from typing import AsyncGenerator, Iterator, List, Optional

import mytrace  # Replace with your proprietary SDK
from mytrace import Span as MySpan, Tracer as MyTracerHandle

from agentlightning.types import (
    Attributes, Span, SpanCoreFields, SpanRecordingContext,
    StatusCode, TraceStatus
)
from agentlightning.store.base import LightningStore
from agentlightning.tracer.base import (
    Tracer, with_active_tracer_context, set_active_tracer, clear_active_tracer
)

log = logging.getLogger(__name__)


class MySpanRecordingContext(SpanRecordingContext):
    """Wraps the proprietary SDK span and forwards recording calls."""

    def __init__(self, span: MySpan) -> None:
        self._span = span

    def record_exception(self, exc: BaseException) -> None:
        self._span.record_exception(exc)
        self.record_status("ERROR", str(exc))

    def record_attributes(self, attrs: Attributes) -> None:
        self._span.set_attributes(attrs)

    def record_status(self, status_code: StatusCode, description: Optional[str] = None) -> None:
        self._span.set_status(status_code, description)


class MyTracer(Tracer):
    """Routes all spans to the proprietary mytrace SDK."""

    def __init__(self) -> None:
        super().__init__()
        self._client: Optional[MyTracerHandle] = None
        self._collected_spans: List[Span] = []

    def init_worker(self, worker_id: int, store: Optional[LightningStore] = None) -> None:
        """Initialize per-worker client."""
        super().init_worker(worker_id, store)
        self._client = mytrace.Tracer(worker_id=worker_id)
        log.info("[Worker %s] MyTracer initialized", worker_id)

    @with_active_tracer_context
    @asynccontextmanager
    async def trace_context(
        self,
        name: Optional[str] = None,
        *,
        store: Optional[LightningStore] = None,
        rollout_id: Optional[str] = None,
        attempt_id: Optional[str] = None,
    ) -> AsyncGenerator[MyTracerHandle, None]:
        """Start root span; the decorator activates and clears the tracer."""
        if not self._client:
            raise RuntimeError("Tracer not initialized; call init_worker first")
        # Start each trace with an empty buffer so get_last_trace only
        # reports spans from the most recent trace.
        self._collected_spans = []
        root_span = self._client.start_span(name or "root")
        try:
            # No explicit set_active_tracer call here: @with_active_tracer_context
            # already handles activation and cleanup.
            yield self._client
        finally:
            root_span.end()

    @contextmanager
    def _trace_context_sync(
        self,
        name: Optional[str] = None,
        *,
        store: Optional[LightningStore] = None,
        rollout_id: Optional[str] = None,
        attempt_id: Optional[str] = None,
    ) -> Iterator[MyTracerHandle]:
        """Synchronous version for legacy callers."""
        if not self._client:
            raise RuntimeError("Tracer not initialized")
        self._collected_spans = []
        root_span = self._client.start_span(name or "root")
        try:
            # The sync path has no decorator, so manage the active tracer here.
            set_active_tracer(self)
            yield self._client
        finally:
            root_span.end()
            clear_active_tracer()

    def create_span(
        self,
        name: str,
        attributes: Optional[Attributes] = None,
        timestamp: Optional[float] = None,
        status: Optional[TraceStatus] = None,
    ) -> SpanCoreFields:
        """Fire-and-forget span creation."""
        if not self._client:
            raise RuntimeError("Tracer not initialized")
        # Resolve the timestamp once so the SDK span and the returned
        # core fields agree on start and end times.
        ts = timestamp if timestamp is not None else time.time()
        start = int(ts * 1e9)  # seconds -> nanoseconds
        span = self._client.start_span(name, attributes=attributes or {}, start_time=start)
        if status:
            span.set_status(status.status_code, status.description)
        span.end()
        core = SpanCoreFields(
            name=name,
            attributes=attributes or {},
            start_time=ts,
            end_time=ts,
            status=status or TraceStatus(status_code="OK"),
        )
        self._collected_spans.append(Span.from_core_fields(core))
        return core

    @contextmanager
    def operation_context(
        self,
        name: str,
        attributes: Optional[Attributes] = None,
        start_time: Optional[float] = None,
        end_time: Optional[float] = None,
    ) -> Iterator[MySpanRecordingContext]:
        """Create child span with recording capabilities."""
        if not self._client:
            raise RuntimeError("Tracer not initialized")
        start = int((start_time or time.time()) * 1e9)
        span = self._client.start_span(name, attributes=attributes or {}, start_time=start)
        ctx = MySpanRecordingContext(span)
        try:
            yield ctx
        except Exception as exc:
            ctx.record_exception(exc)
            raise
        finally:
            span.end(int((end_time or time.time()) * 1e9))

    def get_last_trace(self) -> List[Span]:
        """Return captured spans for the most recent trace."""
        return self._collected_spans

    def get_langchain_handler(self, tags: Optional[List[str]] = None):
        """Optional: Return LangChain callback for proprietary SDK."""
        raise NotImplementedError("LangChain integration not implemented")
Reference Implementations to Study
Study these concrete implementations in the repository to understand different integration patterns:
- agentlightning/tracer/otel.py – Full OpenTelemetry integration with OTLP export and LightningSpanProcessor for store submission.
- agentlightning/tracer/dummy.py – Minimal no-op implementation useful as a testing template.
- agentlightning/tracer/agentops.py – Third-party integration showing LangChain handler implementation and external client management.
- agentlightning/tracer/weave.py – Weave SDK integration demonstrating attribute mapping and span conversion.
Production Deployment Checklist
Follow these steps to ensure your custom tracer is production-ready:
- Implement all abstract methods – Ensure trace_context, _trace_context_sync, create_span, operation_context, and get_last_trace are fully functional.
- Initialize per-worker resources – Create SDK clients in init_worker using the provided worker_id and LightningStore.
- Use the active tracer decorator – Apply @with_active_tracer_context to trace_context to ensure proper global state management.
- Implement SpanRecordingContext – Your operation_context must return an object that forwards record_exception, record_attributes, and record_status to your SDK.
- Buffer or export spans – Store spans in _collected_spans (or push to a LightningStore) so get_last_trace returns data for downstream adapters like TracerTraceToTriplet.
- Add LangChain support – Implement get_langchain_handler if your framework supports LangChain callbacks.
- Write unit tests – Use the test fixtures in tests/tracer/ to validate against the abstract contract using DummyTracer as a reference.
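As a starting point for the first and last checklist items, a small helper can report which required methods a subclass has not yet overridden. This is a minimal sketch under stated assumptions: BaseTracer is a hypothetical stand-in for the abstract Tracer so the example runs without agentlightning installed, and missing_overrides, REQUIRED_METHODS, and PartialTracer are illustrative names, not part of the library.

```python
from typing import List

# The five methods this article lists as required.
REQUIRED_METHODS = [
    "trace_context",
    "_trace_context_sync",
    "create_span",
    "operation_context",
    "get_last_trace",
]


def missing_overrides(cls: type, base: type, names: List[str]) -> List[str]:
    """Return the names that `cls` still inherits unchanged from `base`."""
    return [n for n in names if getattr(cls, n, None) is getattr(base, n, None)]


class BaseTracer:
    """Hypothetical stand-in for the abstract Tracer base class."""

    def trace_context(self):
        raise NotImplementedError

    def _trace_context_sync(self):
        raise NotImplementedError

    def create_span(self):
        raise NotImplementedError

    def operation_context(self):
        raise NotImplementedError

    def get_last_trace(self):
        raise NotImplementedError


class PartialTracer(BaseTracer):
    """Overrides only two of the five required methods."""

    def create_span(self):
        return "span"

    def get_last_trace(self):
        return []


missing = missing_overrides(PartialTracer, BaseTracer, REQUIRED_METHODS)
```

In a real test suite the same helper could be pointed at your tracer class and the library's Tracer, with an assertion that the returned list is empty.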
Summary
- Subclass Tracer from agentlightning/tracer/base.py to create a backend-agnostic integration.
- Implement five core methods including async/sync context managers and span recording contexts.
- Use @with_active_tracer_context to automatically manage global tracer state during execution.
- Pass the instance to LitAgentRunner to capture all training and inference traces.
- Study otel.py and agentops.py for production patterns including store integration and LangChain support.
Frequently Asked Questions
Do I need to implement both async and sync trace_context methods?
Yes. You must implement trace_context (async) and _trace_context_sync (sync) because different callers in the codebase may use either pattern. The async version is the primary path used by LitAgentRunner, while the sync version supports legacy instrumentation. Both must manage the active tracer: the async version gets this from @with_active_tracer_context, while the sync version calls set_active_tracer and clear_active_tracer itself.
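One way to keep the two entry points from drifting apart is to put the span lifecycle in a single private helper and have both context managers delegate to it. The sketch below is illustrative only: DualModeTracer and _root_span are hypothetical names, the "span" is just a string, and active-tracer management is left out to keep the example self-contained.

```python
import asyncio
from contextlib import asynccontextmanager, contextmanager
from typing import AsyncIterator, Iterator, List


class DualModeTracer:
    """Hypothetical tracer that shares root-span logic between sync and async paths."""

    def __init__(self) -> None:
        self.events: List[str] = []

    @contextmanager
    def _root_span(self, name: str) -> Iterator[str]:
        # Single place where the root span is started and ended.
        self.events.append(f"start:{name}")
        try:
            yield name
        finally:
            self.events.append(f"end:{name}")

    @contextmanager
    def _trace_context_sync(self, name: str = "root") -> Iterator[str]:
        with self._root_span(name) as handle:
            yield handle

    @asynccontextmanager
    async def trace_context(self, name: str = "root") -> AsyncIterator[str]:
        # A plain `with` works here because the helper does no awaiting.
        with self._root_span(name) as handle:
            yield handle


tracer = DualModeTracer()

with tracer._trace_context_sync("sync"):
    pass


async def run_async() -> None:
    async with tracer.trace_context("async"):
        pass


asyncio.run(run_async())
```

If your SDK's span start or end calls are themselves asynchronous, the shared helper would need an async variant instead.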
How do I make my tracer active during agent execution?
Decorate your async trace_context method with @with_active_tracer_context. This decorator (defined at line 277 of base.py) automatically calls set_active_tracer(self) when entering the context and clear_active_tracer() when exiting. Any code calling get_active_tracer() inside the context block will receive your instance.
Can I integrate LangChain callbacks with my custom tracer?
Yes. Implement the optional get_langchain_handler method to return a LangChain callback handler that forwards events to your proprietary SDK. See agentlightning/tracer/agentops.py for a concrete example that wraps the AgentOps client in a LangChain-compatible callback, or return None if you do not need this integration.
How should I handle span storage for get_last_trace?
You have two options: maintain an in-memory list (as shown in the MyTracer example above) or push spans to a LightningStore (see agentlightning/store/base.py). The OtelTracer implementation demonstrates the store-based approach using LightningSpanProcessor, which is preferable for distributed scenarios where workers need to persist traces centrally.
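For the in-memory option, the main thing to get right is that the buffer reflects only the most recent trace. A minimal sketch of such a buffer follows; SpanBuffer and its method names are hypothetical, and spans are represented as plain dictionaries rather than the library's Span type.

```python
from typing import Any, Dict, List


class SpanBuffer:
    """Hypothetical per-trace buffer: cleared when a trace starts, read when it ends."""

    def __init__(self) -> None:
        self._current: List[Dict[str, Any]] = []

    def begin_trace(self) -> None:
        # Rebind to a new list so lists handed out earlier are not mutated.
        self._current = []

    def add(self, span: Dict[str, Any]) -> None:
        self._current.append(span)

    def last_trace(self) -> List[Dict[str, Any]]:
        # Return a copy so callers cannot modify the buffer.
        return list(self._current)


buffer = SpanBuffer()

buffer.begin_trace()
buffer.add({"name": "first-rollout-step"})
first = buffer.last_trace()

buffer.begin_trace()
buffer.add({"name": "second-rollout-step-a"})
buffer.add({"name": "second-rollout-step-b"})
second = buffer.last_trace()
```

A tracer could call begin_trace when trace_context is entered and return last_trace from get_last_trace; without the reset, spans from earlier rollouts would accumulate in the result.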