Customizing the Training Loop in Agent-Lightning's Trainer: Architecture and Implementation Guide
TLDR: You can customize the Agent-Lightning training loop by passing custom ComponentSpec objects to the Trainer constructor, subclassing to override specific _make_* factory methods, or injecting Hook callbacks, enabling modifications to algorithms, runners, stores, and execution strategies without altering core orchestration code.
The Trainer class in the microsoft/agent-lightning repository serves as the high-level orchestrator that coordinates distributed reinforcement learning experiments. Unlike monolithic training frameworks, Agent-Lightning decomposes the training loop into discrete, lazy-initialized components resolved at runtime via build_component in agentlightning/trainer/init_utils.py. This architecture allows you to customize the training loop by swapping implementations or subclassing specific factory methods while preserving type safety and lifecycle management.
Understanding the Trainer Architecture
The Trainer orchestrates six primary component categories, each resolved through dedicated factory methods in agentlightning/trainer/trainer.py:
- Algorithm – Encodes learning logic (e.g., Baseline or custom RL algorithms). Resolved via _make_algorithm() (see trainer.py lines 70-78) using build_component().
- Runner – Executes LitAgent instances inside worker processes. Resolved via _make_runner() (lines 67-84) through instantiate_component(LitAgentRunner, ...).
- Store – Persists rollouts, attempts, and spans. Resolved via _make_store() (lines 93-100), defaulting to InMemoryLightningStore.
- ExecutionStrategy – Manages process lifecycles (shared-memory or client/server). Resolved via _make_strategy() (lines 10-30), defaulting to ClientServerExecutionStrategy when a port is specified.
- Tracer / Adapter / LLMProxy – Handles telemetry conversion. Resolved via _make_tracer() (lines 52-66), _make_adapter(), and _make_llm_proxy() (lines 42-50, 78-86).
- Hooks – Lifecycle callbacks (on_trace_start, on_rollout_end). Normalized via _normalize_hooks() (lines 86-93).
All components are lazy-initialized from ComponentSpec objects, which build_component (in init_utils.py lines 11-14, 73-84) can resolve from concrete instances, class types, callable factories, fully-qualified import strings, or dictionaries with a "type" key.
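The spec shapes listed above can be illustrated with a small, self-contained resolver. This is a sketch under stated assumptions, not the library's build_component: the name resolve_spec, its parameters, and the use of the standard-library queue module as a stand-in component type are all hypothetical.

```python
import importlib
import queue
from typing import Any, Callable, Optional


def _import_by_path(path: str) -> Any:
    """Import 'package.module.Name' and return the attribute Name."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def resolve_spec(
    spec: Any,
    expected_type: type,
    default_factory: Optional[Callable[[], Any]] = None,
) -> Any:
    """Hypothetical resolver covering the five spec shapes plus a default."""
    if spec is None:
        if default_factory is None:
            raise ValueError("no spec given and no default available")
        result = default_factory()
    elif isinstance(spec, expected_type):
        result = spec  # concrete instance: use as-is
    elif isinstance(spec, str):
        result = _import_by_path(spec)()  # fully-qualified import string
    elif isinstance(spec, dict):
        kwargs = dict(spec)  # copy so the caller's dict is not mutated
        target = kwargs.pop("type")
        if isinstance(target, str):
            target = _import_by_path(target)
        result = target(**kwargs)  # remaining keys become constructor kwargs
    elif callable(spec):
        result = spec()  # class type or zero-argument factory
    else:
        raise TypeError(f"unsupported spec: {spec!r}")

    # Validate the outcome so a wrong spec fails early, not mid-training.
    if not isinstance(result, expected_type):
        raise TypeError(f"expected {expected_type.__name__}, got {type(result).__name__}")
    return result


from_class = resolve_spec(queue.Queue, queue.Queue)
from_string = resolve_spec("queue.LifoQueue", queue.Queue)
from_dict = resolve_spec({"type": "queue.Queue", "maxsize": 8}, queue.Queue)
from_factory = resolve_spec(lambda: queue.LifoQueue(), queue.Queue)
from_default = resolve_spec(None, queue.Queue, default_factory=queue.Queue)
```

Checking the resolved object against an expected type is what lets a flexible spec system fail at construction time when a spec points at the wrong kind of component.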
Customization Entry Points
Constructor-Level Component Injection
The simplest customization path involves passing alternative specifications directly to Trainer.__init__. The component arguments algorithm, runner, store, strategy, tracer, adapter, and llm_proxy accept flexible specs, while hooks, initial_resources, port, n_runners, and max_rollouts configure the run itself.
Subclassing for Method Overrides
For deeper control, subclass Trainer and override specific _make_* methods. This pattern allows you to intercept component instantiation while reusing the base orchestration logic for other dependencies.
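The override pattern can be seen in isolation with a toy class. This is a minimal sketch, assuming a trainer that builds each dependency through its own factory method; ToyTrainer, PersistentTrainer, and the dictionary stand-ins for components are hypothetical and do not reflect the real Trainer's signatures.

```python
from typing import Any, Dict, Optional


class ToyTrainer:
    """Hypothetical stand-in: each dependency is built by its own factory method."""

    def __init__(self, store: Optional[Dict[str, Any]] = None) -> None:
        self.store = self._make_store(store)
        self.runner = self._make_runner()

    def _make_store(self, store: Optional[Dict[str, Any]]) -> Dict[str, Any]:
        # Default behaviour: honour the caller's spec, else fall back to memory.
        return store if store is not None else {"backend": "memory"}

    def _make_runner(self) -> Dict[str, Any]:
        return {"kind": "default-runner"}


class PersistentTrainer(ToyTrainer):
    """Overrides one factory; every other dependency is inherited unchanged."""

    def _make_store(self, store: Optional[Dict[str, Any]]) -> Dict[str, Any]:
        return {"backend": "sqlite", "path": "rollouts.db"}


trainer = PersistentTrainer()
print(trainer.store["backend"], trainer.runner["kind"])
```

Only the store changes; the runner still comes from the base class, which is the point of overriding a single factory rather than the whole constructor.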
Injecting Lifecycle Hooks
Supply a list of Hook instances to the hooks parameter. The _normalize_hooks() method (lines 86-93) validates and registers these callbacks, which the Trainer invokes at predefined moments such as on_rollout_start or on_rollout_end.
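The validate-then-dispatch flow described above can be sketched without the library. Everything here is an assumption made for illustration: ToyHook, normalize_hooks, and run_rollout are hypothetical names, and the real Hook callbacks may take different arguments.

```python
import asyncio
from typing import Any, List, Optional, Sequence, Tuple


class ToyHook:
    """Hypothetical base class: every callback defaults to a no-op."""

    async def on_rollout_start(self, rollout: Any) -> None:
        pass

    async def on_rollout_end(self, rollout: Any) -> None:
        pass


def normalize_hooks(hooks: Optional[Sequence[ToyHook]]) -> List[ToyHook]:
    """Turn None into an empty list and reject objects that are not hooks."""
    if hooks is None:
        return []
    normalized = list(hooks)
    for hook in normalized:
        if not isinstance(hook, ToyHook):
            raise TypeError(f"not a hook: {hook!r}")
    return normalized


class Recorder(ToyHook):
    """Example hook that records which callbacks fired, in order."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, Any]] = []

    async def on_rollout_start(self, rollout: Any) -> None:
        self.events.append(("start", rollout))

    async def on_rollout_end(self, rollout: Any) -> None:
        self.events.append(("end", rollout))


async def run_rollout(rollout: Any, hooks: List[ToyHook]) -> None:
    for hook in hooks:
        await hook.on_rollout_start(rollout)
    # ... the actual rollout work would happen here ...
    for hook in hooks:
        await hook.on_rollout_end(rollout)


recorder = Recorder()
asyncio.run(run_rollout("rollout-1", normalize_hooks([recorder])))
```

Because the base class supplies no-op defaults, a hook only needs to implement the callbacks it cares about.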
Swapping Execution Strategies
Replace the default ClientServerExecutionStrategy by passing a different ExecutionStrategy class or instance to the strategy parameter. Options include SharedMemoryExecutionStrategy or custom implementations adhering to the base class interface.
Practical Implementation Examples
Custom Runner and Hooks
This example demonstrates passing a custom runner class and a hook that logs rollout starts:
    from agentlightning.trainer import Trainer
    from agentlightning.runner import Runner
    from agentlightning.tracer.base import Tracer
    from agentlightning.types import Hook


    class MyRunner(Runner):
        async def execute(self, task):
            # Custom execution logic
            return await super().execute(task)


    class LogRolloutStart(Hook):
        async def on_rollout_start(self, rollout):
            print(f"🚀 rollout {rollout.id} started")


    trainer = Trainer(
        runner=MyRunner,            # Resolved via _make_runner()
        hooks=[LogRolloutStart()],  # Normalized via _normalize_hooks()
        n_runners=4,
        max_rollouts=50,
    )
The Trainer resolves runner through _make_runner() and validates hooks through _normalize_hooks() before the training loop begins.
Shared-Memory Execution Strategy
Replace the default client/server strategy with shared-memory communication for single-node deployments:
    from agentlightning.execution.shared_memory import SharedMemoryExecutionStrategy
    from agentlightning.trainer import Trainer

    trainer = Trainer(
        strategy=SharedMemoryExecutionStrategy,  # Bypasses default ClientServerExecutionStrategy
        n_runners=2,
    )
When port is not specified, _make_strategy() (lines 10-30) constructs the default strategy; passing a class reference directly instantiates your specified implementation instead.
Custom Tracer Implementation
Override telemetry collection by providing a custom tracer class:
    from myproject.tracing import MyCustomTracer
    from agentlightning.trainer import Trainer

    trainer = Trainer(
        tracer=MyCustomTracer,  # Alternative: {"type": "myproject.tracing.MyCustomTracer"}
    )
_make_tracer() (lines 52-66) falls back to AgentOpsTracer only when the tracer argument is None, allowing seamless injection of custom telemetry adapters.
Advanced Subclassing Pattern
For persistent storage requirements, override _make_store() while keeping other factory methods intact:
    from agentlightning.trainer import Trainer
    from agentlightning.execution.base import ExecutionStrategy


    class MyTrainer(Trainer):
        def _make_store(self, store, strategy: ExecutionStrategy):
            # Imported lazily so the Redis dependency loads only when this
            # trainer is used. Assumes a Redis-backed store is available at
            # this import path in your installation.
            from agentlightning.store.redis import RedisLightningStore

            return RedisLightningStore(
                url="redis://localhost:6379",
                thread_safe=False,
            )


    trainer = MyTrainer(algorithm="myproject.algos.MyAlgo")
This subclass intercepts store creation (normally defaulting to InMemoryLightningStore at lines 93-100) while delegating algorithm, runner, and strategy resolution to the parent class.
Key Source Files
Critical implementation details reside in the following files:
- agentlightning/trainer/trainer.py – Core Trainer class with _make_* factory methods and fit/dev entry points (lines 10-100).
- agentlightning/trainer/init_utils.py – build_component and instantiate_component utilities for flexible spec resolution (lines 11-14, 73-84).
- agentlightning/execution/shared_memory.py – Shared-memory strategy for local parallelism.
- agentlightning/execution/client_server.py – Distributed client/server execution strategy.
- agentlightning/store/memory.py – Default InMemoryLightningStore implementation.
Summary
- The Trainer uses lazy-initialized components resolved via build_component() in init_utils.py, supporting classes, instances, factories, or config dictionaries.
- Customize the training loop by passing specs to the constructor, subclassing to override _make_* methods, injecting Hook callbacks, or swapping ExecutionStrategy implementations.
- Specific factory methods like _make_store() (lines 93-100) and _make_runner() (lines 67-84) provide surgical override points without forking the code.
- All customizations maintain type safety through build_component validation (lines 11-14, 73-84).
Frequently Asked Questions
How do I inject a custom algorithm into the Trainer?
Pass your algorithm class, instance, or import string to the algorithm parameter. The Trainer resolves it via _make_algorithm() using build_component(), which validates the spec and instantiates the component with optional default arguments like store or tracer if the constructor accepts them.
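The idea of passing optional defaults only when the constructor accepts them can be sketched with inspect.signature. This is an illustrative assumption about how such filtering might work, not the library's code: instantiate_with_optional_defaults and the three toy algorithm classes are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict


def instantiate_with_optional_defaults(
    factory: Callable[..., Any],
    optional_defaults: Dict[str, Any],
) -> Any:
    """Call factory with only the defaults its signature can accept."""
    params = inspect.signature(factory).parameters
    accepts_any_keyword = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    kwargs = {
        name: value
        for name, value in optional_defaults.items()
        if accepts_any_keyword or name in params
    }
    return factory(**kwargs)


class StoreAwareAlgo:
    def __init__(self, store: Any) -> None:
        self.store = store


class PlainAlgo:
    def __init__(self) -> None:
        self.store = None


class KwargsAlgo:
    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


defaults = {"store": "shared-store", "tracer": "shared-tracer"}
store_aware = instantiate_with_optional_defaults(StoreAwareAlgo, defaults)
plain = instantiate_with_optional_defaults(PlainAlgo, defaults)
catch_all = instantiate_with_optional_defaults(KwargsAlgo, defaults)
```

With this kind of filtering, an algorithm that needs the store receives it, and one that takes no arguments still constructs without a TypeError.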
Can I use multiple execution strategies in the same training run?
No, the Trainer accepts a single strategy specification that governs the entire training loop's process management. However, you can implement a composite ExecutionStrategy subclass that internally delegates to different backends based on worker configuration or environment detection.
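A composite strategy of that kind might look like the sketch below. It is written against a toy interface, so treat every name as hypothetical: ToyStrategy, its single execute method, and the TOY_DISTRIBUTED environment variable are assumptions, and the real ExecutionStrategy base class may expose a different method signature.

```python
import os
from abc import ABC, abstractmethod


class ToyStrategy(ABC):
    """Hypothetical minimal interface for an execution strategy."""

    @abstractmethod
    def execute(self, n_runners: int) -> str:
        ...


class ToySharedMemory(ToyStrategy):
    def execute(self, n_runners: int) -> str:
        return f"shared-memory x{n_runners}"


class ToyClientServer(ToyStrategy):
    def execute(self, n_runners: int) -> str:
        return f"client-server x{n_runners}"


class CompositeStrategy(ToyStrategy):
    """Looks like one strategy from outside; picks a backend at call time."""

    def __init__(
        self,
        local: ToyStrategy,
        distributed: ToyStrategy,
        env_var: str = "TOY_DISTRIBUTED",
    ) -> None:
        self.local = local
        self.distributed = distributed
        self.env_var = env_var

    def _pick(self) -> ToyStrategy:
        # Environment detection: opt in to the distributed backend explicitly.
        if os.environ.get(self.env_var) == "1":
            return self.distributed
        return self.local

    def execute(self, n_runners: int) -> str:
        return self._pick().execute(n_runners)


strategy = CompositeStrategy(ToySharedMemory(), ToyClientServer())
```

The trainer would still see a single strategy object, which keeps the one-strategy-per-run constraint intact while the delegation happens internally.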
What is the difference between overriding _make_runner() and passing a custom runner spec?
Passing a custom spec to the runner parameter relies on the default _make_runner() implementation, which handles standard instantiation and dependency injection (e.g., automatically passing tracer and max_rollouts). Overriding _make_runner() in a subclass allows you to completely bypass this logic, manipulate constructor arguments manually, or implement conditional instantiation logic unavailable through the declarative spec system.
How do hooks differ from subclassing the Trainer?
Hooks provide non-invasive lifecycle callbacks executed at specific points (trace start, rollout end) without altering the Trainer's internal state or flow. Subclassing allows you to modify the behavior of component instantiation and training loop orchestration itself, making hooks ideal for logging and metrics while subclassing suits architectural changes like custom store implementations.