Customizing the Training Loop in Agent-Lightning's Trainer: Architecture and Implementation Guide

TLDR: You can customize the Agent-Lightning training loop in three ways: pass custom ComponentSpec objects to the Trainer constructor, subclass Trainer to override specific _make_* factory methods, or inject Hook callbacks. Together these let you change algorithms, runners, stores, and execution strategies without altering core orchestration code.

The Trainer class in the microsoft/agent-lightning repository serves as the high-level orchestrator that coordinates distributed reinforcement learning experiments. Unlike monolithic training frameworks, Agent-Lightning decomposes the training loop into discrete, lazy-initialized components resolved at runtime via build_component in agentlightning/trainer/init_utils.py. This architecture allows you to customize the training loop by swapping implementations or subclassing specific factory methods while preserving type safety and lifecycle management.

Understanding the Trainer Architecture

The Trainer orchestrates six primary component categories, each resolved through dedicated factory methods in agentlightning/trainer/trainer.py:

  • Algorithm – Encodes learning logic (e.g., Baseline or custom RL algorithms). Resolved via _make_algorithm() (see trainer.py lines 70-78) using build_component().
  • Runner – Executes LitAgent instances inside worker processes. Resolved via _make_runner() (lines 67-84) through instantiate_component(LitAgentRunner, ...).
  • Store – Persists rollouts, attempts, and spans. Resolved via _make_store() (lines 93-100) defaulting to InMemoryLightningStore.
  • ExecutionStrategy – Manages process lifecycles (shared-memory or client/server). Resolved via _make_strategy() (lines 10-30), defaulting to ClientServerExecutionStrategy when a port is specified.
  • Tracer / Adapter / LLMProxy – Handles telemetry conversion. Resolved via _make_tracer() (lines 52-66), _make_adapter(), and _make_llm_proxy() (lines 42-50, 78-86).
  • Hooks – Lifecycle callbacks (on_trace_start, on_rollout_end). Normalized via _normalize_hooks() (lines 86-93).

All components are lazy-initialized from ComponentSpec objects, which build_component (in init_utils.py lines 11-14, 73-84) can resolve from concrete instances, class types, callable factories, fully-qualified import strings, or dictionaries with a "type" key.
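To make the five spec forms concrete, here is a minimal sketch of how a resolver of this kind could work. It is not the code in init_utils.py: the function name resolve_spec, its signature, and the use of fractions.Fraction as a stand-in component class are assumptions made purely for illustration.

```python
import importlib
from fractions import Fraction
from typing import Any


def resolve_spec(spec: Any, expected_type: type, **kwargs: Any) -> Any:
    """Hypothetical resolver; NOT the real build_component implementation."""
    if isinstance(spec, expected_type):
        # Concrete instance: use it unchanged.
        return spec
    if isinstance(spec, dict):
        # Dictionary with a "type" key; the other keys become constructor kwargs.
        options = dict(spec)
        type_ref = options.pop("type")
        return resolve_spec(type_ref, expected_type, **{**kwargs, **options})
    if isinstance(spec, str):
        # Fully-qualified import string such as "package.module.ClassName".
        module_name, _, attr = spec.rpartition(".")
        spec = getattr(importlib.import_module(module_name), attr)
    if callable(spec):
        # Class type or callable factory: call it, then check the result type.
        obj = spec(**kwargs)
        if not isinstance(obj, expected_type):
            raise TypeError(
                f"expected {expected_type.__name__}, got {type(obj).__name__}"
            )
        return obj
    raise TypeError(f"unsupported spec: {spec!r}")


# Fraction stands in for a component class (assumption for illustration only).
from_instance = resolve_spec(Fraction(1, 3), Fraction)
from_class = resolve_spec(Fraction, Fraction)
from_factory = resolve_spec(lambda: Fraction(2, 4), Fraction)
from_string = resolve_spec("fractions.Fraction", Fraction, numerator=3, denominator=4)
from_dict = resolve_spec(
    {"type": "fractions.Fraction", "numerator": 1, "denominator": 2}, Fraction
)
```

The type check after construction is what lets a resolver like this reject a spec that builds the wrong kind of object, whichever form the spec took.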

Customization Entry Points

Constructor-Level Component Injection

The simplest customization path involves passing alternative specifications directly to Trainer.__init__. The component arguments accept flexible specs: algorithm, runner, store, strategy, tracer, adapter, llm_proxy, and hooks. The constructor also takes configuration values that are not component specs, such as initial_resources, port, n_runners, and max_rollouts.

Subclassing for Method Overrides

For deeper control, subclass Trainer and override specific _make_* methods. This pattern allows you to intercept component instantiation while reusing the base orchestration logic for other dependencies.
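The factory-method override pattern can be shown without the library. The sketch below is a toy model, assuming a simplified one-argument _make_store; MiniTrainer, DictStore, and AuditedStore are hypothetical names, and the real Trainer's factory signatures may differ.

```python
from typing import Any, Dict, List, Optional


class DictStore:
    """Hypothetical in-memory store used as the default."""

    def __init__(self) -> None:
        self.items: Dict[str, Any] = {}


class AuditedStore(DictStore):
    """Hypothetical store variant that also keeps an audit log."""

    def __init__(self) -> None:
        super().__init__()
        self.audit_log: List[str] = []


class MiniTrainer:
    """Toy orchestrator: components are built lazily through _make_* methods."""

    def __init__(self, store: Optional[DictStore] = None) -> None:
        self._store_spec = store
        self._store: Optional[DictStore] = None  # not built until first use

    def _make_store(self, spec: Optional[DictStore]) -> DictStore:
        # Default factory: honor the supplied spec, else fall back to DictStore.
        return spec if spec is not None else DictStore()

    @property
    def store(self) -> DictStore:
        if self._store is None:
            self._store = self._make_store(self._store_spec)
        return self._store


class AuditedTrainer(MiniTrainer):
    def _make_store(self, spec: Optional[DictStore]) -> DictStore:
        # Override one factory; everything else is inherited unchanged.
        return AuditedStore()


default_trainer = MiniTrainer()
audited_trainer = AuditedTrainer()
```

Only the overridden factory changes behavior, so the subclass stays small and keeps working when the base class adjusts its other factories.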

Injecting Lifecycle Hooks

Supply a list of Hook instances to the hooks parameter. The _normalize_hooks() method (lines 86-93) validates and registers these callbacks, which the Trainer invokes at predefined moments such as on_rollout_start or on_rollout_end.
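As a rough illustration of what normalizing and dispatching hooks involves, the following sketch validates a hook list and fires callbacks around a rollout. The Hook base class, normalize_hooks, and run_rollout here are hypothetical stand-ins written for this example, not the library's own definitions.

```python
import asyncio
from typing import Any, Iterable, List, Optional, Tuple


class Hook:
    """Hypothetical base class whose callbacks default to no-ops."""

    async def on_rollout_start(self, rollout: Any) -> None:
        pass

    async def on_rollout_end(self, rollout: Any) -> None:
        pass


def normalize_hooks(hooks: Optional[Iterable[Hook]]) -> List[Hook]:
    """Turn None or any iterable into a validated list of Hook instances."""
    if hooks is None:
        return []
    normalized = list(hooks)
    for hook in normalized:
        if not isinstance(hook, Hook):
            raise TypeError(f"expected Hook instance, got {type(hook).__name__}")
    return normalized


class RecordingHook(Hook):
    """Example hook that records each event it receives."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, Any]] = []

    async def on_rollout_start(self, rollout: Any) -> None:
        self.events.append(("start", rollout))

    async def on_rollout_end(self, rollout: Any) -> None:
        self.events.append(("end", rollout))


async def run_rollout(rollout: Any, hooks: List[Hook]) -> None:
    """Fire start callbacks, run the (omitted) rollout, then fire end callbacks."""
    for hook in hooks:
        await hook.on_rollout_start(rollout)
    # The rollout itself would execute here.
    for hook in hooks:
        await hook.on_rollout_end(rollout)


recorder = RecordingHook()
asyncio.run(run_rollout("rollout-1", normalize_hooks([recorder])))
```

Giving the base class no-op callbacks means a hook only has to override the events it cares about.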

Swapping Execution Strategies

Replace the default ClientServerExecutionStrategy by passing a different ExecutionStrategy class or instance to the strategy parameter. Options include SharedMemoryExecutionStrategy or custom implementations adhering to the base class interface.

Practical Implementation Examples

Custom Runner and Hooks

This example demonstrates passing a custom runner class and a hook that logs rollout starts:

from agentlightning.trainer import Trainer
from agentlightning.runner import Runner
from agentlightning.types import Hook

class MyRunner(Runner):
    async def execute(self, task):
        # Custom execution logic goes here, then defer to the base runner
        return await super().execute(task)

class LogRolloutStart(Hook):
    async def on_rollout_start(self, rollout):
        # Invoked by the Trainer when a rollout begins
        print(f"🚀 rollout {rollout.id} started")

trainer = Trainer(
    runner=MyRunner,                # Resolved via _make_runner()
    hooks=[LogRolloutStart()],      # Normalized via _normalize_hooks()
    n_runners=4,
    max_rollouts=50,
)

The Trainer resolves runner through _make_runner() and validates hooks through _normalize_hooks() before the training loop begins.

Shared-Memory Execution Strategy

Replace the default client/server strategy with shared-memory communication for single-node deployments:

from agentlightning.execution.shared_memory import SharedMemoryExecutionStrategy
from agentlightning.trainer import Trainer

trainer = Trainer(
    strategy=SharedMemoryExecutionStrategy,   # Bypasses default ClientServerExecutionStrategy
    n_runners=2,
)

When no strategy is supplied, _make_strategy() (lines 10-30) constructs the default strategy; passing a class reference makes the Trainer instantiate your implementation instead.

Custom Tracer Implementation

Override telemetry collection by providing a custom tracer class:

from myproject.tracing import MyCustomTracer
from agentlightning.trainer import Trainer

trainer = Trainer(
    tracer=MyCustomTracer,          # Alternative: {"type": "myproject.tracing.MyCustomTracer"}
)

_make_tracer() (lines 52-66) falls back to AgentOpsTracer only when the tracer argument is None, so any tracer spec you pass takes precedence over the default.

Advanced Subclassing Pattern

For persistent storage requirements, override _make_store() while keeping other factory methods intact:

from agentlightning.trainer import Trainer
from agentlightning.execution.base import ExecutionStrategy

class MyTrainer(Trainer):
    def _make_store(self, store, strategy: ExecutionStrategy):
        # Note: this override ignores the incoming store spec and always
        # returns a Redis-backed store.
        from agentlightning.store.redis import RedisLightningStore
        return RedisLightningStore(
            url="redis://localhost:6379",
            thread_safe=False,
        )

trainer = MyTrainer(algorithm="myproject.algos.MyAlgo")

This subclass intercepts store creation (normally defaulting to InMemoryLightningStore at lines 93-100) while delegating algorithm, runner, and strategy resolution to the parent class.

Key Source Files

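Critical implementation details reside in the following files:

  • agentlightning/trainer/trainer.py – The Trainer class, its _make_* factory methods, and _normalize_hooks().
  • agentlightning/trainer/init_utils.py – build_component(), which resolves ComponentSpec objects.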

Summary

  • The Trainer uses lazy-initialized components resolved via build_component() in init_utils.py, supporting classes, instances, factories, or config dictionaries.
  • Customize the training loop by passing specs to the constructor, subclassing to override _make_* methods, injecting Hook callbacks, or swapping ExecutionStrategy implementations.
  • Specific factory methods like _make_store() (lines 93-100) and _make_runner() (lines 67-84) provide surgical override points without forked code.
  • Specs passed to the constructor are validated by build_component (lines 11-14, 73-84); an overridden _make_* method is responsible for returning a compatible component itself.

Frequently Asked Questions

How do I inject a custom algorithm into the Trainer?

Pass your algorithm class, instance, or import string to the algorithm parameter. The Trainer resolves it via _make_algorithm() using build_component(), which validates the spec and instantiates the component with optional default arguments like store or tracer if the constructor accepts them.
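The phrase "if the constructor accepts them" suggests signature-based injection. One way such injection could be written is sketched below using inspect.signature; the helper name instantiate_with_optional_defaults and the two example classes are hypothetical, and the library's own logic may work differently.

```python
import inspect
from typing import Any, Callable, Dict


def instantiate_with_optional_defaults(
    factory: Callable[..., Any], optional_defaults: Dict[str, Any]
) -> Any:
    """Call factory, passing only the defaults its signature can accept."""
    params = inspect.signature(factory).parameters
    accepts_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    kwargs = {
        name: value
        for name, value in optional_defaults.items()
        if accepts_var_kwargs or name in params
    }
    return factory(**kwargs)


class StoreAwareAlgo:
    """Hypothetical algorithm whose constructor accepts a store."""

    def __init__(self, store: Any) -> None:
        self.store = store


class PlainAlgo:
    """Hypothetical algorithm that takes no constructor arguments."""

    def __init__(self) -> None:
        self.ready = True


defaults = {"store": "shared-store", "tracer": "shared-tracer"}
store_aware = instantiate_with_optional_defaults(StoreAwareAlgo, defaults)
plain = instantiate_with_optional_defaults(PlainAlgo, defaults)
```

Filtering by signature lets one set of defaults serve components with different constructors, since anything a constructor does not declare is simply left out.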

Can I use multiple execution strategies in the same training run?

No, the Trainer accepts a single strategy specification that governs the entire training loop's process management. However, you can implement a composite ExecutionStrategy subclass that internally delegates to different backends based on worker configuration or environment detection.
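A composite of that kind might look roughly like the sketch below. It does not use the real ExecutionStrategy base class: StrategyBase and its single execute(n_runners) method are assumptions made so the example stays self-contained, and the actual interface you need to implement may differ.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class StrategyBase(ABC):
    """Hypothetical stand-in for an execution-strategy base class."""

    @abstractmethod
    def execute(self, n_runners: int) -> str:
        ...


class InProcessStrategy(StrategyBase):
    def execute(self, n_runners: int) -> str:
        return f"in-process x{n_runners}"


class NetworkedStrategy(StrategyBase):
    def execute(self, n_runners: int) -> str:
        return f"networked x{n_runners}"


class CompositeStrategy(StrategyBase):
    """Delegates each call to one backend selected by a chooser function."""

    def __init__(
        self, backends: Dict[str, StrategyBase], choose: Callable[[int], str]
    ) -> None:
        self._backends = backends
        self._choose = choose

    def execute(self, n_runners: int) -> str:
        backend = self._backends[self._choose(n_runners)]
        return backend.execute(n_runners)


composite = CompositeStrategy(
    backends={"local": InProcessStrategy(), "remote": NetworkedStrategy()},
    # Illustrative rule: small jobs stay local, larger ones go over the network.
    choose=lambda n: "local" if n <= 2 else "remote",
)
small_job = composite.execute(2)
large_job = composite.execute(8)
```

From the Trainer's point of view the composite is still a single strategy, which is why this pattern fits within the one-strategy constraint.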

What is the difference between overriding _make_runner() and passing a custom runner spec?

Passing a custom spec to the runner parameter relies on the default _make_runner() implementation, which handles standard instantiation and dependency injection (e.g., automatically passing tracer and max_rollouts). Overriding _make_runner() in a subclass allows you to completely bypass this logic, manipulate constructor arguments manually, or implement conditional instantiation logic unavailable through the declarative spec system.

How do hooks differ from subclassing the Trainer?

Hooks provide non-invasive lifecycle callbacks executed at specific points (trace start, rollout end) without altering the Trainer's internal state or flow. Subclassing allows you to modify the behavior of component instantiation and training loop orchestration itself, making hooks ideal for logging and metrics while subclassing suits architectural changes like custom store implementations.
