
Testing Strategies for Agent Training Pipelines in Agent-Lightning: A Complete Guide

April 1, 2026 microsoft/agent-lightning

Agent-Lightning training pipelines should be tested at three levels: component wiring in Trainer.__init__, strategy selection logic in _make_strategy, and runtime contracts enforced by Trainer.dev and Trainer.fit.

The microsoft/agent-lightning repository provides a modular framework for orchestrating agent training through the Trainer class, which coordinates Algorithm, Runner, Store, and Telemetry components via configurable ExecutionStrategy implementations. Testing these pipelines means verifying three things: how components resolve from user specifications, how execution strategies handle configuration parameters such as ports and runner counts, and how runtime methods enforce type constraints. The test suite in tests/trainer/ demonstrates patterns for validating each layer of this orchestration without requiring full end-to-end training runs.

Understanding the Trainer Architecture

The Trainer class in agentlightning/trainer/trainer.py is the central coordination point that builds and wires together the training pipeline components. It delegates component instantiation to its _make_* helper methods (_make_tracer, _make_algorithm, _make_adapter, _make_store, _make_strategy, _make_runner), backed by agentlightning/trainer/init_utils.py, and ultimately hands control to an ExecutionStrategy such as SharedMemoryExecutionStrategy or ClientServerExecutionStrategy. Knowing this layout lets you write tests that target specific integration points instead of relying on heavy integration tests.

Testing Component Wiring and Resolution

The Trainer.__init__ method accepts flexible specifications for each component, including concrete instances, class objects, factory callables, string registry identifiers, or configuration dictionaries. Your tests should verify that _make_* methods correctly interpret each specification type and fall back to sensible defaults when arguments are None.
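The specification shapes listed above can be pictured with a small stand-in. The sketch below is not the library's implementation: resolve_component, REGISTRY, and the Fake* classes are hypothetical names invented for illustration, and the real _make_* helpers may order or validate the cases differently. It only shows the five specification shapes, plus the None fallback, that a wiring test should cover.

```python
from typing import Any, Callable, Dict, Type


class Tracer:
    """Placeholder base class standing in for a real component type."""


class FakeOtelTracer(Tracer):
    pass


class FakeDefaultTracer(Tracer):
    pass


# Hypothetical registry mapping string identifiers to component classes.
REGISTRY: Dict[str, Type[Tracer]] = {
    "otel": FakeOtelTracer,
    "default": FakeDefaultTracer,
}


def resolve_component(
    spec: Any, expected_type: type, default_factory: Callable[[], Any]
) -> Any:
    """Turn a flexible specification into an instance of expected_type."""
    if spec is None:
        instance = default_factory()        # omitted: fall back to the default
    elif isinstance(spec, expected_type):
        instance = spec                     # concrete instance: use as-is
    elif isinstance(spec, str):
        instance = REGISTRY[spec]()         # string identifier: registry lookup
    elif isinstance(spec, dict):
        options = dict(spec)                # copy so the caller's dict is untouched
        instance = REGISTRY[options.pop("type")](**options)
    elif callable(spec):
        instance = spec()                   # class object or factory callable
    else:
        raise TypeError(f"unsupported specification: {spec!r}")
    if not isinstance(instance, expected_type):
        raise TypeError(f"expected {expected_type.__name__}, got {type(instance).__name__}")
    return instance
```

A test can then feed each shape through the resolver and assert on the type of the result, which mirrors the checks you would make against trainer.tracer or trainer.algorithm.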

Validating Specification Resolution

When testing component resolution, instantiate Trainer with mixed specification types and assert that resulting attributes match expected instances. For example, passing tracer=agl.OtelTracer() should result in trainer.tracer being an OtelTracer instance, while omitting the algorithm should default to Baseline. Additionally, verify that max_rollouts parameters are correctly forwarded to the runner when specified.

Testing Default Fallback Behavior

Each _make_* method defines default factories that activate when specifications are omitted. Verify that None values trigger the correct defaults: AgentOpsTracer for tracing, Baseline for algorithms, and InMemoryLightningStore for storage. These assertions ensure that partial configurations remain functional and that the trainer maintains backward compatibility.

Validating Strategy Selection and Configuration

The strategy parameter accepts string aliases ("shm" for shared memory, "cs" for client/server) or dictionaries with explicit configuration. Tests must confirm that Trainer._make_strategy returns the correct ExecutionStrategy subclass and propagates parameters like n_runners, server_port, and managed_store.

Testing String Aliases and Dictionary Specifications

When passing strategy="shm", assert that trainer.strategy is an instance of SharedMemoryExecutionStrategy. For dictionary configurations like {"type": "cs", "server_port": 9999}, verify that trainer.strategy becomes a ClientServerExecutionStrategy with server_port set to 9999.
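Checks like these lend themselves to a table-driven layout, where each row pairs a specification with the class and attributes it should produce. The sketch below uses stand-in dataclasses and a hypothetical make_strategy function so that it runs without the library installed; the default port of 8000 is an arbitrary placeholder, not a value taken from Agent-Lightning.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple, Union


@dataclass
class SharedMemoryStandIn:
    n_runners: int = 1


@dataclass
class ClientServerStandIn:
    n_runners: int = 1
    server_port: int = 8000  # arbitrary placeholder default


# Hypothetical alias table; the real aliases live inside the library.
ALIASES = {"shm": SharedMemoryStandIn, "cs": ClientServerStandIn}


def make_strategy(spec: Union[str, Dict[str, Any]], n_runners: int):
    """Stand-in for strategy resolution from a string alias or a dict."""
    if isinstance(spec, str):
        return ALIASES[spec](n_runners=n_runners)
    options = dict(spec)
    cls = ALIASES[options.pop("type")]
    options.setdefault("n_runners", n_runners)
    return cls(**options)


# Each row: (specification, expected class, expected attribute values).
CASES: List[Tuple[Any, type, Dict[str, Any]]] = [
    ("shm", SharedMemoryStandIn, {"n_runners": 8}),
    ({"type": "cs", "server_port": 9999}, ClientServerStandIn,
     {"n_runners": 8, "server_port": 9999}),
]


def check_cases(cases, n_runners: int = 8) -> int:
    """Run every row and return how many were checked."""
    for spec, expected_cls, expected_attrs in cases:
        strategy = make_strategy(spec, n_runners)
        assert isinstance(strategy, expected_cls), spec
        for name, value in expected_attrs.items():
            assert getattr(strategy, name) == value, (spec, name)
    return len(cases)
```

Against the real library, the same table could drive a parametrized test that builds a Trainer per row and inspects trainer.strategy instead of calling the stand-in.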

Verifying Environment Variable Overrides

The client/server strategy respects the environment variables AGL_SERVER_PORT, AGL_CURRENT_ROLE, and AGL_MANAGED_STORE, which take precedence over constructor arguments. Prefer pytest's monkeypatch fixture over direct os.environ manipulation in your tests, because it restores the environment automatically when the test finishes:

import agentlightning as agl

def test_env_overrides_client_server_strategy(monkeypatch):
    # monkeypatch undoes these changes at the end of the test
    monkeypatch.setenv("AGL_SERVER_PORT", "10000")
    monkeypatch.setenv("AGL_CURRENT_ROLE", "algorithm")
    monkeypatch.setenv("AGL_MANAGED_STORE", "0")

    trainer = agl.Trainer(
        algorithm=agl.Baseline(),
        n_runners=8,
        strategy="cs",
    )
    # Environment values win over constructor arguments and defaults
    assert trainer.strategy.server_port == 10000
    assert trainer.strategy.role == "algorithm"
    assert trainer.strategy.managed_store is False
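Outside pytest, or wherever the monkeypatch fixture is not available, unittest.mock.patch.dict from the standard library scopes environment changes to a with block and restores the previous state on exit. The sketch below demonstrates that scoping with a hypothetical resolve_server_port stand-in; how the real strategy reads AGL_SERVER_PORT is not shown here and may differ.

```python
import os
from unittest import mock


def resolve_server_port(constructor_port: int) -> int:
    """Stand-in: an environment value, if present, wins over the argument."""
    raw = os.environ.get("AGL_SERVER_PORT")
    return int(raw) if raw is not None else constructor_port


def check_env_override_is_scoped() -> bool:
    before = os.environ.get("AGL_SERVER_PORT")
    with mock.patch.dict(os.environ, {"AGL_SERVER_PORT": "10000"}):
        # Inside the block the override is visible.
        assert resolve_server_port(9999) == 10000
    # After the block the variable is back to whatever it was before.
    return os.environ.get("AGL_SERVER_PORT") == before
```

Because the restore happens even when an assertion inside the block fails, one failing test cannot leak its port or role settings into the next.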

Enforcing Runtime Contracts

Beyond initialization, the Trainer enforces contracts at runtime through the dev and fit methods. These contracts ensure that only compatible algorithm types are used with specific execution modes and that configuration parameters flow correctly to the execution strategy.

Testing the FastAlgorithm Requirement

The Trainer.dev method strictly requires a FastAlgorithm implementation. Passing a standard Algorithm should raise TypeError. Test this contract by creating a dummy algorithm class inheriting from Algorithm (not FastAlgorithm) and asserting that trainer.dev() raises the expected exception:

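import pytest
import agentlightning as agl

# Inherits from Algorithm only, so it does not satisfy the FastAlgorithm contract
class SlowAlgo(agl.Algorithm):
    def run(self, train_dataset=None, val_dataset=None):
        pass

trainer = agl.Trainer(strategy=agl.DummyStrategy(), algorithm=SlowAlgo())
# dev() should reject the algorithm before any execution starts
with pytest.raises(TypeError):
    trainer.dev(agl.DummyAgent())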
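It is worth testing both sides of this contract: the rejection of a plain Algorithm and the acceptance of a FastAlgorithm. The sketch below models the contract with stand-in classes and a hypothetical dev function built on an isinstance guard. That guard is an assumption about the mechanism, made so the example runs on its own; it is not a copy of the library's code.

```python
class StandInAlgorithm:
    """Placeholder for the base algorithm type."""

    def run(self) -> None:
        pass


class StandInFastAlgorithm(StandInAlgorithm):
    """Placeholder for the subtype that the dev mode accepts."""


def dev(algorithm: StandInAlgorithm) -> str:
    # Assumed mechanism: reject anything that is not the fast subtype.
    if not isinstance(algorithm, StandInFastAlgorithm):
        raise TypeError(
            f"dev() requires a fast algorithm, got {type(algorithm).__name__}"
        )
    return "dev run started"


def raises_type_error(algorithm: StandInAlgorithm) -> bool:
    """Return True if dev() rejects the algorithm with TypeError."""
    try:
        dev(algorithm)
    except TypeError:
        return True
    return False
```

Pairing the negative case with a positive one guards against a test that passes only because dev raises TypeError for every input.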

Validating Port Forwarding

When using client/server strategies, verify that the server_port supplied in the strategy specification reaches the strategy instance. After initialization, check that trainer.strategy.server_port matches the value you provided, so that Trainer.fit binds to the intended endpoint.

Practical Implementation Examples

The following patterns demonstrate comprehensive testing approaches based on the actual test suite in tests/trainer/test_trainer_init.py and tests/trainer/test_trainer_dev.py.

Example 1: Verifying Tracer Injection

Confirm that the tracer provided to Trainer is correctly injected into the runner:

import agentlightning as agl

trainer = agl.Trainer(
    algorithm=agl.Baseline(),
    n_runners=8,
    tracer=agl.OtelTracer(),
)
assert isinstance(trainer.tracer, agl.OtelTracer)
assert isinstance(trainer.runner.tracer, agl.OtelTracer)

Example 2: Client/Server Strategy with Custom Port

Test dictionary-based strategy configuration with explicit port assignment:

import agentlightning as agl

trainer = agl.Trainer(
    algorithm=agl.Baseline(),
    n_runners=8,
    strategy={"type": "cs", "server_port": 9999},
)
assert isinstance(trainer.strategy, agl.ClientServerExecutionStrategy)
assert trainer.strategy.server_port == 9999

Example 3: Integration Sanity Check

Verify that the execution strategy's execute method is invoked during training by using a DummyStrategy that tracks calls:

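import agentlightning as agl

# Named differently from agl.DummyStrategy to avoid shadowing the base class
class TrackingStrategy(agl.DummyStrategy):
    def __init__(self):
        super().__init__()
        self.called = False

    def execute(self, *args, **kwargs):
        # Forward whatever the Trainer passes, so the spy does not depend on the exact signature
        self.called = True
        return super().execute(*args, **kwargs)

strategy = TrackingStrategy()
trainer = agl.Trainer(strategy=strategy, algorithm=agl.Baseline())
trainer.dev(agl.DummyAgent())
assert strategy.called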
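An alternative to subclassing is to wrap the strategy's execute method with a spy from unittest.mock, which records calls while still running the original method. The sketch below uses StandInStrategy and StandInTrainer, both hypothetical classes written for this example, so it makes no claim about the real execute signature.

```python
from unittest import mock


class StandInStrategy:
    def execute(self, *args, **kwargs) -> str:
        return "executed"


class StandInTrainer:
    def __init__(self, strategy: StandInStrategy) -> None:
        self.strategy = strategy

    def dev(self, agent: object) -> str:
        # Hand control to the strategy, as the article describes for Trainer.
        return self.strategy.execute(agent)


def run_with_spy():
    strategy = StandInStrategy()
    trainer = StandInTrainer(strategy)
    # wraps= keeps the original behaviour while recording every call.
    with mock.patch.object(strategy, "execute", wraps=strategy.execute) as spy:
        result = trainer.dev("agent")
    return spy.call_count, result
```

The spy also exposes the call arguments through spy.call_args, which helps when a test needs to assert on what was passed and not just on whether the call happened.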

Summary

Test component wiring by instantiating Trainer with various specification types (instances, classes, callables, strings, dicts) and verifying the attributes resolved by the _make_* methods in agentlightning/trainer/trainer.py.
Validate strategy configuration by checking that string aliases and dictionary specifications correctly instantiate SharedMemoryExecutionStrategy or ClientServerExecutionStrategy with proper n_runners, server_port, and managed_store values.
Verify environment overrides by monkey-patching AGL_SERVER_PORT, AGL_CURRENT_ROLE, and AGL_MANAGED_STORE to ensure they take precedence over code-level configuration.
Enforce runtime contracts by asserting that Trainer.dev raises TypeError for non-FastAlgorithm implementations and that server_port values flow correctly to the strategy layer.
Reference helper methods including _make_tracer, _make_algorithm, _make_strategy, and _make_runner when designing unit tests to ensure each resolution path is covered independently.

Frequently Asked Questions

How do you test that the Trainer correctly resolves different component specifications?

Instantiate Trainer with mixed inputs: concrete objects like agl.OtelTracer(), class references like agl.Baseline, factory callables, string identifiers from the registry, and dictionaries with "type" keys. Then assert that trainer.tracer, trainer.algorithm, and other attributes are instances of the expected classes. This verifies that _make_* methods in agentlightning/trainer/trainer.py handle all specification formats correctly.

What is the best way to verify strategy configuration in Agent-Lightning?

Pass different strategy arguments to Trainer and inspect the resulting trainer.strategy attribute. For string aliases, check isinstance(trainer.strategy, agl.SharedMemoryExecutionStrategy) when using "shm". For dictionary specs, verify that {"type": "cs", "server_port": 9999} creates a ClientServerExecutionStrategy with server_port == 9999. Also test that n_runners and managed_store parameters propagate correctly.

How do you ensure that environment variables properly override Trainer configuration?

Use os.environ to set AGL_SERVER_PORT, AGL_CURRENT_ROLE, and AGL_MANAGED_STORE before instantiating Trainer with a client/server strategy. Then assert that trainer.strategy.server_port, role, and managed_store match the environment values rather than the constructor defaults. Remember to clean up environment variables in test teardown to avoid cross-test contamination.

Why does Trainer.dev raise a TypeError for some algorithms?

Trainer.dev strictly requires a FastAlgorithm implementation because the development execution mode relies on specific interface guarantees that base Algorithm classes do not provide. If you pass a standard Algorithm subclass that does not inherit from FastAlgorithm, the method raises TypeError to prevent runtime failures. Always verify algorithm types using isinstance(algorithm, agl.FastAlgorithm) or similar checks in your test assertions.
