Migrating from Legacy Runner to the New Architecture in Agent-Lightning: A Complete Guide

Replace LegacyAgentRunner with LitAgentRunner and switch from HTTP client polling to the LightningStore abstraction to unlock async-native execution, automatic heartbeats, and unified storage.

This guide covers the migration path from the deprecated LegacyAgentRunner to the modern LitAgentRunner architecture in the microsoft/agent-lightning repository. The new store-based design replaces the legacy v0.1 client-server API with an async-first execution model that powers all current examples and the Trainer implementation.

Architectural Differences: LegacyAgentRunner vs LitAgentRunner

The LegacyAgentRunner (defined in agentlightning/runner/legacy.py) relies on direct polling of an AgentLightningClient via client.poll_next_task. It requires a manual heartbeat implementation, uses the private _trace_context_sync method for tracing, and converts results into RolloutLegacy objects via _to_rollout_object.

The modern LitAgentRunner (implemented in agentlightning/runner/agent.py) pulls tasks from a LightningStore using store.dequeue_rollout. It provides built-in heartbeat loops through _start_heartbeat_thread_loop, supports both the async trace_context and the sync _trace_context_sync tracing paths, and normalizes results via _post_process_rollout_result before writing them to the store.
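Supporting both a sync and an async tracing path is simpler when both entry points share one implementation. The sketch below illustrates that pattern only. ToyTracer, span_sync, span_async, and the string events are hypothetical stand-ins, not agent-lightning's tracer API. The async context manager delegates to the sync one, so a rollout traced from either path records identical start and end events.

```python
import asyncio
import contextlib
from typing import AsyncIterator, Iterator, List


class ToyTracer:
    """Hypothetical tracer: one event buffer, reachable from sync and async code."""

    def __init__(self) -> None:
        self.events: List[str] = []

    @contextlib.contextmanager
    def span_sync(self, name: str) -> Iterator[List[str]]:
        self.events.append(f"start:{name}")
        try:
            yield self.events
        finally:
            # Record the end even if the traced block raises
            self.events.append(f"end:{name}")

    @contextlib.asynccontextmanager
    async def span_async(self, name: str) -> AsyncIterator[List[str]]:
        # Delegate to the sync version so both paths record identical events
        with self.span_sync(name) as events:
            yield events


tracer = ToyTracer()
with tracer.span_sync("sync-rollout"):
    pass


async def main() -> None:
    async with tracer.span_async("async-rollout"):
        await asyncio.sleep(0)


asyncio.run(main())
print(tracer.events)
# ['start:sync-rollout', 'end:sync-rollout', 'start:async-rollout', 'end:async-rollout']
```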

Key architectural shifts include:

  • Entry point migration: From AgentLightningClient polling to LightningStore.dequeue_rollout as defined in agentlightning/store/base.py
  • Heartbeat automation: LitAgentRunner automatically snapshots system state at configurable intervals (default 10 seconds) instead of requiring manual client pings
  • Worker initialization: init_worker now receives a LightningStore instance and registers the worker, rather than simply storing a worker_id
  • Result standardization: Returns are processed through _post_process_rollout_result to handle float rewards, Span objects, and SpanCoreFields before storage
  • Hook pipeline: Rich async-aware hooks (on_trace_start, on_trace_end, on_rollout_start, on_rollout_end) with isolated error handling replace the limited legacy hook system
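The isolated error handling mentioned in the hook bullet can be illustrated with a small dispatcher. This is a hypothetical sketch, not LitAgentRunner._trigger_hooks: trigger_hooks, the hook classes, and the rollout_id keyword are illustrative assumptions. It calls the named event on each hook that defines it, awaits the result when the method is async, and logs rather than propagates any exception, so one faulty hook cannot abort a rollout.

```python
import asyncio
import inspect
import logging
from typing import Any, List

logger = logging.getLogger("hook_sketch")


async def trigger_hooks(hooks: List[Any], event: str, **kwargs: Any) -> List[str]:
    """Call `event` on every hook that defines it; log and skip failures.

    Returns the class names of hooks that completed successfully.
    """
    succeeded = []
    for hook in hooks:
        method = getattr(hook, event, None)
        if method is None:
            continue  # this hook does not handle the event
        try:
            result = method(**kwargs)
            if inspect.isawaitable(result):
                await result  # support both async and sync hook methods
            succeeded.append(type(hook).__name__)
        except Exception:
            # A failing hook must not break the runner loop; log it and move on
            logger.exception("Hook %s failed during %s", type(hook).__name__, event)
    return succeeded


class PrintingHook:
    async def on_rollout_start(self, *, rollout_id: str) -> None:
        print(f"start {rollout_id}")


class BrokenHook:
    def on_rollout_start(self, *, rollout_id: str) -> None:
        raise RuntimeError("boom")


class EndOnlyHook:
    async def on_rollout_end(self, *, rollout_id: str) -> None:
        print(f"end {rollout_id}")


ran = asyncio.run(
    trigger_hooks([PrintingHook(), BrokenHook(), EndOnlyHook()], "on_rollout_start", rollout_id="ro-1")
)
print(ran)  # ['PrintingHook']
```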

Step-by-Step Migration Checklist

Follow these specific actions to migrate your codebase, referencing the exact source locations in the repository:

  1. Update imports: Replace LegacyAgentRunner with LitAgentRunner in your own import statements. LitAgentRunner is exported from agentlightning/runner/__init__.py, so the library itself does not need to change.

  2. Refactor trainer instantiation: Remove manual LegacyAgentRunner construction from your training code. The Trainer class in agentlightning/trainer/trainer.py now instantiates LitAgentRunner automatically via instantiate_component.

  3. Switch to store-based task retrieval: Replace AgentLightningClient polling logic with a LightningStore implementation, such as InMemoryLightningStore or a remote store backend, that satisfies the interface defined in agentlightning/store/base.py.

  4. Migrate result handling: Convert custom result processing so it conforms to _post_process_rollout_result in agentlightning/runner/agent.py. Return values must be a float, a list of Span/SpanCoreFields objects, or None.

  5. Rely on automatic heartbeats: Remove manual heartbeat logic. The runner starts its own loop through _start_heartbeat_thread_loop; set the heartbeat_interval argument when constructing LitAgentRunner only if the default of 10 seconds does not suit you.

  6. Update hook signatures: Adjust on_rollout_start and on_rollout_end hooks that expect RolloutLegacy models so that they accept the new Rollout model defined in agentlightning/types/core.py.

  7. Update references: Change documentation, examples, and tests to reference LitAgentRunner instead of the legacy class. See the reference implementation in examples/unsloth/sft_rollout_runners.py.

  8. Validate the migration: Run the full test suite with uv run pytest -v from the project root, paying particular attention to tests/runner/test_agent_runner.py, which covers both the async and sync execution paths.
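For step 4, one way to audit existing agents is to pass their return values through a checker that accepts only the three shapes listed above. The sketch below is hypothetical: normalize_rollout_return and ToySpan are illustrative stand-ins, not _post_process_rollout_result or agent-lightning's Span class, and rejecting bool is a design choice of this sketch rather than documented library behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ToySpan:
    """Hypothetical stand-in for agentlightning's Span / SpanCoreFields."""
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)


def normalize_rollout_return(value: Any) -> Tuple[Optional[float], List[ToySpan]]:
    """Map an agent's return value to (reward, extra_spans), rejecting legacy shapes."""
    if value is None:
        # Nothing returned: rely on whatever the tracer already captured
        return None, []
    if isinstance(value, bool):
        # bool is a subclass of int; treating True as reward 1.0 is usually a bug
        raise TypeError("return a float reward, not a bool")
    if isinstance(value, (int, float)):
        return float(value), []
    if isinstance(value, list):
        bad = [item for item in value if not isinstance(item, ToySpan)]
        if bad:
            raise TypeError(f"list items must be spans, got {type(bad[0]).__name__}")
        return None, list(value)
    # A legacy-style dict such as {"reward": 1.0} ends up here
    raise TypeError(f"unsupported rollout return type: {type(value).__name__}")


print(normalize_rollout_return(0.75))  # (0.75, [])
print(normalize_rollout_return(None))  # (None, [])
try:
    normalize_rollout_return({"reward": 1.0})
except TypeError as exc:
    print(exc)  # unsupported rollout return type: dict
```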

Code Migration Examples

Minimal LitAgentRunner Setup

This example demonstrates the modern initialization pattern using the store-based architecture:

```python
import asyncio

from agentlightning import AgentOpsTracer, InMemoryLightningStore, LitAgentRunner
from agentlightning.litagent import LitAgent  # base class for MyAgent (your subclass, defined elsewhere)


async def main() -> None:
    # Create a store (in-memory for this demo)
    store = InMemoryLightningStore()

    # Instantiate a tracer (AgentOpsTracer is optional)
    tracer = AgentOpsTracer()

    # Build the runner; the store is injected later via init_worker
    runner = LitAgentRunner[dict](tracer=tracer, max_rollouts=100)

    # Initialize the runner with an agent (your custom LitAgent subclass)
    runner.init(agent=MyAgent())

    # Register the worker (worker_id = 0) and the store
    runner.init_worker(worker_id=0, store=store)

    # Process rollouts dequeued from the store until max_rollouts is reached;
    # something else (for example, the training algorithm) must enqueue them
    await runner.iter()


# iter() is a coroutine, so it must run inside an asyncio event loop
asyncio.run(main())
```

Source: examples/unsloth/sft_rollout_runners.py

Converting Trainer Initialization

Legacy approach (v0.1):

```python
trainer = TrainerLegacy(...)
trainer.fit_v0(agent, train_data="http://localhost:8000")
```

Modern store-based approach:

```python
from agentlightning.trainer import Trainer

trainer = Trainer(...)

# The trainer automatically instantiates LitAgentRunner via instantiate_component
trainer.fit(agent=MyAgent(), train_dataset=my_dataset)
```

Source: Trainer.fit_v0 (legacy) vs Trainer.fit in agentlightning/trainer/trainer.py

Updating Custom Hooks

The new runner supports an expanded async hook pipeline:

```python
class MyHook:
    async def on_trace_start(self, *, agent, runner, tracer, rollout):
        print(f"Starting trace for rollout {rollout.rollout_id}")

    async def on_rollout_end(self, *, agent, runner, rollout, spans):
        # spans is the list of Span objects already stored for this rollout
        print(f"Rollout {rollout.rollout_id} finished with {len(spans)} spans")


# tracer and my_agent are created as in the minimal setup example
runner = LitAgentRunner[dict](tracer=tracer, max_rollouts=50)
runner.init(agent=my_agent, hooks=[MyHook()])
```

Source: Hook triggering in LitAgentRunner._trigger_hooks (lines 46-68 of agentlightning/runner/agent.py)

Key Source Files and Implementation Details

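These are the core files referenced throughout this guide; read them before migrating:

  • agentlightning/runner/legacy.py: LegacyAgentRunner, the deprecated polling-based runner
  • agentlightning/runner/agent.py: LitAgentRunner, including _post_process_rollout_result, _start_heartbeat_thread_loop, and _trigger_hooks
  • agentlightning/store/base.py: the LightningStore interface, including dequeue_rollout
  • agentlightning/trainer/trainer.py: the Trainer, which instantiates LitAgentRunner via instantiate_component
  • agentlightning/types/core.py: the Rollout model expected by the new hooks
  • examples/unsloth/sft_rollout_runners.py: a reference usage of LitAgentRunner
  • tests/runner/test_agent_runner.py: tests covering the async and sync execution paths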

Summary

Migrating from the legacy architecture to LitAgentRunner provides immediate benefits:

  • Unified storage API: All components share the LightningStore contract, simplifying scaling and persistence across distributed workers
  • Native async support: asyncio-first iter and step methods enable high-throughput streaming workloads without blocking
  • Automatic health monitoring: Built-in heartbeat loops with configurable intervals (default 10 seconds) maintain worker state visibility
  • Improved hook safety: Async-aware hooks are isolated from core runner loops with comprehensive error handling
  • Future compatibility: The legacy client-server path is slated for removal; new features like APO and AgentOps integration target only the store-based runner

Frequently Asked Questions

What happened to the legacy HTTP client polling?

The AgentLightningClient polling mechanism (client.poll_next_task) used by LegacyAgentRunner has been replaced by the LightningStore abstraction. According to the agent-lightning source code, the store provides dequeue_rollout for task retrieval and handles persistence transparently, eliminating the need for manual HTTP polling loops.
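The shift from a polling client to a store-driven loop can be shown with a toy model. In this sketch, ToyStore, ToyRollout, record_result, run_loop, and max_empty_polls are hypothetical. Only the dequeue_rollout call and the max_rollouts cap come from the runner described in this guide. Sleeping for a poll interval when the queue is empty is an assumption of the sketch, and the empty-poll cutoff exists only so the demo terminates.

```python
import asyncio
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Deque, Dict, Optional, Tuple


@dataclass
class ToyRollout:
    rollout_id: str
    task: Any


@dataclass
class ToyStore:
    """Hypothetical stand-in for a LightningStore: a FIFO queue plus a results table."""
    queue: Deque[ToyRollout] = field(default_factory=deque)
    results: Dict[str, float] = field(default_factory=dict)

    async def enqueue_rollout(self, rollout_id: str, task: Any) -> None:
        self.queue.append(ToyRollout(rollout_id, task))

    async def dequeue_rollout(self) -> Optional[ToyRollout]:
        # Return None immediately when the queue is empty instead of blocking
        return self.queue.popleft() if self.queue else None

    async def record_result(self, rollout_id: str, reward: float) -> None:
        self.results[rollout_id] = reward


async def run_loop(
    store: ToyStore,
    agent: Callable[[Any], Awaitable[float]],
    max_rollouts: int,
    poll_interval: float = 0.01,
    max_empty_polls: int = 3,
) -> int:
    """Pull rollouts until max_rollouts are done or the queue stays empty for a while."""
    completed, empty_polls = 0, 0
    while completed < max_rollouts and empty_polls < max_empty_polls:
        rollout = await store.dequeue_rollout()
        if rollout is None:
            empty_polls += 1
            await asyncio.sleep(poll_interval)  # back off instead of spinning
            continue
        empty_polls = 0
        reward = await agent(rollout.task)
        await store.record_result(rollout.rollout_id, reward)
        completed += 1
    return completed


async def toy_agent(task: Dict[str, int]) -> float:
    return float(task["x"] > 0)


async def main() -> Tuple[int, Dict[str, float]]:
    store = ToyStore()
    for i, x in enumerate([3, -1, 5]):
        await store.enqueue_rollout(f"ro-{i}", {"x": x})
    done = await run_loop(store, toy_agent, max_rollouts=10)
    return done, store.results


n, results = asyncio.run(main())
print(n, results)  # 3 {'ro-0': 1.0, 'ro-1': 0.0, 'ro-2': 1.0}
```

Because the agent is awaited inside the loop, the whole worker can share an event loop with other async work instead of blocking on HTTP requests.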

How do I handle custom result processing in the new runner?

Migrate logic from _to_rollout_object to _post_process_rollout_result. As implemented in agentlightning/runner/agent.py, the new method takes your agent's rollout return value, which must be a float reward, a list of Span/SpanCoreFields objects, or None, and normalizes it before the result is written to the LightningStore. This standardizes result handling across the framework.

Is the legacy runner still supported?

LegacyAgentRunner remains available in agentlightning/runner/legacy.py for backward compatibility but is marked for removal. All new examples, tests, and the Trainer implementation in agentlightning/trainer/trainer.py exclusively use LitAgentRunner. Migrating now avoids breakage when the legacy path is removed and keeps your code compatible with upcoming features.

How does the heartbeat mechanism work in LitAgentRunner?

LitAgentRunner automatically initializes a heartbeat loop via _start_heartbeat_thread_loop (line 44 of agentlightning/runner/agent.py) when init_worker is called. The runner snapshots system state and updates the LightningStore at intervals specified by heartbeat_interval (defaulting to 10 seconds), eliminating the need for manual heartbeat implementations required by the legacy architecture.
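A background heartbeat of this kind can be sketched with a daemon thread and a threading.Event. HeartbeatLoop and its beat callback are hypothetical; in the real runner the callback would snapshot system state and update the LightningStore, which this sketch does not attempt. The 10-second default mirrors the documented heartbeat_interval default.

```python
import logging
import threading
import time
from typing import Callable, List

logger = logging.getLogger("heartbeat_sketch")


class HeartbeatLoop:
    """Hypothetical heartbeat: call `beat()` every `interval` seconds on a daemon thread."""

    def __init__(self, beat: Callable[[], None], interval: float = 10.0) -> None:
        self._beat = beat
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout (time to beat) and True once stop() is called
        while not self._stop.wait(self._interval):
            try:
                self._beat()
            except Exception:
                # A failed heartbeat is logged; the loop keeps running
                logger.exception("heartbeat failed")

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    @property
    def running(self) -> bool:
        return self._thread.is_alive()


# Usage: record a timestamp as a stand-in for a state snapshot
snapshots: List[float] = []
hb = HeartbeatLoop(lambda: snapshots.append(time.monotonic()), interval=0.05)
hb.start()
time.sleep(0.3)
hb.stop()
print(f"{len(snapshots)} heartbeats recorded")
```

Using Event.wait as the sleep means stop() takes effect immediately rather than after the current interval elapses.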
