Multi-Agent Selective Optimization in Agent-Lightning: 5 Strategies for Targeted Training

Agent-Lightning enables selective optimization of specific agents within a multi-agent system through the agent_match regex filter, allowing targeted training loops without modifying underlying agent code.

Agent-Lightning, Microsoft's open-source framework for agentic AI training, supports multi-agent selective optimization through regex-based trace filtering. Rather than training all agents simultaneously in a monolithic loop, you can isolate individual agents or groups using regular expression patterns applied to OpenTelemetry spans. This capability centers on the agent_match parameter of the trace-to-triplet adapter, which recursively filters LLM calls and rewards by agent name before they reach the training algorithm.

How Selective Optimization Works

Agent-Lightning implements selective optimization by intercepting traces at the adapter level before they are converted into training triplets. Every LLM invocation, tool execution, and reward emission is captured as a span carrying metadata about which agent produced it.

The agent_match Filter

The core mechanism resides in agentlightning/adapter/triplet.py, where the TracerTraceToTriplet adapter accepts an agent_match argument. This regular expression is applied within TraceTree.find_llm_calls (lines 56-57) to recursively search the span tree. Only spans whose enclosing agent name matches the pattern are emitted as training data, enabling gradients to flow exclusively to the targeted agent's policy while leaving co-existing agents untouched.
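
To make the recursive search concrete, here is a minimal standalone sketch of the idea. It does not reproduce the library's code: the Span dataclass, its kind field, and the use of re.search are assumptions made for illustration only.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Span:
    """Toy stand-in for a trace span; not the library's span type."""
    name: str
    kind: str  # "root", "agent", "llm", or "tool"
    children: List["Span"] = field(default_factory=list)


def find_llm_calls(
    span: Span,
    agent_match: Optional[str] = None,
    enclosing_agent: Optional[str] = None,
) -> Iterator[Span]:
    """Yield LLM spans whose nearest enclosing agent matches agent_match."""
    if span.kind == "agent":
        enclosing_agent = span.name  # the nearest agent ancestor wins
    if span.kind == "llm":
        # Assumption: a missing pattern means "accept every agent".
        matches = agent_match is None or (
            enclosing_agent is not None
            and re.search(agent_match, enclosing_agent) is not None
        )
        if matches:
            yield span
    for child in span.children:
        yield from find_llm_calls(child, agent_match, enclosing_agent)


root = Span("rollout", "root", [
    Span("primary", "agent", [Span("llm-1", "llm"), Span("search", "tool")]),
    Span("critic", "agent", [Span("llm-2", "llm")]),
])

primary_calls = [s.name for s in find_llm_calls(root, agent_match="primary")]
all_calls = [s.name for s in find_llm_calls(root)]
print(primary_calls, all_calls)
```

With the pattern set to "primary", only the LLM call nested under the primary agent survives; with no pattern, both calls are kept.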

Span Filtering and Reward Assignment

Because rewards are also attached to spans, the same filter applies when computing final rewards for a rollout. This permits agent-specific reinforcement-learning updates even when multiple agents share the same environment and dataset. If agent_match is omitted, the adapter defaults to processing spans from all agents.

Hierarchy Repair for Complex Frameworks

Multi-agent frameworks like AutoGen, LangChain, or CrewAI sometimes emit spans that are not properly nested under their correct agent root. The TraceTree.repair_hierarchy method (lines 78-88 in agentlightning/adapter/triplet.py) re-parents these misplaced spans before filtering occurs, ensuring the agent_match regex reliably captures all relevant data for the target agent.
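
One plausible way to re-parent misplaced spans is by time containment. The sketch below is a toy under that assumption, not the library's repair_hierarchy; FlatSpan and its fields are hypothetical. A span attached directly to the root is moved under the shortest other span whose time range fully contains it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FlatSpan:
    """Toy span record; field names are illustrative."""
    span_id: str
    parent_id: Optional[str]
    start: float
    end: float


def repair_hierarchy(spans: List[FlatSpan], root_id: str) -> Dict[str, Optional[str]]:
    """Return a span_id -> parent_id map with root-level strays re-parented."""
    parents = {s.span_id: s.parent_id for s in spans}
    for span in spans:
        if span.parent_id != root_id:
            continue  # only spans hanging directly off the root are candidates
        containers = [
            other for other in spans
            if other.span_id not in (span.span_id, root_id)
            and other.start <= span.start and span.end <= other.end
        ]
        if containers:
            # Prefer the tightest enclosing span as the new parent.
            tightest = min(containers, key=lambda o: o.end - o.start)
            parents[span.span_id] = tightest.span_id
    return parents


spans = [
    FlatSpan("root", None, 0.0, 10.0),
    FlatSpan("agent-1", "root", 1.0, 5.0),
    FlatSpan("agent-2", "root", 6.0, 9.0),
    FlatSpan("llm-a", "root", 2.0, 3.0),    # emitted at the wrong level
    FlatSpan("llm-b", "agent-2", 7.0, 8.0),
]
repaired = repair_hierarchy(spans, root_id="root")
print(repaired)
```

Here llm-a ends up under agent-1, so a filter on "agent-1" would now see it. Spans with identical time ranges would need a tie-breaking rule that this toy omits.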

Optimization Strategies

Single-Pass Selective Optimization

Use this strategy when one agent requires improvement while others must remain frozen, such as training a primary actor while keeping a critic fixed. Configure the adapter with a specific agent name:

import agentlightning as agl

# algo is an algorithm instance configured elsewhere.
trainer = agl.Trainer(
    algorithm=algo,
    n_runners=4,
    adapter={"agent_match": "primary"},  # regex matching the primary agent name
)

This instantiation forwards the adapter configuration to TracerTraceToTriplet, which discards every span except those produced by agents whose name matches "primary".

Parallel Optimization of Multiple Agents

When two or more agents each require distinct policy updates (e.g., a planner and an executor), launch separate Trainer instances with distinct agent_match patterns and execute them concurrently:

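import asyncio
import agentlightning as agl

# algo_planner and algo_executor are algorithm instances configured elsewhere.
trainer_planner = agl.Trainer(
    algorithm=algo_planner,
    n_runners=2,
    adapter={"agent_match": "planner"},
)

trainer_executor = agl.Trainer(
    algorithm=algo_executor,
    n_runners=2,
    adapter={"agent_match": "executor"},
)

async def main():
    # Run both selective training loops concurrently.
    await asyncio.gather(trainer_planner.run(), trainer_executor.run())

asyncio.run(main())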

Each trainer consumes only the spans belonging to its respective agent, computing separate gradients and policy updates in parallel.
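
Isolation between trainers holds only if no agent name matches more than one pattern. The following standalone check is a sketch that assumes search-style matching, which may differ from what the adapter does; agents_claimed_twice and the sample names are hypothetical.

```python
import re
from typing import Dict, List


def agents_claimed_twice(
    patterns: Dict[str, str], agent_names: List[str]
) -> Dict[str, List[str]]:
    """Return agents matched by more than one trainer's pattern."""
    conflicts: Dict[str, List[str]] = {}
    for name in agent_names:
        owners = [trainer for trainer, pat in patterns.items() if re.search(pat, name)]
        if len(owners) > 1:
            conflicts[name] = owners
    return conflicts


patterns = {"trainer_planner": "planner", "trainer_executor": "executor"}
names = ["planner", "executor", "planner-executor-bridge"]
conflicts = agents_claimed_twice(patterns, names)
print(conflicts)
```

Running a check like this before launching the trainers surfaces names such as "planner-executor-bridge" that would otherwise feed both training streams.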

Regex-Based Agent Grouping

For agents sharing a naming convention, use pattern matching to optimize entire cohorts simultaneously. This is effective when training multiple worker agents prefixed identically:

adapter={"agent_match": "worker-.*"}

The regular expression matches any agent name starting with "worker-", aggregating their spans into a single training stream without listing each agent individually.
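
Whether an unanchored pattern matches only at the start of a name depends on how the pattern is applied, which this article does not confirm for the adapter. The short sketch below uses only Python's re module, with made-up agent names, to show the difference and why explicit anchors remove the ambiguity.

```python
import re

names = ["worker-1", "worker-2", "coworker-3", "planner"]

# re.search finds the pattern anywhere in the name.
unanchored = [n for n in names if re.search(r"worker-.*", n)]

# re.match only tries the start of the name.
prefix_only = [n for n in names if re.match(r"worker-.*", n)]

# Explicit anchors behave the same under either function.
anchored = [n for n in names if re.search(r"^worker-.*$", n)]

print(unanchored, prefix_only, anchored)
```

Writing the pattern as "^worker-.*$" keeps "coworker-3" out of the cohort regardless of which matching function is used.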

Dynamic Selection at Runtime

When agents are spawned dynamically (e.g., during auto-scaling or adaptive workflows), compute the regex pattern before each training epoch based on discovered agent names:

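import re
import agentlightning as agl

active_agents = discover_agents()  # your discovery logic; returns a list of agent names

# Escape each name so characters such as "." or "(" are matched literally.
pattern = "(" + "|".join(re.escape(name) for name in active_agents) + ")"

trainer = agl.Trainer(
    algorithm=algo,
    adapter={"agent_match": pattern},
)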

This ensures the optimization loop adapts to the current system composition without hardcoding agent identities.
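
Two pitfalls are worth guarding against when a pattern is built from discovered names: regex metacharacters inside a name, and an empty discovery result, since an empty group such as "()" matches every string. The helper below is a hypothetical sketch (build_agent_pattern is not part of Agent-Lightning) that escapes names, anchors the alternation, and refuses to build an empty pattern.

```python
import re
from typing import Iterable


def build_agent_pattern(agent_names: Iterable[str]) -> str:
    """Build an anchored alternation that matches exactly the given names."""
    names = sorted(set(agent_names))
    if not names:
        raise ValueError("no agents discovered; refusing to build an empty pattern")
    return "^(?:" + "|".join(re.escape(n) for n in names) + ")$"


pattern = build_agent_pattern(["worker.1", "planner(v2)"])

exact_hit = re.search(pattern, "worker.1") is not None
dot_is_literal = re.search(pattern, "workerX1") is None
parens_are_literal = re.search(pattern, "planner(v2)") is not None
print(pattern, exact_hit, dot_is_literal, parens_are_literal)
```

Raising on an empty list is a design choice: it fails loudly instead of silently training on every agent.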

Reward-Only Optimization for Specific Agents

In configurations where certain agents (like critics) provide feedback but should not receive direct rewards, apply the agent_match filter during reward extraction. This occurs in the reward computation phase (e.g., find_final_reward), filtering the reward signal to include only specific agent contributions while still logging all agent actions.
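
The idea can be sketched without the library: keep every event for logging, but let only rewards from matching agents count as the rollout's final reward. The function last_matching_reward and the event layout below are illustrative assumptions, not the signature of find_final_reward.

```python
import re
from typing import List, Optional, Tuple

# (agent_name, reward) pairs in emission order; reward is None for non-reward spans.
Event = Tuple[str, Optional[float]]


def last_matching_reward(
    events: List[Event], agent_match: Optional[str] = None
) -> Optional[float]:
    """Return the most recent reward emitted by an agent matching agent_match."""
    for agent, reward in reversed(events):
        if reward is None:
            continue  # not a reward event
        if agent_match is None or re.search(agent_match, agent):
            return reward
    return None


events = [("primary", None), ("critic", 0.2), ("primary", 0.8), ("critic", 0.5)]

logged = len(events)  # every event stays available for logging
unfiltered = last_matching_reward(events)
primary_only = last_matching_reward(events, agent_match="primary")
no_match = last_matching_reward(events, agent_match="nobody")
print(logged, unfiltered, primary_only, no_match)
```

Without a filter the critic's last reward wins; with "primary" as the pattern, only the primary agent's reward is used.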

Practical Implementation Examples

Training a Primary Agent with a Fixed Critic

import asyncio
import agentlightning as agl
from agentlightning.algorithm import PPO

algo = PPO(...)

# Only the primary agent receives updates; the critic remains frozen.
trainer = agl.Trainer(
    algorithm=algo,
    n_runners=4,
    adapter={"agent_match": "primary"},
)

asyncio.run(trainer.run())

Repairing Hierarchy Before Filtering

When integrating with AutoGen or similar frameworks, call repair_hierarchy immediately after trace collection to ensure proper span nesting:

# Run inside an async function; capture_trace() is a placeholder for your trace-collection step.
trace_tree = await capture_trace()
trace_tree.repair_hierarchy()  # Lines 78-88 in triplet.py

# Now filtering works reliably
adapter={"agent_match": "agent-1"}

Aggregating Metrics Across Selective Runs

Use MultiMetricsBackend from agentlightning/utils/metrics.py to unify logging when running multiple selective trainers:

import agentlightning as agl
from agentlightning.utils import metrics

console = metrics.ConsoleMetricsBackend()
prom = metrics.PrometheusMetricsBackend(...)

# Fan metric events out to both backends.
multi_backend = metrics.MultiMetricsBackend([console, prom])

trainer = agl.Trainer(
    algorithm=algo,
    adapter={"agent_match": ".*"},  # track all agents
    tracker=multi_backend,
)
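
The fan-out behavior can be illustrated with a small standalone sketch. The classes below (MetricsSink, ListSink, FanOutSink) are hypothetical stand-ins rather than the MultiMetricsBackend implementation, and the log signature is assumed for illustration.

```python
from typing import Dict, Iterable, List, Protocol, Tuple


class MetricsSink(Protocol):
    def log(self, name: str, value: float, tags: Dict[str, str]) -> None: ...


class ListSink:
    """Collects metrics in memory; stands in for a console or Prometheus sink."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, float, Dict[str, str]]] = []

    def log(self, name: str, value: float, tags: Dict[str, str]) -> None:
        self.records.append((name, value, dict(tags)))


class FanOutSink:
    """Forwards every metric event to each wrapped sink, in order."""

    def __init__(self, sinks: Iterable[MetricsSink]) -> None:
        self._sinks = list(sinks)

    def log(self, name: str, value: float, tags: Dict[str, str]) -> None:
        for sink in self._sinks:
            sink.log(name, value, tags)


first, second = ListSink(), ListSink()
fan_out = FanOutSink([first, second])

# Tagging by agent keeps per-agent series separate in every destination.
fan_out.log("reward_mean", 0.8, {"agent": "planner"})
fan_out.log("reward_mean", 0.4, {"agent": "executor"})
print(first.records)
```

Sharing one fan-out object across several selective trainers means each destination receives the same tagged events.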

Summary

  • Use agent_match in the adapter configuration to filter OpenTelemetry spans by agent name using regular expressions.
  • Run parallel trainers with distinct agent_match patterns to optimize multiple agents concurrently with isolated gradients.
  • Call repair_hierarchy before filtering when using frameworks that produce disjoint span trees (AutoGen, LangChain, CrewAI).
  • Use regex patterns like "worker-.*" to train agent groups without explicit enumeration.
  • Aggregate metrics using MultiMetricsBackend to maintain observability across selective optimization runs.

Frequently Asked Questions

How does the agent_match filter work internally?

The filter operates during the adapter's span processing phase. In agentlightning/adapter/triplet.py, the TraceTree.find_llm_calls method (lines 56-57) recursively traverses the span tree and yields only those calls where the agent name attribute matches the provided regex. This filtered subset is then converted into training triplets, ensuring the RL algorithm receives data exclusively from the matched agent.

Why must I repair span hierarchy before filtering agents?

Frameworks like AutoGen or CrewAI sometimes emit spans that are not properly nested under their logical agent root, causing the recursive search in find_llm_calls to miss valid agent data. The TraceTree.repair_hierarchy method (lines 78-88) re-parents these orphaned spans, ensuring that the agent_match regex correctly identifies all LLM calls and rewards associated with the target agent.

Can I train multiple agents simultaneously with different algorithms?

Yes. Create separate Trainer instances for each agent, each configured with its own agent_match pattern and algorithm instance. Use asyncio.gather() to run them concurrently. Each trainer maintains its own policy and optimizer state, allowing you to mix algorithms (e.g., PPO for the planner, REINFORCE for the executor) within the same multi-agent environment.

How do I handle metrics when selectively optimizing only certain agents?

Initialize a MultiMetricsBackend from agentlightning/utils/metrics.py and pass it to each trainer via the tracker parameter. This backend fans out metric events to multiple destinations (console, Prometheus, etc.), ensuring you capture performance data for all agents even when only a subset is actively training. You can also configure one trainer with agent_match=".*" solely for aggregated logging while others handle specific optimization tasks.
