Multi-Agent Selective Optimization in Agent-Lightning: 5 Strategies for Targeted Training
Agent-Lightning enables selective optimization of specific agents within a multi-agent system through the agent_match regex filter, allowing targeted training loops without modifying underlying agent code.
Agent-Lightning, Microsoft's open-source framework for agentic AI training, supports multi-agent selective optimization through trace filtering. Rather than training all agents simultaneously in a monolithic loop, you can isolate individual agents or groups using regular expression patterns applied to OpenTelemetry spans. This capability centers on the agent_match parameter of the trace-to-triplet adapter, which recursively filters LLM calls and rewards by agent name before they reach the training algorithm.
How Selective Optimization Works
Agent-Lightning implements selective optimization by intercepting traces at the adapter level before they are converted into training triplets. Every LLM invocation, tool execution, and reward emission is captured as a span carrying metadata about which agent produced it.
The agent_match Filter
The core mechanism resides in agentlightning/adapter/triplet.py, where the TracerTraceToTriplet adapter accepts an agent_match argument. This regular expression is applied within TraceTree.find_llm_calls (lines 56-57) to recursively search the span tree. Only spans whose enclosing agent name matches the pattern are emitted as training data, enabling gradients to flow exclusively to the targeted agent's policy while leaving co-existing agents untouched.
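The recursive filtering described above can be illustrated with a small standalone sketch. It does not reproduce the library's code: the Span class, its kind field, and this find_llm_calls function are hypothetical stand-ins, and the use of re.search is an assumption about the matching semantics.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Span:
    """Hypothetical, simplified span; not the library's span type."""
    name: str
    kind: str  # "agent", "llm", or "other"
    children: List["Span"] = field(default_factory=list)


def find_llm_calls(
    span: Span,
    agent_match: Optional[str] = None,
    enclosing_agent: Optional[str] = None,
) -> Iterator[Span]:
    """Yield LLM spans whose nearest enclosing agent matches agent_match."""
    if span.kind == "agent":
        enclosing_agent = span.name  # descendants now belong to this agent
    if span.kind == "llm":
        matched = agent_match is None or (
            enclosing_agent is not None
            and re.search(agent_match, enclosing_agent) is not None
        )
        if matched:
            yield span
    for child in span.children:
        yield from find_llm_calls(child, agent_match, enclosing_agent)


root = Span("rollout", "other", [
    Span("primary", "agent", [Span("llm-call-1", "llm")]),
    Span("critic", "agent", [Span("llm-call-2", "llm")]),
])

primary_calls = [s.name for s in find_llm_calls(root, agent_match="primary")]
all_calls = [s.name for s in find_llm_calls(root)]
print(primary_calls)  # ['llm-call-1']
print(all_calls)      # ['llm-call-1', 'llm-call-2']
```

In this sketch the critic's call is still present in the trace, but it never reaches the list that would be turned into training triplets.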
Span Filtering and Reward Assignment
Because rewards are also attached to spans, the same filter applies when computing final rewards for a rollout. This permits agent-specific reinforcement-learning updates even when multiple agents share the same environment and dataset. If agent_match is omitted, the adapter defaults to processing spans from all agents.
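As a rough illustration of agent-scoped reward selection, the sketch below picks the last reward emitted by a matching agent. The flat (agent_name, reward) event list and the final_reward_for helper are hypothetical, and treating "final" as "last emitted" is an assumption rather than a statement about the adapter's behavior.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical flat view of reward spans: (agent_name, reward) in emission order
RewardEvent = Tuple[str, float]


def final_reward_for(
    events: List[RewardEvent], agent_match: Optional[str] = None
) -> Optional[float]:
    """Return the last reward emitted by a matching agent, or None if none match."""
    for agent_name, reward in reversed(events):
        if agent_match is None or re.search(agent_match, agent_name):
            return reward
    return None


events = [("planner", 0.2), ("executor", 1.0), ("planner", 0.5), ("executor", 0.0)]

print(final_reward_for(events, "planner"))  # 0.5
print(final_reward_for(events))             # 0.0 (no filter: last reward overall)
print(final_reward_for(events, "critic"))   # None
```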
Hierarchy Repair for Complex Frameworks
Multi-agent frameworks like AutoGen, LangChain, or CrewAI sometimes emit spans that are not properly nested under their correct agent root. The TraceTree.repair_hierarchy method (lines 78-88 in agentlightning/adapter/triplet.py) re-parents these misplaced spans before filtering occurs, ensuring the agent_match regex reliably captures all relevant data for the target agent.
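One plausible way to re-parent misplaced spans is by time containment: a span attached directly to the root is moved under the deepest sibling subtree whose time interval encloses it. The sketch below shows that idea only. The Span class and this repair_hierarchy function are hypothetical, and the containment rule is an assumption, not a description of TraceTree.repair_hierarchy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality so list.remove targets the exact object
class Span:
    """Hypothetical span with a time interval; not the library's span type."""
    name: str
    start: float
    end: float
    children: List["Span"] = field(default_factory=list)


def _deepest_container(node: Span, orphan: Span) -> Optional[Span]:
    """Return the deepest span in node's subtree whose interval contains orphan."""
    if not (node.start <= orphan.start and orphan.end <= node.end):
        return None
    for child in node.children:
        found = _deepest_container(child, orphan)
        if found is not None:
            return found
    return node


def repair_hierarchy(root: Span) -> None:
    """Move root-level spans under a sibling subtree that encloses them in time."""
    for orphan in list(root.children):  # iterate over a copy; root.children is mutated
        for sibling in root.children:
            if sibling is orphan:
                continue
            target = _deepest_container(sibling, orphan)
            if target is not None:
                root.children.remove(orphan)
                target.children.append(orphan)
                break


agent_1 = Span("agent-1", 1.0, 5.0)
agent_2 = Span("agent-2", 6.0, 9.0)
stray_llm = Span("llm-call", 2.0, 3.0)  # emitted at root level by mistake
root = Span("rollout", 0.0, 10.0, [agent_1, stray_llm, agent_2])

repair_hierarchy(root)
print([s.name for s in root.children])     # ['agent-1', 'agent-2']
print([s.name for s in agent_1.children])  # ['llm-call']
```

After the repair, a filter keyed on the enclosing agent would attribute the stray call to agent-1 instead of skipping it.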
Optimization Strategies
Single-Pass Selective Optimization
Use this strategy when one agent requires improvement while others must remain frozen, such as training a primary actor while keeping a critic fixed. Configure the adapter with a specific agent name:
import agentlightning as agl

trainer = agl.Trainer(
    algorithm=algo,
    n_runners=4,
    adapter={"agent_match": "primary"},  # regex matching the primary agent name
)
This instantiation forwards the adapter configuration to the TracerTraceToTriplet adapter, which drops every span except those whose agent name matches "primary".
Parallel Optimization of Multiple Agents
When two or more agents each require distinct policy updates (e.g., a planner and an executor), launch separate Trainer instances with distinct agent_match patterns and execute them concurrently:
import asyncio
import agentlightning as agl

trainer_planner = agl.Trainer(
    algorithm=algo_planner,
    n_runners=2,
    adapter={"agent_match": "planner"},
)
trainer_executor = agl.Trainer(
    algorithm=algo_executor,
    n_runners=2,
    adapter={"agent_match": "executor"},
)

async def main():
    # await is only valid inside a coroutine, so wrap the concurrent run
    await asyncio.gather(trainer_planner.run(), trainer_executor.run())

asyncio.run(main())
Each trainer consumes only the spans belonging to its respective agent, computing separate gradients and policy updates in parallel.
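When trainers run concurrently, it can help to keep one trainer's failure from discarding the other's result. The sketch below shows this with asyncio.gather(..., return_exceptions=True). The run_trainer coroutine is a hypothetical stand-in that performs no training; it only marks where an awaitable training loop would go.

```python
import asyncio
from typing import List, Union


async def run_trainer(name: str, fail: bool = False) -> str:
    """Stand-in for an awaitable training loop; does no real training."""
    await asyncio.sleep(0)  # yield control once, as a real loop would while waiting
    if fail:
        raise RuntimeError(f"{name} trainer failed")
    return f"{name}: done"


async def main() -> List[Union[str, BaseException]]:
    # return_exceptions=True returns exceptions as results instead of raising,
    # so the planner's outcome survives the executor's failure
    return await asyncio.gather(
        run_trainer("planner"),
        run_trainer("executor", fail=True),
        return_exceptions=True,
    )


results = asyncio.run(main())
for result in results:
    print(repr(result))
```

Results come back in the order the coroutines were passed to gather, so each entry can be attributed to its trainer by position.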
Regex-Based Agent Grouping
For agents sharing a naming convention, use pattern matching to optimize entire cohorts simultaneously. This is effective when training multiple worker agents prefixed identically:
adapter={"agent_match": "worker-.*"}
The regular expression matches any agent name starting with "worker-", aggregating their spans into a single training stream without listing each agent individually.
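Whether an unanchored pattern matches only at the start of a name depends on how the adapter applies it (search versus match semantics), which is not stated here. The sketch below uses Python's re module to show the difference and why explicit anchors remove the ambiguity. The agent names are made up for illustration.

```python
import re

agent_names = ["worker-1", "worker-2", "supervisor", "coworker-3"]

# Unanchored pattern with search semantics: matches anywhere in the name
search_hits = [n for n in agent_names if re.search(r"worker-.*", n)]

# Anchored pattern: behaves the same under search, match, or fullmatch
anchored_hits = [n for n in agent_names if re.search(r"^worker-.*$", n)]

print(search_hits)    # ['worker-1', 'worker-2', 'coworker-3']
print(anchored_hits)  # ['worker-1', 'worker-2']
```

Writing the pattern as "^worker-.*$" keeps a similarly named agent such as "coworker-3" out of the training stream regardless of the matching function used.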
Dynamic Selection at Runtime
When agents are spawned dynamically (e.g., during auto-scaling or adaptive workflows), compute the regex pattern before each training epoch based on discovered agent names:
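import re
import agentlightning as agl

active_agents = discover_agents()  # your discovery logic
# Escape each name so regex metacharacters in agent names match literally
pattern = "|".join(re.escape(name) for name in active_agents)
trainer = agl.Trainer(
    algorithm=algo,
    adapter={"agent_match": pattern},
)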
This ensures the optimization loop adapts to the current system composition without hardcoding agent identities.
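Building a pattern from discovered names has two edge cases worth handling: names that contain regex metacharacters, and an empty discovery result, which would otherwise produce a pattern that matches every agent. The helper below is a hypothetical sketch (build_agent_pattern is not part of Agent-Lightning) that escapes each name, anchors the alternation, and returns None when there is nothing to match.

```python
import re
from typing import Iterable, Optional


def build_agent_pattern(names: Iterable[str]) -> Optional[str]:
    """Return an anchored regex matching exactly the given names, or None if empty."""
    unique = sorted(set(names))  # deduplicate and keep the pattern deterministic
    if not unique:
        return None  # caller decides what to do; an empty regex would match everything
    alternation = "|".join(re.escape(name) for name in unique)
    return f"^(?:{alternation})$"


pattern = build_agent_pattern(["worker.1", "planner", "planner"])

print(bool(re.search(pattern, "worker.1")))   # True
print(bool(re.search(pattern, "workerx1")))   # False: the dot is matched literally
print(bool(re.search(pattern, "planner-2")))  # False: anchors require an exact name
print(build_agent_pattern([]))                # None
```

A caller can skip the training epoch when the helper returns None instead of passing an empty pattern to the adapter.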
Reward-Only Optimization for Specific Agents
In configurations where certain agents (like critics) provide feedback but should not receive direct rewards, apply the agent_match filter during reward extraction. This occurs in the reward computation phase (e.g., find_final_reward), filtering the reward signal to include only specific agent contributions while still logging all agent actions.
Practical Implementation Examples
Training a Primary Agent with a Fixed Critic
import asyncio
import agentlightning as agl
from agentlightning.algorithm import PPO

algo = PPO(...)

# Only the primary agent receives updates; critic remains frozen
trainer = agl.Trainer(
    algorithm=algo,
    n_runners=4,
    adapter={"agent_match": "primary"},
)

async def main():
    # await is only valid inside a coroutine
    await trainer.run()

asyncio.run(main())
Repairing Hierarchy Before Filtering
When integrating with AutoGen or similar frameworks, call repair_hierarchy immediately after trace collection to ensure proper span nesting:
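# capture_trace() stands in for your own trace-collection coroutine;
# await it from inside an async function
trace_tree = await capture_trace()
trace_tree.repair_hierarchy()  # Lines 78-88 in triplet.py
# Now filtering works reliably
adapter = {"agent_match": "agent-1"}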
Aggregating Metrics Across Selective Runs
Use MultiMetricsBackend from agentlightning/utils/metrics.py to unify logging when running multiple selective trainers:
import agentlightning as agl
from agentlightning.utils import metrics

console = metrics.ConsoleMetricsBackend()
prom = metrics.PrometheusMetricsBackend(...)
multi_backend = metrics.MultiMetricsBackend([console, prom])

trainer = agl.Trainer(
    algorithm=algo,
    adapter={"agent_match": ".*"},  # track all agents
    tracker=multi_backend,
)
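The fan-out idea behind a multi-backend can be shown in a few lines of plain Python. The classes below are hypothetical: the log(name, value) method and the ListBackend and FanOutBackend names are illustrative assumptions and do not mirror the interface of MultiMetricsBackend.

```python
from typing import Iterable, List, Protocol, Tuple


class MetricsBackend(Protocol):
    """Hypothetical minimal backend interface used only in this sketch."""
    def log(self, name: str, value: float) -> None: ...


class ListBackend:
    """Records metrics in memory; stands in for a console or Prometheus sink."""
    def __init__(self) -> None:
        self.records: List[Tuple[str, float]] = []

    def log(self, name: str, value: float) -> None:
        self.records.append((name, value))


class FanOutBackend:
    """Forwards every metric event to each wrapped backend, in order."""
    def __init__(self, backends: Iterable[MetricsBackend]) -> None:
        self._backends = list(backends)

    def log(self, name: str, value: float) -> None:
        for backend in self._backends:
            backend.log(name, value)


sink_a, sink_b = ListBackend(), ListBackend()
fan_out = FanOutBackend([sink_a, sink_b])

# Prefixing the metric name with the agent keeps selective runs distinguishable
fan_out.log("planner/reward", 0.5)
fan_out.log("executor/reward", 1.0)

print(sink_a.records)  # [('planner/reward', 0.5), ('executor/reward', 1.0)]
print(sink_a.records == sink_b.records)  # True
```

Sharing one fan-out object between several selective trainers sends all of their metrics to the same set of sinks.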
Summary
- Use agent_match in the adapter configuration to filter OpenTelemetry spans by agent name using regular expressions.
- Run parallel trainers with distinct agent_match patterns to optimize multiple agents concurrently with isolated gradients.
- Call repair_hierarchy before filtering when using frameworks that produce disjoint span trees (AutoGen, LangChain, CrewAI).
- Leverage regex patterns like "planner-.*" to train agent groups without explicit enumeration.
- Aggregate metrics using MultiMetricsBackend to maintain observability across selective optimization runs.
Frequently Asked Questions
How does the agent_match filter work internally?
The filter operates during the adapter's span processing phase. In agentlightning/adapter/triplet.py, the TraceTree.find_llm_calls method (lines 56-57) recursively traverses the span tree and yields only those calls where the agent name attribute matches the provided regex. This filtered subset is then converted into training triplets, ensuring the RL algorithm receives data exclusively from the matched agent.
Why must I repair span hierarchy before filtering agents?
Frameworks like AutoGen or CrewAI sometimes emit spans that are not properly nested under their logical agent root, causing the recursive search in find_llm_calls to miss valid agent data. The TraceTree.repair_hierarchy method (lines 78-88) re-parents these orphaned spans, ensuring that the agent_match regex correctly identifies all LLM calls and rewards associated with the target agent.
Can I train multiple agents simultaneously with different algorithms?
Yes. Create separate Trainer instances for each agent, each configured with its own agent_match pattern and algorithm instance. Use asyncio.gather() to run them concurrently. Each trainer maintains its own policy and optimizer state, allowing you to mix algorithms (e.g., PPO for the planner, REINFORCE for the executor) within the same multi-agent environment.
How do I handle metrics when selectively optimizing only certain agents?
Initialize a MultiMetricsBackend from agentlightning/utils/metrics.py and pass it to each trainer via the tracker parameter. This backend fans out metric events to multiple destinations (console, Prometheus, etc.), ensuring you capture performance data for all agents even when only a subset is actively training. You can also configure one trainer with agent_match=".*" solely for aggregated logging while others handle specific optimization tasks.