# Multi-Agent Selective Optimization in Agent-Lightning: 5 Strategies for Targeted Training

> Discover multi-agent selective optimization strategies in Agent-Lightning. Learn to target specific agents with regex filters for efficient training without code changes. Visit microsoft/agent-lightning.

- Repository: [Microsoft/agent-lightning](https://github.com/microsoft/agent-lightning)
- Tags: how-to-guide
- Published: 2026-04-01

---

**Agent-Lightning enables selective optimization of specific agents within a multi-agent system through the `agent_match` regex filter, allowing targeted training loops without modifying underlying agent code.**

Agent-Lightning, Microsoft's open-source framework for agentic AI training, supports **multi-agent selective optimization** through trace filtering. Rather than training all agents in one monolithic loop, you can isolate individual agents or groups with regular expressions applied to OpenTelemetry spans. The capability centers on the `agent_match` parameter of the trace-to-triplet adapter, which filters LLM calls and rewards by agent name, walking the span tree recursively, before they reach the training algorithm.

## How Selective Optimization Works

Agent-Lightning implements selective optimization by intercepting traces at the adapter level before they are converted into training triplets. Every LLM invocation, tool execution, and reward emission is captured as a span carrying metadata about which agent produced it.

### The agent_match Filter

The core mechanism resides in [`agentlightning/adapter/triplet.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/adapter/triplet.py), where the `TracerTraceToTriplet` adapter accepts an `agent_match` argument. This regular expression is applied within `TraceTree.find_llm_calls` (lines 56-57) to recursively search the span tree. Only spans whose enclosing agent name matches the pattern are emitted as training data, enabling **gradients to flow exclusively to the targeted agent's policy** while leaving co-existing agents untouched.
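To make the filtering idea concrete, here is a minimal sketch on a toy span tree. It is not the library's implementation: the `Span` class, its fields, and this standalone `find_llm_calls` function are simplified stand-ins, and the use of `re.search` is an assumption about the matching semantics.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Span:
    """Hypothetical, simplified span (not the library's span type)."""
    name: str
    kind: str = "other"  # "agent", "llm", or "other"
    children: List["Span"] = field(default_factory=list)


def find_llm_calls(
    span: Span,
    agent_match: Optional[str] = None,
    enclosing_agent: Optional[str] = None,
) -> Iterator[Span]:
    """Yield LLM spans whose nearest enclosing agent matches the regex."""
    if span.kind == "agent":
        enclosing_agent = span.name
    if span.kind == "llm":
        matched = agent_match is None or (
            enclosing_agent is not None
            and re.search(agent_match, enclosing_agent) is not None
        )
        if matched:
            yield span
    # Recurse so that LLM calls nested under tools are still attributed
    # to the agent that encloses them.
    for child in span.children:
        yield from find_llm_calls(child, agent_match, enclosing_agent)


root = Span("rollout", children=[
    Span("primary", "agent", [
        Span("llm-1", "llm"),
        Span("tool", "other", [Span("llm-2", "llm")]),
    ]),
    Span("critic", "agent", [Span("llm-3", "llm")]),
])

primary_calls = [s.name for s in find_llm_calls(root, "primary")]
all_calls = [s.name for s in find_llm_calls(root)]
print(primary_calls)  # ['llm-1', 'llm-2']
print(all_calls)      # ['llm-1', 'llm-2', 'llm-3']
```

In this toy version, omitting the pattern returns every LLM call, mirroring the default behavior described for the adapter.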

### Span Filtering and Reward Assignment

Because rewards are also attached to spans, the same filter applies when computing final rewards for a rollout. This permits agent-specific reinforcement-learning updates even when multiple agents share the same environment and dataset. If `agent_match` is omitted, the adapter defaults to processing spans from all agents.

### Hierarchy Repair for Complex Frameworks

Multi-agent frameworks like AutoGen, LangChain, or CrewAI sometimes emit spans that are not properly nested under their correct agent root. The `TraceTree.repair_hierarchy` method (lines 78-88 in [`agentlightning/adapter/triplet.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/adapter/triplet.py)) re-parents these misplaced spans before filtering occurs, ensuring the `agent_match` regex reliably captures all relevant data for the target agent.
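The re-parenting idea can be sketched as follows. The source does not describe the heuristic that `repair_hierarchy` uses, so this example assumes a time-containment rule: a span attached directly to the root is moved under the tightest span whose interval encloses it. `FlatSpan` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FlatSpan:
    """Hypothetical flat span record; field names are illustrative."""
    span_id: str
    start: float
    end: float
    parent_id: Optional[str] = None


def repair_hierarchy(spans: List[FlatSpan], root_id: str) -> Dict[str, Optional[str]]:
    """Return span_id -> parent_id after re-parenting root-level spans
    under the tightest span whose time interval contains them."""
    parents = {s.span_id: s.parent_id for s in spans}
    for span in spans:
        if span.parent_id != root_id:
            continue  # the root itself, or already nested somewhere specific
        candidates = [
            other for other in spans
            if other.span_id not in (span.span_id, root_id)
            and other.start <= span.start and span.end <= other.end
            # Skip identical intervals so two spans cannot adopt each other.
            and (other.start, other.end) != (span.start, span.end)
        ]
        if candidates:
            tightest = min(candidates, key=lambda o: o.end - o.start)
            parents[span.span_id] = tightest.span_id
    return parents


spans = [
    FlatSpan("root", 0.0, 10.0),
    FlatSpan("agent-1", 1.0, 5.0, parent_id="root"),
    FlatSpan("agent-2", 6.0, 9.0, parent_id="root"),
    FlatSpan("llm-a", 2.0, 3.0, parent_id="root"),  # misplaced: ran inside agent-1
    FlatSpan("llm-b", 7.0, 8.0, parent_id="agent-2"),
]

repaired = repair_hierarchy(spans, root_id="root")
print(repaired["llm-a"])    # agent-1
print(repaired["agent-1"])  # root
```

Once `llm-a` sits under `agent-1`, a filter keyed on the enclosing agent name can attribute it correctly.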

## Optimization Strategies

### Single-Pass Selective Optimization

Use this strategy when one agent requires improvement while others must remain frozen, such as training a primary actor while keeping a critic fixed. Configure the adapter with a specific agent name:

```python
import agentlightning as agl

# algo is an algorithm instance configured elsewhere
trainer = agl.Trainer(
    algorithm=algo,
    n_runners=4,
    adapter={"agent_match": "primary"},  # regex matching the primary agent name
)
```

This instantiation forwards the adapter configuration to `TracerTraceToTriplet`, which discards every span except those whose agent name matches `primary`.

### Parallel Optimization of Multiple Agents

When two or more agents each require distinct policy updates (e.g., a planner and an executor), launch separate `Trainer` instances with distinct `agent_match` patterns and execute them concurrently:

```python
import asyncio
import agentlightning as agl

# algo_planner and algo_executor are algorithm instances configured elsewhere
trainer_planner = agl.Trainer(
    algorithm=algo_planner,
    n_runners=2,
    adapter={"agent_match": "planner"},
)

trainer_executor = agl.Trainer(
    algorithm=algo_executor,
    n_runners=2,
    adapter={"agent_match": "executor"},
)


async def main():
    # Run both training loops concurrently
    await asyncio.gather(trainer_planner.run(), trainer_executor.run())


asyncio.run(main())
```

Each trainer consumes only the spans belonging to its respective agent, computing separate gradients and policy updates in parallel.
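The concurrency pattern itself can be tried without the framework. In the sketch below, `StubTrainer` is a hypothetical stand-in that only mimics an awaitable `run()` and a regex filter; it does no training and is not part of Agent-Lightning.

```python
import asyncio
import re
from typing import List

# Toy stream of spans shared by both trainers.
SPANS = [
    {"agent": "planner", "call": "plan-1"},
    {"agent": "executor", "call": "exec-1"},
    {"agent": "planner", "call": "plan-2"},
]


class StubTrainer:
    """Hypothetical stand-in: collects the calls its pattern matches."""

    def __init__(self, agent_match: str) -> None:
        self.pattern = re.compile(agent_match)
        self.consumed: List[str] = []

    async def run(self) -> List[str]:
        for span in SPANS:
            await asyncio.sleep(0)  # yield control so the trainers interleave
            if self.pattern.fullmatch(span["agent"]):
                self.consumed.append(span["call"])
        return self.consumed


async def main() -> List[List[str]]:
    planner = StubTrainer("planner")
    executor = StubTrainer("executor")
    # gather returns results in the order the coroutines were passed
    return await asyncio.gather(planner.run(), executor.run())


planner_calls, executor_calls = asyncio.run(main())
print(planner_calls)   # ['plan-1', 'plan-2']
print(executor_calls)  # ['exec-1']
```

Each stub sees the same span stream but keeps only its own agent's calls, which is the isolation property the two-trainer setup relies on.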

### Regex-Based Agent Grouping

For agents sharing a naming convention, use pattern matching to optimize entire cohorts simultaneously. This is effective when training multiple worker agents prefixed identically:

```python
adapter = {"agent_match": "worker-.*"}
```

The regular expression matches any agent name starting with "worker-", aggregating their spans into a single training stream without listing each agent individually.
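Whether an unanchored pattern behaves as a prefix match depends on which `re` function applies it, and the text above does not say which one the adapter uses. The following standard-library comparison shows the difference and why an explicit `^` anchor is a safe choice; the agent names are made up for illustration.

```python
import re

names = ["worker-1", "worker-search", "coworker-2", "planner"]
pattern = "worker-.*"

# re.match anchors at the start of the string only.
prefix_hits = [n for n in names if re.match(pattern, n)]

# re.search finds the pattern anywhere, so "coworker-2" also matches.
anywhere_hits = [n for n in names if re.search(pattern, n)]

# An explicit ^ anchor gives prefix behavior under either function.
anchored_hits = [n for n in names if re.search("^" + pattern, n)]

print(prefix_hits)    # ['worker-1', 'worker-search']
print(anywhere_hits)  # ['worker-1', 'worker-search', 'coworker-2']
print(anchored_hits)  # ['worker-1', 'worker-search']
```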

### Dynamic Selection at Runtime

When agents are spawned dynamically (e.g., during auto-scaling or adaptive workflows), compute the regex pattern before each training epoch based on discovered agent names:

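```python
import re
import agentlightning as agl

active_agents = discover_agents()  # your discovery logic

# Escape each name so regex metacharacters (".", "+", ...) match literally
pattern = "(" + "|".join(re.escape(name) for name in active_agents) + ")"

trainer = agl.Trainer(
    algorithm=algo,
    adapter={"agent_match": pattern},
)
```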

This ensures the optimization loop adapts to the current system composition without hardcoding agent identities.
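A more defensive way to build such a pattern is sketched below. `build_agent_pattern` is a hypothetical helper, not part of Agent-Lightning; it escapes each name, anchors the alternation so that only exact names match, and returns a never-matching pattern when no agents are discovered.

```python
import re
from typing import Iterable


def build_agent_pattern(names: Iterable[str]) -> str:
    """Return an anchored alternation that matches exactly the given names."""
    escaped = sorted(re.escape(name) for name in set(names))
    if not escaped:
        return r"(?!)"  # empty negative lookahead: matches nothing
    return "^(?:" + "|".join(escaped) + ")$"


pattern = build_agent_pattern(["worker.1", "planner"])
naive = "(" + "|".join(["worker.1", "planner"]) + ")"

# The naive pattern treats "." as a wildcard and allows partial matches.
naive_false_positive = re.search(naive, "workerX1") is not None
safe_false_positive = re.search(pattern, "workerX1") is not None
exact_match = re.search(pattern, "worker.1") is not None

print(naive_false_positive, safe_false_positive, exact_match)  # True False True
```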

### Reward-Only Optimization for Specific Agents

When some agents (such as critics) emit feedback that should not count toward the training reward, the `agent_match` filter also applies during reward extraction (e.g., in `find_final_reward`). Only rewards from matching agents contribute to the reward signal, while the actions of all agents are still logged.
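As a rough illustration of reward filtering, the sketch below picks the last reward emitted by a matching agent. `final_reward_for` and the `(agent, value)` tuples are hypothetical simplifications, and taking the last matching reward is an assumption rather than documented behavior of the library's reward lookup.

```python
import re
from typing import List, Optional, Tuple

RewardSpan = Tuple[str, float]  # (agent name, reward value), in emission order


def final_reward_for(
    reward_spans: List[RewardSpan],
    agent_match: Optional[str] = None,
) -> Optional[float]:
    """Return the last reward emitted by an agent matching the regex."""
    final: Optional[float] = None
    for agent, value in reward_spans:
        if agent_match is None or re.search(agent_match, agent):
            final = value
    return final


rollout = [("primary", 0.2), ("critic", -1.0), ("primary", 0.8), ("critic", 0.1)]

primary_reward = final_reward_for(rollout, "^primary$")
unfiltered_reward = final_reward_for(rollout)
print(primary_reward)     # 0.8
print(unfiltered_reward)  # 0.1
```

The full `rollout` list stays intact, so every agent's emissions remain available for logging even though only the matching ones feed the reward.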

## Practical Implementation Examples

### Training a Primary Agent with a Fixed Critic

```python
import asyncio
import agentlightning as agl
from agentlightning.algorithm import PPO

algo = PPO(...)

# Only the primary agent receives updates; the critic remains frozen
trainer = agl.Trainer(
    algorithm=algo,
    n_runners=4,
    adapter={"agent_match": "primary"},
)


async def main():
    await trainer.run()


asyncio.run(main())
```

### Repairing Hierarchy Before Filtering

When integrating with AutoGen or similar frameworks, call `repair_hierarchy` immediately after trace collection to ensure proper span nesting:

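```python
async def collect_repaired_trace():
    # capture_trace() is a placeholder for your own trace-collection logic
    trace_tree = await capture_trace()
    trace_tree.repair_hierarchy()  # lines 78-88 in triplet.py
    return trace_tree


# With the hierarchy repaired, filtering by agent works reliably
adapter = {"agent_match": "agent-1"}
```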

### Aggregating Metrics Across Selective Runs

Use `MultiMetricsBackend` from [`agentlightning/utils/metrics.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/utils/metrics.py) to unify logging when running multiple selective trainers:

```python
import agentlightning as agl
from agentlightning.utils import metrics

console = metrics.ConsoleMetricsBackend()
prom = metrics.PrometheusMetricsBackend(...)

# Fan metric events out to both backends
multi_backend = metrics.MultiMetricsBackend([console, prom])

trainer = agl.Trainer(
    algorithm=algo,
    adapter={"agent_match": ".*"},  # track all agents
    tracker=multi_backend,
)
```
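The fan-out behavior can be pictured with a small standalone sketch. The classes and the `log(name, value)` method below are hypothetical and do not reflect the actual interface in `agentlightning/utils/metrics.py`; they only show how one backend can forward each event to several destinations.

```python
from typing import List, Protocol, Tuple


class MetricsBackend(Protocol):
    """Hypothetical minimal backend interface."""

    def log(self, name: str, value: float) -> None: ...


class ListBackend:
    """Stores events in memory; stands in for a console or Prometheus sink."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, float]] = []

    def log(self, name: str, value: float) -> None:
        self.records.append((name, value))


class FanOutBackend:
    """Forwards every metric event to each wrapped backend."""

    def __init__(self, backends: List[MetricsBackend]) -> None:
        self.backends = list(backends)

    def log(self, name: str, value: float) -> None:
        for backend in self.backends:
            backend.log(name, value)


sink_a, sink_b = ListBackend(), ListBackend()
fan_out = FanOutBackend([sink_a, sink_b])

# Two selective trainers could share the same fan-out instance.
fan_out.log("planner/reward", 0.7)
fan_out.log("executor/reward", 0.4)

print(sink_a.records == sink_b.records)  # True
```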

## Summary

- **Use `agent_match`** in the adapter configuration to filter OpenTelemetry spans by agent name using regular expressions.
- **Run parallel trainers** with distinct `agent_match` patterns to optimize multiple agents concurrently with isolated gradients.
- **Call `repair_hierarchy`** before filtering when using frameworks that produce disjoint span trees (AutoGen, LangChain, CrewAI).
- **Leverage regex patterns** like `"worker-.*"` to train agent groups without explicit enumeration.
- **Aggregate metrics** using `MultiMetricsBackend` to maintain observability across selective optimization runs.

## Frequently Asked Questions

### How does the `agent_match` filter work internally?

The filter operates during the adapter's span processing phase. In [`agentlightning/adapter/triplet.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/adapter/triplet.py), the `TraceTree.find_llm_calls` method (lines 56-57) recursively traverses the span tree and yields only those calls where the agent name attribute matches the provided regex. This filtered subset is then converted into training triplets, ensuring the RL algorithm receives data exclusively from the matched agent.

### Why must I repair span hierarchy before filtering agents?

Frameworks like AutoGen or CrewAI sometimes emit spans that are not properly nested under their logical agent root, causing the recursive search in `find_llm_calls` to miss valid agent data. The `TraceTree.repair_hierarchy` method (lines 78-88) re-parents these orphaned spans, ensuring that the `agent_match` regex correctly identifies all LLM calls and rewards associated with the target agent.

### Can I train multiple agents simultaneously with different algorithms?

Yes. Create separate `Trainer` instances for each agent, each configured with its own `agent_match` pattern and algorithm instance. Use `asyncio.gather()` to run them concurrently. Each trainer maintains its own policy and optimizer state, allowing you to mix algorithms (e.g., PPO for the planner, REINFORCE for the executor) within the same multi-agent environment.

### How do I handle metrics when selectively optimizing only certain agents?

Initialize a `MultiMetricsBackend` from [`agentlightning/utils/metrics.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/utils/metrics.py) and pass it to each trainer via the `tracker` parameter. This backend fans out metric events to multiple destinations (console, Prometheus, etc.), ensuring you capture performance data for all agents even when only a subset is actively training. You can also configure one trainer with `agent_match=".*"` solely for aggregated logging while others handle specific optimization tasks.