# Understanding Execution Modes in Agent-Lightning: Shared Memory, Client-Server, and Full Orchestration

> Explore Agent-Lightning execution modes: shared-memory, client-server, and full orchestration. Master process topology, communication, and shutdown semantics for efficient agent deployment.

- Repository: [Microsoft/agent-lightning](https://github.com/microsoft/agent-lightning)
- Tags: deep-dive
- Published: 2026-04-01

---

**Agent-Lightning supports three execution modes: shared-memory (threads in a single process), client-server (separate processes communicating over HTTP), and full local orchestration (the client-server strategy with `role="both"`). Strategy classes define each mode's process topology, communication transport, and shutdown semantics.**

The microsoft/agent-lightning repository provides flexible execution strategies that determine how algorithm bundles and runner bundles interact during distributed training. Understanding these **execution modes** is essential for optimizing performance, debugging distributed logic, and deploying to multi-GPU or cluster environments.

## Shared-Memory Execution

**Shared-memory execution** runs all bundles within a single Python process using cooperative worker threads. This mode eliminates serialization overhead and is ideal for fast prototyping and debugging.

According to the source code in [`agentlightning/execution/shared_memory.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/execution/shared_memory.py), the `SharedMemoryExecutionStrategy` manages concurrency through direct object references while ensuring thread safety via the `LightningStoreThreaded` wrapper.

### Thread Safety and Synchronization

When `managed_store=True` (the default), the strategy wraps the original `LightningStore` in a `LightningStoreThreaded` instance from [`agentlightning/store/threading.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/threading.py). This wrapper synchronizes read/write operations across concurrent threads.
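To make the idea concrete, here is a minimal sketch of a lock-based wrapper in the spirit of `LightningStoreThreaded`. It assumes the wrapper simply serializes every call to the inner store behind one `threading.Lock`; `ToyStore` and `ThreadedStoreWrapper` are hypothetical stand-ins, not the library's classes, and the real wrapper may synchronize differently.

```python
import threading
from typing import Any, Dict


class ToyStore:
    """Hypothetical, non-thread-safe store with a read-modify-write counter."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def increment(self, key: str) -> int:
        current = self._data.get(key, 0)  # read
        self._data[key] = current + 1     # write; not atomic with the read above
        return self._data[key]

    def get(self, key: str) -> int:
        return self._data.get(key, 0)


class ThreadedStoreWrapper:
    """Hypothetical wrapper: every call to the inner store runs under one lock."""

    def __init__(self, inner: ToyStore) -> None:
        self._inner = inner
        self._lock = threading.Lock()

    def increment(self, key: str) -> int:
        with self._lock:
            return self._inner.increment(key)

    def get(self, key: str) -> int:
        with self._lock:
            return self._inner.get(key)


def hammer(store: ThreadedStoreWrapper, n: int) -> None:
    """Simulates one runner thread writing to the shared store n times."""
    for _ in range(n):
        store.increment("rollouts")


store = ThreadedStoreWrapper(ToyStore())
threads = [threading.Thread(target=hammer, args=(store, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.get("rollouts"))  # 40000
```

Without the lock, the read and the write in `increment` could interleave across threads and silently lose updates; the wrapper trades a little parallelism for a store that every thread can call directly.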

The strategy uses a single `ThreadingEvent` (`stop_evt`), shared by all bundles, for cooperative shutdown. When you press Ctrl-C or any bundle crashes, the event is set, which starts a graceful exit sequence governed by the `graceful_delay` parameter.
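The cooperative pattern can be sketched with the standard library alone. This sketch assumes that workers check a shared `threading.Event` between units of work and that `graceful_delay` bounds how long the main thread waits for them after setting it; `runner_loop` and `shutdown` are hypothetical helpers, not the strategy's actual code.

```python
import threading
import time
from typing import List


def runner_loop(name: str, stop_evt: threading.Event, log: List[str]) -> None:
    """Hypothetical runner: does small units of work until asked to stop."""
    while not stop_evt.is_set():
        # ... one unit of work (e.g. one rollout) would go here ...
        stop_evt.wait(0.01)  # sleep briefly, but wake immediately once the event is set
    log.append(f"{name} exited cleanly")


def shutdown(stop_evt: threading.Event, workers: List[threading.Thread], graceful_delay: float) -> List[str]:
    """Set the shared event, then give all workers up to graceful_delay seconds in total."""
    stop_evt.set()
    deadline = time.monotonic() + graceful_delay
    for w in workers:
        w.join(timeout=max(0.0, deadline - time.monotonic()))
    return [w.name for w in workers if w.is_alive()]  # stragglers, if any


stop_evt = threading.Event()
log: List[str] = []
workers = [
    threading.Thread(target=runner_loop, args=(f"runner-{i}", stop_evt, log), name=f"runner-{i}", daemon=True)
    for i in range(3)
]
for w in workers:
    w.start()

try:
    time.sleep(0.1)  # the algorithm would run here, on the main thread
except KeyboardInterrupt:
    pass  # Ctrl-C is raised on the main thread and falls through to shutdown

stragglers = shutdown(stop_evt, workers, graceful_delay=5.0)
print(stragglers, len(log))  # [] 3
```

Python offers no way to forcibly kill a thread, so a worker that ignores the event can only be reported (and, as a daemon thread, abandoned at interpreter exit). Processes, by contrast, can be signaled and killed, which the client-server mode relies on.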

### Main Thread Configuration

The `main_thread` parameter determines which component occupies the main thread:

- **`main_thread="algorithm"`** (default): Runs the algorithm on the main thread and executes runners on background threads.
- **`main_thread="runner"`**: Executes the runner on the main thread (requires `n_runners=1`), useful for breakpoint debugging in IDEs.

```python
from agentlightning.execution.shared_memory import SharedMemoryExecutionStrategy
from agentlightning.trainer import Trainer

# Algorithm on main thread, one background runner
strategy = SharedMemoryExecutionStrategy(
    n_runners=1,
    main_thread="algorithm",
    graceful_delay=5.0,
)

trainer = Trainer(strategy=strategy)
trainer.run()  # Blocks until completion or interruption
```

## Client-Server Execution

**Client-server execution** isolates the algorithm and runners into separate processes communicating via HTTP. This mode, implemented in [`agentlightning/execution/client_server.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/execution/client_server.py), sidesteps Python's Global Interpreter Lock (GIL), since each process runs its own interpreter, and supports multi-GPU deployments.

The `ClientServerExecutionStrategy` spawns a `LightningStoreServer` within the algorithm process and connects runners via `LightningStoreClient` instances over `http://host:port`. Process coordination uses a `MultiprocessingEvent` for cross-process signaling.
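As a rough illustration of the transport, the sketch below runs a tiny HTTP "store" server in a background thread and talks to it through a client over `http://host:port`. Everything here (`ToyStoreHandler`, `ToyStoreClient`, the `/put` and `/get/<key>` endpoints) is a hypothetical stand-in built on `http.server` and `urllib`; the real `LightningStoreServer` and `LightningStoreClient` have their own endpoints and payloads, which this sketch does not reproduce. In the real strategy the client lives in a separate runner process; a thread is used here only to keep the example self-contained.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Dict

STATE: Dict[str, Any] = {}
STATE_LOCK = threading.Lock()


class ToyStoreHandler(BaseHTTPRequestHandler):
    """Hypothetical endpoints: POST /put with {"key", "value"}, GET /get/<key>."""

    def _reply(self, payload: Dict[str, Any], status: int = 200) -> None:
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self) -> None:
        if self.path != "/put":
            self._reply({"error": "not found"}, 404)
            return
        length = int(self.headers.get("Content-Length", 0))
        item = json.loads(self.rfile.read(length))
        with STATE_LOCK:
            STATE[item["key"]] = item["value"]
        self._reply({"ok": True})

    def do_GET(self) -> None:
        if not self.path.startswith("/get/"):
            self._reply({"error": "not found"}, 404)
            return
        with STATE_LOCK:
            value = STATE.get(self.path[len("/get/"):])
        self._reply({"value": value})

    def log_message(self, format: str, *args: Any) -> None:
        pass  # keep the demo quiet


class ToyStoreClient:
    """Hypothetical client: serializes each call to JSON and sends it over HTTP."""

    def __init__(self, host: str, port: int) -> None:
        self.base = f"http://{host}:{port}"
        # Empty ProxyHandler: bypass any HTTP(S)_PROXY settings for this local demo.
        self.opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def put(self, key: str, value: Any) -> None:
        data = json.dumps({"key": key, "value": value}).encode()
        req = urllib.request.Request(f"{self.base}/put", data=data, headers={"Content-Type": "application/json"})
        with self.opener.open(req, timeout=5) as resp:
            json.loads(resp.read())

    def get(self, key: str) -> Any:
        with self.opener.open(f"{self.base}/get/{key}", timeout=5) as resp:
            return json.loads(resp.read())["value"]


server = ThreadingHTTPServer(("127.0.0.1", 0), ToyStoreHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

client = ToyStoreClient("127.0.0.1", server.server_address[1])
client.put("rollout-1", {"reward": 0.5})
result = client.get("rollout-1")
print(result)  # {'reward': 0.5}

server.shutdown()
server.server_close()
```

Every store call now costs a JSON round trip, which is the serialization overhead the shared-memory mode avoids; in exchange, the client side could run in any process that can reach the host and port.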

### Role-Based Architecture

The strategy supports three distinct roles via the `role` parameter:

- **`role="algorithm"`**: Launches only the HTTP server and algorithm; expects external runners to connect.
- **`role="runner"`**: Connects to an existing server at `server_host`/`server_port`; runs only runner logic.
- **`role="both"`**: Orchestrates a complete local setup, running the HTTP server, the algorithm, and the runners on one machine and spawning whichever side is not in the main process as subprocesses.

When `role="both"`, the `main_process` parameter designates which component runs in the main process:

- **`main_process="algorithm"`**: Main process hosts the algorithm and HTTP server; spawns runner subprocesses.
- **`main_process="runner"`**: Main process runs the runner (requires `n_runners=1`); spawns the algorithm as a subprocess.
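The `role` and `main_process` rules above amount to a small decision table. The hypothetical `plan_topology` function below encodes them as this article describes them, including the `n_runners=1` constraint, so the combinations are easy to check; it is not a function from the library, and the real strategy's validation may differ in detail.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topology:
    local: List[str]                                    # started by this invocation
    external: List[str] = field(default_factory=list)   # expected to be started elsewhere
    main: str = ""                                      # role="both" only: side kept in the parent


def plan_topology(role: str, n_runners: int = 1, main_process: str = "algorithm") -> Topology:
    """Hypothetical encoding of the role / main_process rules described above."""
    runners = [f"runner-{i}" for i in range(n_runners)]
    if role == "algorithm":
        # HTTP server + algorithm only; runners connect from elsewhere.
        return Topology(local=["store-server", "algorithm"], external=["runners"])
    if role == "runner":
        # Runner logic only; connects to an existing server at server_host/server_port.
        return Topology(local=runners, external=["store-server", "algorithm"])
    if role != "both":
        raise ValueError(f"unknown role: {role!r}")
    if main_process not in ("algorithm", "runner"):
        raise ValueError(f"unknown main_process: {main_process!r}")
    if main_process == "runner" and n_runners != 1:
        raise ValueError('main_process="runner" requires n_runners=1')
    # Everything runs locally; the side not named in `main` is spawned as subprocesses.
    return Topology(local=["store-server", "algorithm", *runners], main=main_process)


print(plan_topology("both", n_runners=3))
```

Calling `plan_topology("both", n_runners=2, main_process="runner")` raises `ValueError`, mirroring the single-runner requirement for running the runner in the main process.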

### Shutdown Escalation

The client-server mode implements a rigorous four-step shutdown escalation to prevent zombie processes:

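1. **Cooperative stop**: Set the shared `MultiprocessingEvent` so each process can exit on its own.
2. **SIGINT**: Send an interrupt signal to subprocesses that are still running.
3. **`terminate()`**: Force termination (SIGTERM on POSIX) if processes are still alive after `graceful_timeout`.
4. **`kill()`**: Hard kill (SIGKILL on POSIX) if processes survive a further `terminate_timeout`.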
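The escalation ladder can be sketched with `multiprocessing` as below. This is a POSIX-only illustration under stated assumptions: the cooperative stop is a shared event, half of `graceful_timeout` is spent waiting after each of the first two steps, and `terminate()`/`kill()` send SIGTERM/SIGKILL as they do for `multiprocessing.Process` on POSIX. The worker functions and `escalate_shutdown` are hypothetical, not the strategy's code.

```python
import multiprocessing as mp
import os
import signal
import time

# POSIX-only sketch: an explicit "fork" context, so no __main__ guard is needed.
CTX = mp.get_context("fork")


def cooperative_worker(stop_evt) -> None:
    """Exits as soon as the shared event is set, so step 1 is enough."""
    while not stop_evt.wait(0.05):
        pass  # one unit of work would go here


def stubborn_worker(stop_evt, ready_evt) -> None:
    """Ignores the event, SIGINT, and SIGTERM; only kill() (SIGKILL) stops it."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
    ready_evt.set()  # tell the parent the signal dispositions are in place
    while True:
        time.sleep(0.05)


def escalate_shutdown(proc, stop_evt, graceful_timeout: float, terminate_timeout: float) -> str:
    """Stop `proc`, escalating one step at a time; return the step that worked."""
    stop_evt.set()                          # 1. cooperative stop
    proc.join(graceful_timeout / 2)
    if not proc.is_alive():
        return "cooperative"
    os.kill(proc.pid, signal.SIGINT)        # 2. SIGINT (KeyboardInterrupt by default)
    proc.join(graceful_timeout / 2)
    if not proc.is_alive():
        return "sigint"
    proc.terminate()                        # 3. SIGTERM on POSIX
    proc.join(terminate_timeout)
    if not proc.is_alive():
        return "terminate"
    proc.kill()                             # 4. SIGKILL: cannot be caught or ignored
    proc.join()
    return "kill"


stop_a = CTX.Event()
polite = CTX.Process(target=cooperative_worker, args=(stop_a,))
polite.start()
first = escalate_shutdown(polite, stop_a, graceful_timeout=1.0, terminate_timeout=1.0)

stop_b, ready_b = CTX.Event(), CTX.Event()
stubborn = CTX.Process(target=stubborn_worker, args=(stop_b, ready_b))
stubborn.start()
ready_b.wait(5)
second = escalate_shutdown(stubborn, stop_b, graceful_timeout=0.4, terminate_timeout=0.4)
print(first, second)  # cooperative kill
```

The stubborn worker shows why the last rung exists: a process can ignore SIGINT and SIGTERM, but not SIGKILL, so step 4 ensures nothing is left running.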

```python
from agentlightning.execution.client_server import ClientServerExecutionStrategy
from agentlightning.trainer import Trainer

# Full local orchestration with the algorithm in the main process
strategy = ClientServerExecutionStrategy(
    role="both",
    main_process="algorithm",
    n_runners=3,
    server_port=4747,
    graceful_timeout=8.0,
    terminate_timeout=5.0,
    managed_store=True,  # Automatic server/client wrappers
)

trainer = Trainer(strategy=strategy)
trainer.run()
```

### Connecting to External Servers

For cluster deployments, start runner-only processes that connect to an algorithm server running on a remote host:

```python
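from agentlightning.execution.client_server import ClientServerExecutionStrategy
from agentlightning.trainer import Trainer

# Runner-only process connecting to an algorithm server on another host
strategy = ClientServerExecutionStrategy(
    role="runner",
    server_host="10.0.0.5",
    server_port=4747,
    n_runners=2,
    managed_store=False,  # No automatic wrapping; provide a custom LightningStoreClient if needed
)

trainer = Trainer(strategy=strategy)
trainer.run()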
```

## Choosing Between Execution Modes

Select the appropriate strategy based on your debugging, resource, and deployment constraints:

- **Fast prototyping or debugging**: Use **Shared-Memory** (`main_thread="runner"`) for immediate state access and easy breakpoint insertion.
- **Large models with GPU contention**: Use **Client-Server** with `role="runner"` to isolate GPU memory across processes.
- **Single-machine multi-GPU training**: Use **Client-Server** with `role="both"` to orchestrate process isolation while maintaining local coordination.
- **Cluster or service-based deployments**: Use **Client-Server** with `role="runner"` and specify remote `server_host`/`server_port`.

Both `SharedMemoryExecutionStrategy` and `ClientServerExecutionStrategy` expose the same public method, `execute(algorithm, runner, store)`, so you can switch execution contexts by changing only the strategy object passed to the `Trainer`.
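To see why a shared `execute` signature matters, here is a toy version of the pattern. `ToyStrategy`, `SequentialStrategy`, `ThreadedStrategy`, and the plain list used as a store are all hypothetical; the real strategies take algorithm and runner bundles plus a `LightningStore`, and the toy `execute` returns a value only so the two runs can be compared.

```python
import threading
from abc import ABC, abstractmethod
from typing import Callable, List

Store = List[int]                       # hypothetical stand-in for a LightningStore
Runner = Callable[[Store, int], None]   # (store, runner_id) -> None
Algorithm = Callable[[Store], float]    # consumes what the runners wrote


class ToyStrategy(ABC):
    """Hypothetical interface mirroring the execute(algorithm, runner, store) shape."""

    @abstractmethod
    def execute(self, algorithm: Algorithm, runner: Runner, store: Store) -> float: ...


class SequentialStrategy(ToyStrategy):
    """Runs every runner on the calling thread, then the algorithm."""

    def __init__(self, n_runners: int) -> None:
        self.n_runners = n_runners

    def execute(self, algorithm: Algorithm, runner: Runner, store: Store) -> float:
        for i in range(self.n_runners):
            runner(store, i)
        return algorithm(store)


class ThreadedStrategy(ToyStrategy):
    """Runs the runners on background threads, then the algorithm on the caller's thread."""

    def __init__(self, n_runners: int) -> None:
        self.n_runners = n_runners

    def execute(self, algorithm: Algorithm, runner: Runner, store: Store) -> float:
        threads = [threading.Thread(target=runner, args=(store, i)) for i in range(self.n_runners)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return algorithm(store)


def toy_runner(store: Store, runner_id: int) -> None:
    store.append(runner_id * 10)  # pretend reward from one rollout; list.append is atomic in CPython


def toy_algorithm(store: Store) -> float:
    return sum(store) / len(store)


# The calling code is identical for both strategies; only the object changes.
results = [s.execute(toy_algorithm, toy_runner, []) for s in (SequentialStrategy(4), ThreadedStrategy(4))]
print(results)  # [15.0, 15.0]
```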

## Summary

- **Shared-memory mode** ([`agentlightning/execution/shared_memory.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/execution/shared_memory.py)) executes bundles in a single process using threads, synchronized via `LightningStoreThreaded` and controlled by `ThreadingEvent`.
- **Client-server mode** ([`agentlightning/execution/client_server.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/execution/client_server.py)) distributes bundles across processes using HTTP transport, supporting `role="algorithm"`, `"runner"`, or `"both"` configurations.
- **Shutdown semantics** differ by mode: shared-memory uses cooperative cancellation with `graceful_delay`, while client-server implements a four-step escalation (cooperative stop → SIGINT → terminate → kill).
- **Thread safety** in shared-memory relies on the `LightningStoreThreaded` wrapper, whereas client-server performs serialization over HTTP.
- Both strategies integrate with the `Trainer` class in [`agentlightning/trainer/trainer.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/trainer/trainer.py) through the common `ExecutionStrategy` interface.

## Frequently Asked Questions

### What is the difference between main_thread and main_process parameters?

The `main_thread` parameter exists only in `SharedMemoryExecutionStrategy` and determines whether the algorithm or the runner occupies the main thread within a single process. The `main_process` parameter exists only in `ClientServerExecutionStrategy` when `role="both"`, and determines whether the algorithm or the runner runs in the parent process while the other is spawned as a subprocess. Both parameters affect debugging accessibility and signal handling, because Python runs signal handlers (including the default one that turns Ctrl-C into `KeyboardInterrupt`) only in the main thread.
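The signal-handling point follows from a CPython rule: handlers registered with `signal.signal()` always execute in the main thread, and `signal.signal()` itself raises `ValueError` when called from any other thread. The short check below demonstrates the second part using only the standard library; it says nothing about Agent-Lightning internals.

```python
import signal
import threading
from typing import List

errors: List[str] = []


def try_install_handler() -> None:
    """Attempts to register a SIGINT handler from a non-main thread."""
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
    except ValueError as exc:
        errors.append(str(exc))


worker = threading.Thread(target=try_install_handler)
worker.start()
worker.join()
print(len(errors))  # 1: installing a handler off the main thread fails
```

Whichever component owns the main thread is therefore the one that receives Ctrl-C as a `KeyboardInterrupt`, which is why the choice of `main_thread` or `main_process` changes how interruption and debugging behave.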

### How does Agent-Lightning handle thread safety in shared-memory mode?

According to [`agentlightning/store/threading.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/threading.py), the framework wraps the `LightningStore` in a `LightningStoreThreaded` instance when `managed_store=True`. This wrapper provides thread-safe read/write access to the store's underlying data, preventing race conditions when the algorithm and multiple runners access shared state concurrently from different threads.

### Can I mix shared-memory and client-server execution in the same training run?

No, the execution mode is mutually exclusive per `Trainer` instance. You must choose either `SharedMemoryExecutionStrategy` or `ClientServerExecutionStrategy` when constructing the `Trainer`. However, you can run independent experiments using different strategies and share data between them by serializing checkpoints through the `LightningStore` interface.

### What happens if a runner crashes in client-server mode?

The `ClientServerExecutionStrategy` monitors subprocess health through `MultiprocessingEvent` and process polling. If a runner crashes, the strategy initiates the shutdown escalation sequence: first attempting cooperative shutdown, then issuing SIGINT, followed by `terminate()` after `graceful_timeout`, and finally `kill()` after `terminate_timeout`. This ensures resources are released even when runners exit unexpectedly.
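Process polling of this kind can be sketched with `subprocess.Popen.poll()`, which returns `None` while a child is running and its exit code once it has finished. The `watch_for_crash` helper and the two throwaway child commands are hypothetical; the strategy itself coordinates its children through `multiprocessing` primitives such as `MultiprocessingEvent`, so treat this only as an illustration of the polling idea.

```python
import subprocess
import sys
import time
from typing import List, Optional, Tuple


def watch_for_crash(
    procs: List[subprocess.Popen], poll_interval: float = 0.05, timeout: float = 10.0
) -> Optional[Tuple[int, int]]:
    """Return (index, exit code) of the first child that exits non-zero, else None."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        codes = [p.poll() for p in procs]  # None means "still running"
        for i, code in enumerate(codes):
            if code is not None and code != 0:
                return i, code
        if all(code is not None for code in codes):
            return None  # every child finished cleanly
        time.sleep(poll_interval)
    return None  # timed out with no crash observed


healthy = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
crashing = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])

crash = watch_for_crash([healthy, crashing])
print(crash)  # (1, 3)

# A real strategy would now run its shutdown escalation on the survivors.
healthy.terminate()
healthy.wait(timeout=5)
```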