# Configuring Environment Variables for Agent-Lightning Production Deployments

> Configure environment variables for Agent-Lightning production deployments to separate GPU and CPU nodes and connect to persistent stores like MongoDB.

- Repository: [Microsoft/agent-lightning](https://github.com/microsoft/agent-lightning)
- Tags: how-to-guide
- Published: 2026-04-01

---

**Agent-Lightning uses environment variables defined in [`agentlightning/env_var.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/env_var.py) to separate GPU-heavy algorithm nodes from CPU-heavy rollout runners and connect to external persistent stores like MongoDB in production deployments.**

Microsoft's Agent-Lightning framework relies on a concise set of environment variables to wire together training algorithms, rollout runners, and trace stores. In production deployments, you must explicitly configure these variables to disable the default in-memory store, assign specific roles to different compute nodes, and enable observability through OpenTelemetry endpoints.

## Core Environment Variables Defined in [`agentlightning/env_var.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/env_var.py)

The library centralizes all environment variable definitions in [`agentlightning/env_var.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/env_var.py). These values control whether processes spawn internal services or connect to external infrastructure.

### Store Management: `AGL_MANAGED_STORE` and `AGENT_LIGHTNING_STORE_URL`

The `AGL_MANAGED_STORE` variable determines whether the library automatically starts an internal store or expects an external one. Set `AGL_MANAGED_STORE=0` to disable the automatic `InMemoryLightningStore` wrapper and instead point to a durable backend like `MongoLightningStore`.

When using an external store, specify the full HTTP URL via `AGENT_LIGHTNING_STORE_URL`:

```bash
export AGL_MANAGED_STORE=0
export AGENT_LIGHTNING_STORE_URL="http://mongo-store:4747"
```

As implemented in [`agentlightning/execution/client_server.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/execution/client_server.py), the `LightningStoreClient` uses this URL to establish HTTP connections to the store's REST API endpoints defined under `API_AGL_PREFIX` in [`agentlightning/store/client_server.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/client_server.py).
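The interaction between the two store variables can be checked before a process starts. The following is a minimal sketch using only the standard library. `resolve_store_settings` is a hypothetical helper, not part of the agentlightning API, and the accepted spellings of "disabled" are an assumption of this sketch.

```python
from typing import Mapping, Optional, Tuple
from urllib.parse import urlsplit

# Assumed spellings of "disabled"; the library's own parsing may differ.
_FALSE_VALUES = {"0", "false", "no", "off"}


def resolve_store_settings(env: Mapping[str, str]) -> Tuple[bool, Optional[str]]:
    """Return (managed, store_url) and fail fast on inconsistent settings.

    Hypothetical helper for illustration only.
    """
    raw = env.get("AGL_MANAGED_STORE", "1").strip().lower()
    managed = raw not in _FALSE_VALUES
    url = env.get("AGENT_LIGHTNING_STORE_URL") or None

    if not managed:
        # An unmanaged store only makes sense with a reachable external URL.
        if url is None:
            raise ValueError("AGL_MANAGED_STORE=0 requires AGENT_LIGHTNING_STORE_URL")
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            raise ValueError(f"store URL must be a full HTTP URL, got {url!r}")
    return managed, url


# Example with an explicit mapping standing in for the process environment.
settings = resolve_store_settings(
    {"AGL_MANAGED_STORE": "0", "AGENT_LIGHTNING_STORE_URL": "http://mongo-store:4747"}
)
```

In a launch script, passing `os.environ` instead of a literal dictionary would surface a missing or malformed URL before any training or rollout work begins.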

### Role-Based Process Separation: `AGL_CURRENT_ROLE`

Use `AGL_CURRENT_ROLE` to designate whether a process runs the training algorithm, the rollout workers, or both. Valid values are `algorithm`, `runner`, or `both`.

- **Algorithm nodes** (typically GPU-heavy): Set `AGL_CURRENT_ROLE=algorithm`
- **Runner nodes** (typically CPU-heavy): Set `AGL_CURRENT_ROLE=runner`

The `SharedMemoryExecutionStrategy` in [`agentlightning/execution/shared_memory.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/execution/shared_memory.py) inspects this variable along with `AGL_MANAGED_STORE` to determine whether to spawn a `LightningStoreThreaded` wrapper locally or connect to a remote store.
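To illustrate how a process might branch on its role, here is a small sketch that validates `AGL_CURRENT_ROLE` against the three documented values. The `Role` enum and helper functions are hypothetical names, and the default of `both` when the variable is unset is an assumption of this sketch, not something this guide specifies.

```python
from enum import Enum
from typing import Mapping


class Role(str, Enum):
    """The three role values listed above (hypothetical enum for illustration)."""

    ALGORITHM = "algorithm"
    RUNNER = "runner"
    BOTH = "both"


def current_role(env: Mapping[str, str], default: Role = Role.BOTH) -> Role:
    """Parse AGL_CURRENT_ROLE, rejecting anything outside the valid set."""
    raw = env.get("AGL_CURRENT_ROLE")
    if raw is None or raw.strip() == "":
        return default  # assumed default, see the note above
    try:
        return Role(raw.strip().lower())
    except ValueError:
        valid = ", ".join(r.value for r in Role)
        raise ValueError(
            f"AGL_CURRENT_ROLE must be one of: {valid}; got {raw!r}"
        ) from None


def runs_algorithm(role: Role) -> bool:
    return role in (Role.ALGORITHM, Role.BOTH)


def runs_runner(role: Role) -> bool:
    return role in (Role.RUNNER, Role.BOTH)


role = current_role({"AGL_CURRENT_ROLE": "runner"})
```

Rejecting unknown values early avoids a node silently starting in the wrong role because of a typo in a launch script.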

### Networking Configuration: `AGL_SERVER_HOST` and `AGL_SERVER_PORT`

These variables define the host and port where the store server listens. Runners and algorithms use these values to locate the store when `AGL_MANAGED_STORE=0`.

```bash
export AGL_SERVER_HOST=0.0.0.0
export AGL_SERVER_PORT=4747
```

The defaults are defined in [`agentlightning/env_var.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/env_var.py) (lines 35-39), where `AGL_SERVER_PORT` defaults to `4747` if unspecified.
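A standalone sketch of how host and port could be resolved with a fallback is shown below. `server_address` is a hypothetical helper. The port default of `4747` follows the text above, while the `localhost` host default is an assumption of this sketch.

```python
from typing import Mapping, Tuple


def server_address(
    env: Mapping[str, str],
    default_host: str = "localhost",  # assumed default for this sketch
    default_port: int = 4747,
) -> Tuple[str, int]:
    """Resolve (host, port) from AGL_SERVER_HOST and AGL_SERVER_PORT."""
    host = env.get("AGL_SERVER_HOST", "").strip() or default_host
    raw_port = env.get("AGL_SERVER_PORT", "").strip()

    if not raw_port:
        port = default_port
    else:
        try:
            port = int(raw_port)
        except ValueError:
            raise ValueError(
                f"AGL_SERVER_PORT must be an integer, got {raw_port!r}"
            ) from None

    # TCP ports outside this range cannot be bound or connected to.
    if not 1 <= port <= 65535:
        raise ValueError(f"AGL_SERVER_PORT out of range: {port}")
    return host, port


address = server_address({"AGL_SERVER_HOST": "0.0.0.0"})
```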

### Observability: `AGENT_LIGHTNING_OTLP_ENDPOINT` and `AGL_EMITTER_DEBUG`

For production monitoring, configure `AGENT_LIGHTNING_OTLP_ENDPOINT` to export traces to an OpenTelemetry collector:

```bash
export AGENT_LIGHTNING_OTLP_ENDPOINT="http://otel-collector:4317"
```

The [`agentlightning/utils/otel.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/utils/otel.py) module reads this endpoint (lines 80-81) to configure span exporters. Set `AGL_EMITTER_DEBUG=1` to enable verbose logging of every emitted span during pipeline debugging.
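A malformed collector endpoint tends to fail silently, so it can help to validate the value up front. The sketch below uses only the standard library. `check_otlp_endpoint` is a hypothetical helper, and requiring an explicit port is a choice of this sketch, not a library requirement.

```python
from typing import Tuple
from urllib.parse import urlsplit


def check_otlp_endpoint(endpoint: str) -> Tuple[str, int]:
    """Return (hostname, port) for a collector URL or raise ValueError.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(endpoint)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"endpoint needs an http(s) scheme, got {endpoint!r}")
    if not parts.hostname:
        raise ValueError(f"endpoint has no host: {endpoint!r}")

    port = parts.port  # raises ValueError itself if the port is not valid
    if port is None:
        raise ValueError(f"endpoint should name a port explicitly: {endpoint!r}")
    return parts.hostname, port


collector = check_otlp_endpoint("http://otel-collector:4317")
```

By convention, port 4317 is used for OTLP over gRPC and 4318 for OTLP over HTTP. Which one applies depends on the exporter in use, which this sketch does not assume.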

## Production Architecture Patterns

Configuring these variables enables a distributed architecture where GPU nodes run the policy algorithm while CPU clusters handle rollouts, all persisting data to MongoDB.

```
┌──────────────────────────────┐      ┌──────────────────────────────────┐
│  Algorithm (GPU)             │      │  Runners (CPU) – many instances  │
│  AGL_CURRENT_ROLE=algorithm  │      │  AGL_CURRENT_ROLE=runner         │
│  AGL_MANAGED_STORE=0         │      │  AGL_MANAGED_STORE=0             │
└─────────┬────────────────────┘      └───────────────┬──────────────────┘
          │  HTTP (AGL API)                           │
          ▼                                           ▼
    ┌────────────────────────┐          ┌──────────────────────────┐
    │  MongoLightningStore   │ ←─────── │  LightningStoreClient    │
    │  (persistent)          │          │  (external URL)          │
    └────────────────────────┘          └──────────────────────────┘
```

This pattern appears in the official WebShop recipe at [`contrib/recipes/webshop/scripts/run_stack.sh`](https://github.com/microsoft/agent-lightning/blob/main/contrib/recipes/webshop/scripts/run_stack.sh), which demonstrates production-grade variable configuration.
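The per-node settings in the diagram differ only in the role, which makes them easy to generate from one place. The following sketch is an illustration of that idea and is not taken from the WebShop recipe. `node_env` and `as_exports` are hypothetical helpers.

```python
import shlex
from typing import Dict


def node_env(role: str, store_url: str) -> Dict[str, str]:
    """Build the environment for one node in the split deployment."""
    if role not in ("algorithm", "runner"):
        raise ValueError(f"split deployments use 'algorithm' or 'runner', got {role!r}")
    return {
        "AGL_CURRENT_ROLE": role,
        "AGL_MANAGED_STORE": "0",  # every node points at the external store
        "AGENT_LIGHTNING_STORE_URL": store_url,
    }


def as_exports(env: Dict[str, str]) -> str:
    """Render the mapping as shell export lines, quoting values when needed."""
    return "\n".join(f"export {key}={shlex.quote(value)}" for key, value in env.items())


STORE_URL = "http://mongo-store:4747"
algorithm_env = node_env("algorithm", STORE_URL)
runner_env = node_env("runner", STORE_URL)
print(as_exports(runner_env))
```

Generating both environments from a single store URL keeps the algorithm and runner nodes from drifting onto different store endpoints.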

## Implementation Examples

### Bash Launch Scripts for Cluster Deployment

The following pattern from [`contrib/recipes/webshop/scripts/run_stack.sh`](https://github.com/microsoft/agent-lightning/blob/main/contrib/recipes/webshop/scripts/run_stack.sh) shows how to configure nodes in a production cluster:

**Store and Algorithm Node (GPU):**

```bash
# External store configuration
export AGL_MANAGED_STORE=0
export AGENT_LIGHTNING_STORE_URL="http://mongo-store:4747"
export AGENT_LIGHTNING_OTLP_ENDPOINT="http://otel-collector:4317"

# Algorithm role with networking
export AGL_CURRENT_ROLE=algorithm
export AGL_SERVER_HOST=0.0.0.0
export AGL_SERVER_PORT=4747
export AGL_EMITTER_DEBUG=1

python train_my_agent.py --external-store-address "$AGENT_LIGHTNING_STORE_URL"
```

**Runner Nodes (CPU):**

```bash
export AGL_CURRENT_ROLE=runner
export AGL_SERVER_HOST=store-host.example.com
export AGL_SERVER_PORT=4747
export AGL_MANAGED_STORE=0
export AGENT_LIGHTNING_STORE_URL="http://mongo-store:4747"

python run_rollouts.py --external-store-address "$AGENT_LIGHTNING_STORE_URL"
```

### Docker Configuration for Production Containers

When containerizing Agent-Lightning, preset environment variables in the Dockerfile to ensure consistent production behavior:

```dockerfile
FROM python:3.12-slim

RUN pip install "agentlightning[verl]" pymongo opentelemetry-sdk

ENV AGL_MANAGED_STORE=0 \
    AGL_CURRENT_ROLE=algorithm \
    AGL_SERVER_HOST=0.0.0.0 \
    AGL_SERVER_PORT=4747 \
    AGENT_LIGHTNING_STORE_URL="http://mongo-store:4747" \
    AGENT_LIGHTNING_OTLP_ENDPOINT="http://otel-collector:4317"

COPY . /app
WORKDIR /app

CMD ["python", "train_my_agent.py"]
```

See the complete example in `contrib/recipes/webshop/Dockerfile` for additional production optimizations.

### Python Environment Resolution

The framework provides helper functions in [`agentlightning/utils/env.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/utils/env.py) to safely resolve environment variables with type conversion and fallbacks:

```python
from agentlightning.env_var import LightningEnvVar
from agentlightning.utils.env import (
    resolve_bool_env_var,
    resolve_int_env_var,
    resolve_str_env_var,
)

# Determine whether the library should manage an internal store
use_managed = resolve_bool_env_var(
    LightningEnvVar.AGL_MANAGED_STORE,
    fallback=True,
)

# Build the store connection URL from host and port, including the scheme
host = resolve_str_env_var(LightningEnvVar.AGL_SERVER_HOST, fallback="localhost")
port = resolve_int_env_var(LightningEnvVar.AGL_SERVER_PORT, fallback=4747)
store_url = f"http://{host}:{port}"
```

These helpers are invoked throughout [`client_server.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/execution/client_server.py) and [`shared_memory.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/execution/shared_memory.py) to parse configuration at runtime.

## Summary

- **Set `AGL_MANAGED_STORE=0`** in production to disable the in-memory store and use external persistence like MongoDB.
- **Assign `AGL_CURRENT_ROLE`** as either `algorithm` (GPU nodes) or `runner` (CPU nodes) to separate compute concerns.
- **Configure `AGL_SERVER_HOST`, `AGL_SERVER_PORT`, and `AGENT_LIGHTNING_STORE_URL`** to ensure all components communicate over HTTP to the same store endpoint.
- **Enable observability** by setting `AGENT_LIGHTNING_OTLP_ENDPOINT` for trace export and optionally `AGL_EMITTER_DEBUG=1` for verbose logging.
- **Reference [`agentlightning/env_var.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/env_var.py)** for the canonical list of all supported environment variables and their default values.

## Frequently Asked Questions

### How do I switch from the default in-memory store to MongoDB in production?

Set `AGL_MANAGED_STORE=0` to prevent Agent-Lightning from automatically starting an internal store. Then configure `AGENT_LIGHTNING_STORE_URL` to point to the HTTP endpoint of your MongoDB-backed store server (e.g., `http://mongo-host:4747`). The `LightningStoreClient` will connect to this URL instead of spawning a local `InMemoryLightningStore`.

### Can I run the algorithm and runners on the same machine?

Yes, by setting `AGL_CURRENT_ROLE=both` you can run both components in a single process. However, for production deployments requiring GPU resources for training and CPU resources for rollouts, Microsoft recommends separating these roles onto different nodes using `AGL_CURRENT_ROLE=algorithm` and `AGL_CURRENT_ROLE=runner` respectively.

### Why is `AGL_SERVER_HOST` set to `0.0.0.0` on algorithm nodes but a specific hostname on runners?

Algorithm nodes typically host the store server (unless using a completely external database), so `0.0.0.0` allows them to accept connections from any network interface. Runner nodes act as clients connecting to that store, so they need the specific hostname or IP address where the store is reachable (e.g., `store-host.example.com`).
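The distinction between a bind address and a connect address can be made explicit in code. The sketch below uses the standard `ipaddress` module to detect wildcard addresses. `is_wildcard` and `client_host` are hypothetical helpers, not part of the agentlightning API.

```python
import ipaddress


def is_wildcard(host: str) -> bool:
    """True for 'listen on all interfaces' addresses such as 0.0.0.0 or ::."""
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        return False  # a hostname is never a wildcard address


def client_host(bind_host: str, reachable_host: str) -> str:
    """Pick the host a client should dial, given what the server bound to.

    A wildcard address is only meaningful for listening, so clients fall back
    to the name or IP under which the server is actually reachable.
    """
    return reachable_host if is_wildcard(bind_host) else bind_host


runner_target = client_host("0.0.0.0", "store-host.example.com")
```

Here `runner_target` resolves to `store-host.example.com`, matching the runner configuration shown earlier in this guide.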

### How do I enable debug logging for trace emissions in production?

Set the environment variable `AGL_EMITTER_DEBUG=1` before starting your process. This instructs the OpenTelemetry utilities in [`agentlightning/utils/otel.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/utils/otel.py) to log every span at the debug level, which is useful for troubleshooting production tracing issues without modifying source code.