# Security Considerations for the LLM Proxy in Agent-Lightning: Risks and Mitigations

> Secure your LLM proxy in agent-lightning. Learn essential security considerations, default risks, and mitigation strategies for your OpenAI-compatible API endpoint.

- Repository: [Microsoft/agent-lightning](https://github.com/microsoft/agent-lightning)
- Tags: security-considerations
- Published: 2026-04-01

---

**The LLM Proxy in Microsoft's agent-lightning framework ships without authentication, input validation, or rate limiting by default, requiring custom FastAPI middleware and deployment hardening to secure the OpenAI-compatible API endpoint.**

The LLM Proxy ([`agentlightning/llm_proxy.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/llm_proxy.py)) transforms Lightning stores into OpenAI-compatible REST APIs using FastAPI, making it a critical attack surface for production deployments. Because the proxy sits between untrusted clients and expensive LLM backends, understanding these security considerations is essential to prevent unauthorized access, data leakage, and prompt injection attacks. Below are the specific risks identified in the source code and practical mitigations you can implement today.

## Authentication and Authorization Risks

The proxy initializes a FastAPI application from `litellm.proxy.proxy_server` without any authentication middleware, creating immediate exposure to unauthorized access.

### Unauthenticated Access to LLM Endpoints

The `LLMProxy` class starts the API server in `initialize()` (lines 822-825) without enforcing API keys or tokens. Anyone with network access to the host and port can issue OpenAI-style requests, potentially exhausting quota or accessing sensitive model data.

**Mitigation:** Add a custom FastAPI authentication middleware and make sure it runs before any other middleware handles the request:

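```python
import hmac

from fastapi import Request
from fastapi.responses import JSONResponse
from starlette.middleware.base import BaseHTTPMiddleware

API_KEYS = {"secret-key-1", "secret-key-2"}  # Placeholders; load real keys from a secret store

class AuthMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        scheme, _, token = request.headers.get("authorization", "").partition(" ")
        # Compare in constant time so response timing does not leak key contents.
        valid = scheme.lower() == "bearer" and any(
            hmac.compare_digest(token.encode(), key.encode()) for key in API_KEYS
        )
        if not valid:
            # Return the response directly: an HTTPException raised inside
            # BaseHTTPMiddleware bypasses FastAPI's handlers and surfaces as a 500.
            return JSONResponse(status_code=401, content={"detail": "Invalid API key"})
        return await call_next(request)
```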

Register it via the `middlewares` parameter in `LLMProxy.__init__` (lines 1110-1122):

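```python
from agentlightning.llm_proxy import LLMProxy

# Confirm the resulting execution order in your version of agent-lightning:
# AuthMiddleware must run before the other middleware see the request.
proxy = LLMProxy(
    middlewares=["rollout_attempt", "stream_conversion", AuthMiddleware],
    callbacks=["return_token_ids", "opentelemetry"],
)
```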

### Header Injection via RolloutAttemptMiddleware

`RolloutAttemptMiddleware` injects `x-rollout-id`, `x-attempt-id`, and `x-sequence-id` headers into the request scope at lines 566-572. Because the middleware does not validate existing headers, malicious clients could spoof these internal tracking identifiers to manipulate attribution or bypass routing logic.

**Mitigation:** Add validation before the injection logic in [`agentlightning/llm_proxy.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/llm_proxy.py):

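```python
# Inside RolloutAttemptMiddleware.dispatch (around line 545).
# Requires: from fastapi.responses import JSONResponse
reserved = ("x-rollout-id", "x-attempt-id", "x-sequence-id")  # str keys: request.headers is not keyed by bytes
if any(h in request.headers for h in reserved):
    # Return a response; an HTTPException raised in middleware surfaces as a 500.
    return JSONResponse(status_code=400, content={"detail": "Headers reserved for internal use"})
```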

### Model List File Permissions

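The proxy writes the LiteLLM configuration to a temporary file using `tempfile.NamedTemporaryFile(delete=False)` without setting permissions explicitly. On POSIX systems the `tempfile` module already creates such files readable and writable only by the owner, but `delete=False` means the file persists until something removes it. A crash, a leftover file, or a later permission change can therefore leave model endpoints or API keys exposed to other users on the same host.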

**Mitigation:** Create the file with restrictive permissions (0600) before passing it to LiteLLM:

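```python
import json
import os
import tempfile

# mkstemp creates the file with mode 0600 on POSIX; fchmod (Unix-only) makes that explicit.
fd, path = tempfile.mkstemp(prefix="llm_proxy_", suffix=".json")
os.fchmod(fd, 0o600)  # Restrict to owner read/write before writing any secrets
with os.fdopen(fd, "w") as f:  # Closes the descriptor when the block exits
    json.dump(self.litellm_config, f)
self._config_file = path  # Remove with os.unlink() when the proxy stops
```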

## Input Validation and Injection Prevention

Without explicit guards, the proxy accepts arbitrary payloads that could enable denial-of-service or code execution attacks.

### Request Body Size Limits

`StreamConversionMiddleware` reads the entire request body using `await request.json()` at line 1266. Malformed JSON raises `json.JSONDecodeError`, but the lack of size limits allows attackers to submit multi-gigabyte payloads causing memory exhaustion.

**Mitigation:** Enforce a maximum request size before JSON parsing:

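```python
# Patch inside StreamConversionMiddleware.dispatch (around line 1265).
# Requires: from fastapi.responses import JSONResponse
MAX_BODY_BYTES = 2 * 1024 * 1024  # 2 MiB

try:
    declared = int(request.headers.get("content-length", "0"))
except ValueError:
    declared = MAX_BODY_BYTES + 1  # Treat an unparsable header as oversized
if declared > MAX_BODY_BYTES:
    # Return a response; an HTTPException raised in middleware surfaces as a 500.
    return JSONResponse(
        status_code=413,
        content={"detail": f"Request body too large (max {MAX_BODY_BYTES} bytes)"},
    )
```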

### Header Deserialization Risks

`LightningSpanExporter._maybe_flush` parses `metadata.requester_custom_headers` using `ast.literal_eval`. While `literal_eval` restricts execution to literals, extremely large strings or deeply nested structures could cause performance degradation. The code currently lacks length validation before parsing.

**Mitigation:** Add a size guard before `literal_eval` execution:

```python
if len(headers_str) > 10_000:
    raise ValueError("Header string exceeds maximum length")

# Then proceed with ast.literal_eval(headers_str)

```
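The length guard can be combined with error handling and a type check on the parsed result. This is a sketch under the assumption that the headers arrive as the string form of a flat `dict` of strings; `parse_custom_headers` is a hypothetical helper, not a function in `llm_proxy.py`.

```python
import ast

MAX_HEADER_STR_LEN = 10_000  # Same cap as the guard above


def parse_custom_headers(headers_str: str) -> dict:
    """Parse a dict literal of string headers, rejecting oversized or malformed input."""
    if len(headers_str) > MAX_HEADER_STR_LEN:
        raise ValueError("Header string exceeds maximum length")
    try:
        parsed = ast.literal_eval(headers_str)
    except (ValueError, TypeError, SyntaxError, MemoryError, RecursionError) as exc:
        # literal_eval documents all of these as possible failures on bad input.
        raise ValueError("Header string is not a valid literal") from exc
    if not isinstance(parsed, dict):
        raise ValueError("Expected a dict of headers")
    if not all(isinstance(k, str) and isinstance(v, str) for k, v in parsed.items()):
        raise ValueError("Header names and values must be strings")
    return parsed
```

Normalizing every failure to `ValueError` gives the caller a single exception type to handle when deciding whether to drop the span metadata.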

### Path Rewriting Validation

`RolloutAttemptMiddleware` rewrites the ASGI `scope["path"]` based on a regex match. Without anchoring and character validation, crafted paths could cause unexpected routing or path traversal attempts.

**Mitigation:** Ensure the regex is anchored and validate path components match allowed patterns (`[a-zA-Z0-9_-]+`):

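```python
# Validate that rollout_id and attempt_id match the expected pattern.
# Requires: import re; from fastapi.responses import JSONResponse
# fullmatch anchors both ends; "$" alone would also accept a trailing newline.
if not all(re.fullmatch(r"[a-zA-Z0-9_-]+", v) for v in (rollout_id, attempt_id)):
    return JSONResponse(status_code=400, content={"detail": "Invalid rollout or attempt ID format"})
```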
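To illustrate anchoring and component validation together, here is a sketch that parses a prefixed path in one step. The URL layout `/rollout/<id>/attempt/<id>/...` is an assumption made for this example only; the real pattern is defined in `llm_proxy.py` and may differ. `split_rollout_path` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Hypothetical URL layout for illustration; check llm_proxy.py for the real prefix.
_PREFIX = re.compile(r"/rollout/([A-Za-z0-9_-]+)/attempt/([A-Za-z0-9_-]+)(/.*)?")


def split_rollout_path(path: str) -> Optional[Tuple[str, str, str]]:
    """Return (rollout_id, attempt_id, remaining_path), or None if the path is rejected."""
    match = _PREFIX.fullmatch(path)  # fullmatch anchors both ends of the path
    if match is None:
        return None
    rollout_id, attempt_id, rest = match.groups()
    rest = rest or "/"
    # Reject traversal segments in the part that would become the rewritten path.
    if ".." in rest.split("/"):
        return None
    return rollout_id, attempt_id, rest
```

Returning `None` for anything that does not match lets the middleware pass the request through untouched or reject it, rather than rewriting `scope["path"]` from partially validated input.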

## Secret Management and Transport Security

### Environment Variable Exposure

The proxy forces `USE_OTEL_LITELLM_REQUEST_SPAN="true"` in `initialize()` (lines 822-825). If other sensitive environment variables (such as `OPENAI_API_KEY`) exist in the same process, they remain accessible to the proxy and any compromised middleware.

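**Mitigation:** Clear sensitive variables after store instantiation, provided nothing downstream still reads them (LiteLLM backends commonly take provider keys from the environment), or run the proxy in a separate container with a minimal environment: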

```python
import os

# After store initialization
os.environ.pop("OPENAI_API_KEY", None)
os.environ.pop("ANTHROPIC_API_KEY", None)
```
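When a full container is not available, a similar effect can be approximated at the process level by launching the proxy with an allowlisted environment. This is a minimal sketch; `minimal_env` is a hypothetical helper, and which variables the proxy actually needs is an assumption you would have to verify for your deployment.

```python
import os
from typing import Dict, Iterable, Mapping


def minimal_env(allowed: Iterable[str], source: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Copy only the allowlisted variables from `source`, dropping everything else."""
    allowed_set = set(allowed)
    return {name: value for name, value in source.items() if name in allowed_set}


if __name__ == "__main__":
    # Example: keep only what a child process needs to locate executables.
    print(sorted(minimal_env({"PATH", "HOME"})))
```

The resulting dictionary can be passed as the `env` argument of `subprocess.Popen` or `subprocess.run`, so the child process never sees variables that were not explicitly allowed.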

### Lack of TLS Enforcement

The proxy runs via Uvicorn without enforcing HTTPS, sending prompts and API keys in plaintext if exposed over HTTP.

**Mitigation:** Deploy behind a reverse proxy (Nginx, Traefik) that terminates TLS, or configure Uvicorn with SSL certificates:

```python
import uvicorn

# `app` is the FastAPI application served by the proxy.
uvicorn.run(
    app,
    host="127.0.0.1",
    port=8000,
    ssl_keyfile="/path/to/key.pem",
    ssl_certfile="/path/to/cert.pem",
)
```

## Rate Limiting and Abuse Prevention

### Unlimited Request Rates

The proxy imposes no per-client or per-model rate limits, allowing attackers to flood backends and generate excessive costs.

**Mitigation:** Implement a custom middleware that tracks request counts per API key or IP address, returning HTTP 429 when thresholds are exceeded.
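The counting logic behind such a middleware can be sketched as a sliding-window limiter. This is an illustrative, in-memory implementation that assumes a single proxy process; `SlidingWindowLimiter` is a hypothetical class, and a multi-worker deployment would need a shared store instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # Injectable so the limiter can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        """Record a request for `client_id` and report whether it is within the limit."""
        now = self._clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A middleware would call `allow()` with the API key or client IP and return a 429 response when it yields `False`. Keying on the API key is usually preferable behind a reverse proxy, where many clients can share one source address.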

### Prompt Injection Vulnerabilities

The proxy forwards arbitrary prompt content without inspection, potentially allowing jailbreak attempts or disallowed content generation.

**Mitigation:** Register a LiteLLM `CustomLogger` callback that inspects `data["messages"]` before forwarding to the backend, using the `callbacks` list in `LLMProxy` initialization.
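The inspection step itself can be sketched independently of how it is wired into LiteLLM. The example below assumes OpenAI-style `messages`, where `content` is either a string or a list of typed parts. `BLOCKED_PATTERNS` and `find_violation` are hypothetical names, and the patterns are placeholders: pattern matching catches only crude attempts and is not a complete defense against prompt injection.

```python
import re
from typing import Any, Dict, List, Optional

# Illustrative patterns only; a real deployment needs a maintained policy.
BLOCKED_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (the |your )?system prompt", re.IGNORECASE),
]


def find_violation(messages: List[Dict[str, Any]]) -> Optional[str]:
    """Return the first blocked pattern found in any text content, else None."""
    for message in messages:
        content = message.get("content")
        # Content may be a plain string or a list of parts such as {"type": "text", ...}.
        if isinstance(content, str):
            texts = [content]
        elif isinstance(content, list):
            texts = [
                part["text"]
                for part in content
                if isinstance(part, dict) and isinstance(part.get("text"), str)
            ]
        else:
            texts = []
        for text in texts:
            for pattern in BLOCKED_PATTERNS:
                if pattern.search(text):
                    return pattern.pattern
    return None
```

A callback would run `find_violation(data["messages"])` and reject the request when it returns a pattern, logging the match for later review.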

## Exporter and Tracing Security

### OTLP Endpoint Leakage

`LightningSpanExporter` rewrites its endpoint to `store.otlp_traces_endpoint()` (lines 1493-1495). If the store returns a misconfigured or attacker-controlled endpoint, traces containing sensitive prompts could leak externally.

**Mitigation:** Whitelist the endpoint URL before assignment:

```python
endpoint = store.otlp_traces_endpoint()
if not endpoint.startswith("https://trusted-collector/"):
    raise ValueError("Untrusted OTLP endpoint detected")

```
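Comparing parsed URL components is a somewhat sturdier variant of the prefix check, because it tolerates explicit ports and case differences and extends naturally to several collectors. This is a sketch; `ALLOWED_OTLP_HOSTS` and `is_trusted_otlp_endpoint` are hypothetical names, and the host value is the same placeholder used above.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the collectors you actually operate.
ALLOWED_OTLP_HOSTS = {"trusted-collector"}


def is_trusted_otlp_endpoint(endpoint: str) -> bool:
    """Accept only HTTPS URLs whose host is allowlisted and that carry no credentials."""
    try:
        parts = urlsplit(endpoint)
        hostname = parts.hostname  # Lowercased by urlsplit
    except ValueError:
        return False  # Malformed URL, for example an unterminated IPv6 literal
    return (
        parts.scheme == "https"
        and hostname in ALLOWED_OTLP_HOSTS
        and parts.username is None
        and parts.password is None
    )
```

Rejecting URLs with embedded credentials avoids confusion between the user-info part and the host when the endpoint string is later logged or reviewed.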

### Callback Duplication

The proxy resets LiteLLM's logging worker to avoid `RuntimeError` (lines 1010-1020), but this can unintentionally retain stale callbacks that leak data between requests.

**Mitigation:** Explicitly clear previously registered callbacks after resetting:

```python
import litellm

litellm.callbacks.clear()  # Clear stale callbacks before adding new ones
# Then register new callbacks
```

## Deployment Hardening Recommendations

- **Process Isolation:** Run the proxy in a separate process from the main runner to avoid tracer conflicts and contain security breaches (as warned in lines 1026-1030 of [`agentlightning/llm_proxy.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/llm_proxy.py)).
- **Network Binding:** Bind only to localhost (`host="127.0.0.1"`) unless external access is explicitly required, and restrict source IPs using firewall rules.
- **Privilege Minimization:** Execute the proxy under a non-root user with read/write access limited to the temporary config file and store database only.

## Summary

- **Authentication:** The LLM Proxy requires custom FastAPI middleware (such as `AuthMiddleware`) to prevent unauthenticated access, as the base implementation in [`agentlightning/llm_proxy.py`](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/llm_proxy.py) includes no auth layer.
- **Header Security:** Validate that `x-rollout-id`, `x-attempt-id`, and `x-sequence-id` headers are not present in incoming requests before `RolloutAttemptMiddleware` processes them at lines 566-572.
- **File Permissions:** Create temporary configuration files with mode `0600` using `os.fchmod` to prevent credential leakage.
- **Input Validation:** Enforce request size limits (e.g., 2 MiB maximum) before `StreamConversionMiddleware` parses JSON at line 1266.
- **Secret Management:** Clear sensitive environment variables after store initialization and deploy behind TLS-terminating reverse proxies.
- **Tracing Security:** Whitelist `otlp_traces_endpoint()` URLs in `LightningSpanExporter` (lines 1493-1495) to prevent trace leakage.

## Frequently Asked Questions

### Does the agent-lightning LLM Proxy include authentication by default?

No. The `LLMProxy` class initializes a FastAPI app from `litellm.proxy.proxy_server` without any authentication middleware. You must implement custom middleware such as an `AuthMiddleware` class and pass it to the `middlewares` parameter in `LLMProxy.__init__` (lines 1110-1122) to secure the endpoint.

### How can I prevent header injection attacks in the RolloutAttemptMiddleware?

Add validation at the beginning of the `dispatch` method to check for the presence of `x-rollout-id`, `x-attempt-id`, or `x-sequence-id` headers. If any exist, raise an `HTTPException` with status code 400 before the middleware reaches the injection logic at lines 566-572.

### What file permissions should the temporary model configuration have?

The temporary config file created in `LLMProxy.start` should have `0600` (owner read/write only) permissions. Use `os.fchmod(fd, 0o600)` immediately after creating the file with `tempfile.mkstemp` to prevent other users on the host from reading potentially sensitive model configuration data.

### How do I enable HTTPS/TLS for the LLM Proxy?

The proxy runs via Uvicorn without built-in TLS enforcement. Either deploy behind a reverse proxy (Nginx, Traefik) that terminates TLS, or pass `ssl_keyfile` and `ssl_certfile` parameters to `uvicorn.run()`. Additionally, binding to `127.0.0.1` ensures the service is not exposed on public interfaces until TLS is properly configured.