Security Considerations for the LLM Proxy in Agent-Lightning: Risks and Mitigations

The LLM Proxy in Microsoft's agent-lightning framework ships without authentication, input validation, or rate limiting by default, requiring custom FastAPI middleware and deployment hardening to secure the OpenAI-compatible API endpoint.

The LLM Proxy (agentlightning/llm_proxy.py) transforms Lightning stores into OpenAI-compatible REST APIs using FastAPI, making it a critical attack surface for production deployments. Because the proxy sits between untrusted clients and expensive LLM backends, understanding these security considerations is essential to prevent unauthorized access, data leakage, and prompt injection attacks. Below are the specific risks identified in the source code and practical mitigations you can implement today.

Authentication and Authorization Risks

The proxy initializes a FastAPI application from litellm.proxy.proxy_server without any authentication middleware, creating immediate exposure to unauthorized access.

Unauthenticated Access to LLM Endpoints

The LLMProxy class starts the API server in initialize() (lines 822-825) without enforcing API keys or tokens. Anyone with network access to the host and port can issue OpenAI-style requests, potentially exhausting quota or accessing sensitive model data.

Mitigation: Add a custom FastAPI authentication middleware and make sure it runs before every other middleware in the chain:

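import hmac

from fastapi import Request
from fastapi.responses import JSONResponse
from starlette.middleware.base import BaseHTTPMiddleware

# Placeholder keys; load real keys from a secret store, never from source code.
API_KEYS = {"secret-key-1", "secret-key-2"}

class AuthMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # Require the exact "Bearer <token>" form instead of splitting loosely.
        scheme, _, token = request.headers.get("authorization", "").partition(" ")
        valid = scheme.lower() == "bearer" and any(
            hmac.compare_digest(token.encode(), key.encode()) for key in API_KEYS
        )
        if not valid:
            # Return the response directly: an HTTPException raised inside
            # BaseHTTPMiddleware bypasses FastAPI's exception handlers.
            return JSONResponse(status_code=401, content={"detail": "Invalid API key"})
        return await call_next(request)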

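Register it via the middlewares parameter in LLMProxy.__init__ (lines 1110-1122). Confirm how list order maps to execution order in your version, because Starlette's add_middleware makes the most recently added middleware the outermost one: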

from agentlightning.llm_proxy import LLMProxy

proxy = LLMProxy(
    middlewares=["rollout_attempt", "stream_conversion", AuthMiddleware],
    callbacks=["return_token_ids", "opentelemetry"],
)

Header Injection via RolloutAttemptMiddleware

RolloutAttemptMiddleware injects x-rollout-id, x-attempt-id, and x-sequence-id headers into the request scope at lines 566-572. Because the middleware does not validate existing headers, malicious clients could spoof these internal tracking identifiers to manipulate attribution or bypass routing logic.

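Mitigation: Add validation before the injection logic in agentlightning/llm_proxy.py:

from fastapi.responses import JSONResponse

# Inside RolloutAttemptMiddleware.dispatch (around line 545).
# Starlette header lookups take str keys (case-insensitive), not bytes.
RESERVED_HEADERS = ("x-rollout-id", "x-attempt-id", "x-sequence-id")
if any(name in request.headers for name in RESERVED_HEADERS):
    # Return the response directly; an HTTPException raised in middleware bypasses FastAPI's handlers.
    return JSONResponse(status_code=400, content={"detail": "Headers reserved for internal use"})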

Model List File Permissions

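The proxy writes the LiteLLM configuration to a temporary file using tempfile.NamedTemporaryFile(delete=False). On POSIX systems Python creates such files with mode 0600, so other users cannot read them by default. The remaining risks are that delete=False leaves the file on disk after the process exits unless it is removed explicitly, and that a later permission change or a shared temporary directory could expose the model endpoints or API keys it contains.

Mitigation: Make the 0600 mode explicit, write the file through its descriptor, and delete it when the proxy stops:

import json
import os
import tempfile

# mkstemp already creates the file with mode 0600 on POSIX; fchmod makes that explicit.
fd, path = tempfile.mkstemp(prefix="llm_proxy_", suffix=".json")
os.fchmod(fd, 0o600)  # Owner read/write only, set before any secret is written
with os.fdopen(fd, "w") as config_file:  # Closes fd when the block exits
    json.dump(self.litellm_config, config_file)
self._config_file = path  # Remove with os.unlink(path) when the proxy stops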

Input Validation and Injection Prevention

Without explicit guards, the proxy accepts arbitrary payloads that could enable denial-of-service or request-routing attacks.

Request Body Size Limits

StreamConversionMiddleware reads the entire request body using await request.json() at line 1266. Malformed JSON raises json.JSONDecodeError, but the lack of size limits allows attackers to submit multi-gigabyte payloads causing memory exhaustion.

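Mitigation: Enforce a maximum request size before JSON parsing:

from fastapi.responses import JSONResponse

# Patch inside StreamConversionMiddleware.dispatch (around line 1265)
MAX_BODY_BYTES = 2 * 1024 * 1024  # 2 MiB

declared = request.headers.get("content-length")
try:
    too_large = declared is not None and int(declared) > MAX_BODY_BYTES
except ValueError:
    too_large = True  # Reject an unparsable Content-Length as well
# Chunked requests carry no Content-Length, so this check alone does not bound them.
if too_large:
    return JSONResponse(
        status_code=413,
        content={"detail": f"Request body too large (max {MAX_BODY_BYTES} bytes)"},
    )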

Header Deserialization Risks

LightningSpanExporter._maybe_flush parses metadata.requester_custom_headers using ast.literal_eval. While literal_eval restricts execution to literals, extremely large strings or deeply nested structures could cause performance degradation. The code currently lacks length validation before parsing.

Mitigation: Add a size guard before literal_eval execution:

if len(headers_str) > 10_000:
    raise ValueError("Header string exceeds maximum length")

# Then proceed with ast.literal_eval(headers_str)
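The length guard and the parse can be wrapped in one helper so that every caller gets the same checks. This is a sketch under the assumption that the header value is the string form of a dict; parse_custom_headers and MAX_HEADER_CHARS are hypothetical names, not identifiers from llm_proxy.py.

```python
import ast

MAX_HEADER_CHARS = 10_000


def parse_custom_headers(headers_str: str) -> dict:
    """Parse a stringified dict of headers, rejecting oversized or non-dict input."""
    if len(headers_str) > MAX_HEADER_CHARS:
        raise ValueError("Header string exceeds maximum length")
    try:
        parsed = ast.literal_eval(headers_str)
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError) as exc:
        # literal_eval can fail in several ways; surface them as one error type.
        raise ValueError("Header string is not a valid literal") from exc
    if not isinstance(parsed, dict):
        raise ValueError("Expected a dict of headers")
    # Keep only string-to-string pairs; drop anything else silently.
    return {k: v for k, v in parsed.items() if isinstance(k, str) and isinstance(v, str)}
```

Normalizing every failure to ValueError lets the exporter skip a bad span with a single except clause instead of failing the whole flush.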

Path Rewriting Validation

RolloutAttemptMiddleware rewrites the ASGI scope["path"] based on a regex match. Without anchoring and character validation, crafted paths could cause unexpected routing or path traversal attempts.

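Mitigation: Ensure the regex is anchored and validate that path components match allowed patterns ([a-zA-Z0-9_-]+):

import re
from fastapi.responses import JSONResponse

ID_PATTERN = re.compile(r"[a-zA-Z0-9_-]+")

# Validate both identifiers inside dispatch. fullmatch also rejects a trailing newline, which re.match with "$" accepts.
for value in (rollout_id, attempt_id):
    if not ID_PATTERN.fullmatch(value):
        return JSONResponse(status_code=400, content={"detail": "Invalid rollout or attempt ID format"})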
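Anchoring and component validation can be combined in a single parse step. The sketch below assumes a path layout of /rollout/<rollout_id>/attempt/<attempt_id>/<rest>, which is purely illustrative; the real pattern used by RolloutAttemptMiddleware may differ, and split_rollout_path is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Hypothetical path layout, for illustration only.
_ROLLOUT_PATH = re.compile(
    r"/rollout/(?P<rollout_id>[A-Za-z0-9_-]{1,128})"
    r"/attempt/(?P<attempt_id>[A-Za-z0-9_-]{1,128})"
    r"(?P<rest>/.*)?"
)


def split_rollout_path(path: str) -> Optional[Tuple[str, str, str]]:
    """Return (rollout_id, attempt_id, remaining_path), or None if the path is rejected."""
    match = _ROLLOUT_PATH.fullmatch(path)  # fullmatch anchors both ends
    if match is None:
        return None
    rest = match.group("rest") or "/"
    if ".." in rest.split("/"):
        return None  # refuse parent-directory segments in the rewritten path
    return match.group("rollout_id"), match.group("attempt_id"), rest
```

Returning None for anything that does not match exactly lets the caller fall through to a 400 or 404 response rather than rewriting scope["path"] from partially validated input.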

Secret Management and Transport Security

Environment Variable Exposure

The proxy sets USE_OTEL_LITELLM_REQUEST_SPAN="true" in initialize() (lines 822-825), so it reads and mutates the process-wide environment. Any other secrets in that environment (such as OPENAI_API_KEY) are readable by the proxy and by any compromised middleware running in the same process.

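Mitigation: Clear sensitive variables after store instantiation or run the proxy in a separate container with a minimal environment:

import os

# After store initialization. Remove only the keys this process no longer needs:
# a backend that reads its provider key from the environment at request time will fail once the key is gone.
os.environ.pop("OPENAI_API_KEY", None)
os.environ.pop("ANTHROPIC_API_KEY", None)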
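When the proxy is launched as a child process, an allowlist is usually easier to reason about than popping individual variables. This is a minimal sketch; minimal_env and the DEFAULT_ALLOWED names are assumptions and should be adjusted to what the proxy process actually needs, including any provider credentials it legitimately uses.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Assumed allowlist for illustration; extend it with the variables your proxy requires.
DEFAULT_ALLOWED = ("PATH", "HOME", "LANG", "USE_OTEL_LITELLM_REQUEST_SPAN")


def minimal_env(
    allowed: Iterable[str] = DEFAULT_ALLOWED,
    source: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Copy only allowlisted variables from source (defaults to os.environ)."""
    src = os.environ if source is None else source
    return {name: src[name] for name in allowed if name in src}
```

The result can be passed as the env argument of subprocess.Popen or subprocess.run, so the child never inherits variables that were not explicitly listed.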

Lack of TLS Enforcement

The proxy runs via Uvicorn without enforcing HTTPS, sending prompts and API keys in plaintext if exposed over HTTP.

Mitigation: Deploy behind a reverse proxy (Nginx, Traefik) that terminates TLS, or configure Uvicorn with SSL certificates:

import uvicorn

# "app" is the ASGI application to serve; the certificate paths are placeholders.
uvicorn.run(
    app,
    host="127.0.0.1",
    port=8000,
    ssl_keyfile="/path/to/key.pem",
    ssl_certfile="/path/to/cert.pem",
)

Rate Limiting and Abuse Prevention

Unlimited Request Rates

The proxy imposes no per-client or per-model rate limits, allowing attackers to flood backends and generate excessive costs.

Mitigation: Implement a custom middleware that tracks request counts per API key or IP address, returning HTTP 429 when thresholds are exceeded.
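The counting logic behind such a middleware can be sketched independently of FastAPI. The class below is a hypothetical in-memory sliding-window limiter keyed by API key or IP address; it assumes a single process, so a multi-worker deployment would need a shared store instead.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds interval."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # caller should answer with HTTP 429
        hits.append(now)
        return True
```

A middleware would call allow() with the authenticated key (or the client IP as a fallback) and return a 429 response whenever it yields False.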

Prompt Injection Vulnerabilities

The proxy forwards arbitrary prompt content without inspection, potentially allowing jailbreak attempts or disallowed content generation.

Mitigation: Register a LiteLLM CustomLogger callback that inspects data["messages"] before forwarding to the backend, using the callbacks list in LLMProxy initialization.
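The inspection step itself can be a plain function over the OpenAI-style messages list. The sketch below uses two illustrative deny patterns that are not a complete or recommended policy; find_violation and DENY_PATTERNS are hypothetical names.

```python
import re
from typing import Any, Dict, List, Optional

# Illustrative patterns only; real policies need far broader coverage.
DENY_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (the|your) system prompt", re.IGNORECASE),
]


def find_violation(messages: List[Dict[str, Any]]) -> Optional[str]:
    """Return the first deny pattern matched by any string message content, else None."""
    for message in messages:
        content = message.get("content")
        if not isinstance(content, str):
            continue  # list-style (multimodal) content is not inspected in this sketch
        for pattern in DENY_PATTERNS:
            if pattern.search(content):
                return pattern.pattern
    return None
```

In a LiteLLM proxy, a CustomLogger subclass could call find_violation(data.get("messages", [])) from its pre-call hook and reject the request on a match; verify the hook name and signature against the LiteLLM version you run. Pattern matching is easy to evade, so treat it as one layer among several.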

Exporter and Tracing Security

OTLP Endpoint Leakage

LightningSpanExporter rewrites its endpoint to store.otlp_traces_endpoint() (lines 1493-1495). If the store returns a misconfigured or attacker-controlled endpoint, traces containing sensitive prompts could leak externally.

Mitigation: Whitelist the endpoint URL before assignment:

endpoint = store.otlp_traces_endpoint()
if not endpoint.startswith("https://trusted-collector/"):
    raise ValueError("Untrusted OTLP endpoint detected")
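A string-prefix check ignores ports and embedded credentials. As a stricter sketch, the endpoint can be parsed and compared against an allowlist of (scheme, host, port) tuples; the collector name and port below are hypothetical, as is the is_trusted_endpoint helper.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of (scheme, host, port) tuples.
TRUSTED_COLLECTORS = {("https", "trusted-collector", 4318)}
_DEFAULT_PORTS = {"http": 80, "https": 443}


def is_trusted_endpoint(url: str) -> bool:
    """Accept only URLs whose scheme, host, and port are explicitly allowlisted."""
    parts = urlsplit(url)
    try:
        port = parts.port or _DEFAULT_PORTS.get(parts.scheme)
    except ValueError:
        return False  # malformed or out-of-range port
    if parts.username or parts.password:
        return False  # refuse URLs that embed credentials
    return (parts.scheme, parts.hostname, port) in TRUSTED_COLLECTORS
```

The exporter would call is_trusted_endpoint(store.otlp_traces_endpoint()) and refuse to export when it returns False.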

Callback Duplication

The proxy resets LiteLLM's logging worker to avoid RuntimeError (lines 1010-1020), but this can unintentionally retain stale callbacks that leak data between requests.

Mitigation: Explicitly clear previously registered callbacks after resetting, then register the new ones:

import litellm

# Clear stale callbacks before adding new ones. This also removes callbacks registered by other code in the same process.
litellm.callbacks.clear()

Deployment Hardening Recommendations

  • Process Isolation: Run the proxy in a separate process from the main runner to avoid tracer conflicts and contain security breaches (as warned in lines 1026-1030 of llm_proxy.py).
  • Network Binding: Bind only to localhost (host="127.0.0.1") unless external access is explicitly required, and restrict source IPs using firewall rules.
  • Privilege Minimization: Execute the proxy under a non-root user with read/write access limited to the temporary config file and store database only.

Summary

  • Authentication: The LLM Proxy requires custom FastAPI middleware (such as AuthMiddleware) to prevent unauthenticated access, as the base implementation in agentlightning/llm_proxy.py includes no auth layer.
  • Header Security: Validate that x-rollout-id, x-attempt-id, and x-sequence-id headers are not present in incoming requests before RolloutAttemptMiddleware processes them at lines 566-572.
  • File Permissions: Create temporary configuration files with mode 0600 using os.fchmod to prevent credential leakage.
  • Input Validation: Enforce request size limits (e.g., 2 MiB maximum) before StreamConversionMiddleware parses JSON at line 1266.
  • Secret Management: Clear sensitive environment variables after store initialization and deploy behind TLS-terminating reverse proxies.
  • Tracing Security: Whitelist otlp_traces_endpoint() URLs in LightningSpanExporter (lines 1493-1495) to prevent trace leakage.

Frequently Asked Questions

Does the agent-lightning LLM Proxy include authentication by default?

No. The LLMProxy class initializes a FastAPI app from litellm.proxy.proxy_server without any authentication middleware. You must implement custom middleware such as an AuthMiddleware class and pass it to the middlewares parameter in LLMProxy.__init__ (lines 1110-1122) to secure the endpoint.

How can I prevent header injection attacks in the RolloutAttemptMiddleware?

Add validation at the beginning of the dispatch method to check for the presence of x-rollout-id, x-attempt-id, or x-sequence-id headers. If any exist, reject the request with an HTTP 400 response before the middleware reaches the injection logic at lines 566-572.

What file permissions should the temporary model configuration have?

The temporary config file created in LLMProxy.start should have 0600 (owner read/write only) permissions. Use os.fchmod(fd, 0o600) immediately after creating the file with tempfile.mkstemp to prevent other users on the host from reading potentially sensitive model configuration data.

How do I enable HTTPS/TLS for the LLM Proxy?

The proxy runs via Uvicorn without built-in TLS enforcement. Either deploy behind a reverse proxy (Nginx, Traefik) that terminates TLS, or pass ssl_keyfile and ssl_certfile parameters to uvicorn.run(). Additionally, binding to 127.0.0.1 ensures the service is not exposed on public interfaces until TLS is properly configured.
