architecture

Headroom Proxy Server Architecture: How the FastAPI LLM Gateway Works and How to Extend It

June 9, 2026 chopratejas/headroom ↗

The Headroom proxy server is a modular FastAPI-based HTTP and WebSocket gateway that routes LLM requests through configurable provider handlers, a transform pipeline, and pluggable interceptors, and it can be extended by registering new components in headroom/proxy/extensions.py.

The chopratejas/headroom repository implements a lightweight proxy layer that sits between LLM clients and upstream model providers. Understanding the Headroom proxy server architecture reveals how requests flow from FastAPI routes through handler mix-ins and transforms before reaching external APIs. The codebase is intentionally modular, making it straightforward to introduce new providers, custom payload mutations, and policy interceptors without modifying core routing logic.

Core Components of the Headroom Proxy Server Architecture

At its heart, the system separates routing, transformation, and observability into discrete layers. Each layer is exposed through specific modules under headroom/proxy/ and headroom/transforms/.

FastAPI Entry Point and Configuration

The headroom/proxy/server.py module builds the FastAPI application, parses CLI flags, instantiates the ProxyConfig object, and registers HTTP routes. It acts as the central dispatcher that injects configuration into the request lifecycle.

Provider Handlers

The headroom/proxy/handlers/ directory contains one module per LLM provider, such as headroom/proxy/handlers/openai.py and headroom/proxy/handlers/anthropic.py. Each handler implements a mix-in—OpenAIHandlerMixin, AnthropicHandlerMixin, or similar—that translates Headroom’s internal request format into provider-specific API calls.

Transform Pipeline

Before a request leaves the proxy and after the response returns, it passes through the pipeline defined in headroom/transforms/pipeline.py. This module orchestrates objects like smart_crusher from headroom/transforms/smart_crusher.py that can compress, filter, or enrich payloads to reduce token compression overhead.

Observability and Interceptors

The proxy exposes Prometheus metrics via headroom/proxy/prometheus_metrics.py, which records per-transform compression statistics on a /metrics endpoint. Meanwhile, headroom/proxy/interceptors/astgrep.py demonstrates how AST-based interceptors can rewrite prompts before they reach the transform pipeline.

Extending the Headroom Proxy Server Architecture

Headroom uses a dependency-injection pattern where extensions receive the active ProxyConfig instance and are wired into the FastAPI lifecycle automatically. You can extend the system in three primary ways.

Add a New LLM Provider Handler

Create a new file in headroom/proxy/handlers/<provider>.py and subclass BaseHandlerMixin. You must implement prepare_request, call_provider, and postprocess_response. Then register the handler in headroom/proxy/extensions.py using register_handler("<provider>", YourHandlerMixin). The generic /v1/... routes will discover the handler by name without requiring custom endpoint definitions.

Create a Custom Transform

Transforms must follow the Transform protocol by implementing apply_request and/or apply_response. Place your module under headroom/transforms/ and import it into headroom/transforms/pipeline.py, or load it dynamically through a configuration flag. You can also expose runtime toggles by adding fields to ProxyConfig in headroom/proxy/config.py.

Register a Request Interceptor

Interceptors run before the transform pipeline, making them ideal for policy enforcement or PII redaction. Subclass headroom.proxy.interceptors.base.BaseInterceptor, define an intercept method that receives and returns the request dict, and register it in headroom/proxy/extensions.py via register_interceptor().

Practical Extension Examples

The following examples show how to add a provider, a transform, and an interceptor without touching core routing logic.

Echo Provider Handler

This dummy handler echoes the request back as a response, which is useful for local testing:


# headroom/proxy/handlers/dummy.py

from .base import BaseHandlerMixin

class DummyHandlerMixin(BaseHandlerMixin):
    """Echoes the incoming payload back unchanged – useful for testing."""
    async def call_provider(self, request_body: dict) -> dict:
        # No external network call – just return the request as the response.

        return {"choices": [{"message": request_body.get("messages", [{}])[0]}]}

# Register the handler

# headroom/proxy/extensions.py

from .handlers.dummy import DummyHandlerMixin
register_handler("dummy", DummyHandlerMixin)

After registration, sending a request with an Authorization header bearing the provider name dummy routes through the new handler.

Uppercase Prompt Transform

This transform mutates every message content string to uppercase before forwarding it:


# headroom/transforms/uppercase_prompt.py

from .base import Transform

class UppercasePrompt(Transform):
    def apply_request(self, payload: dict) -> dict:
        for msg in payload.get("messages", []):
            if "content" in msg:
                msg["content"] = msg["content"].upper()
        return payload

# Enable it in the pipeline

# headroom/transforms/pipeline.py

from .uppercase_prompt import UppercasePrompt
DEFAULT_TRANSFORMS = [
    UppercasePrompt(),
    # … existing transforms …

]

Email Redaction Interceptor

This interceptor strips email addresses from the prompt field before any transform runs:


# headroom/proxy/interceptors/redact_email.py

import re
from .base import BaseInterceptor

EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")

class RedactEmailInterceptor(BaseInterceptor):
    def intercept(self, payload: dict) -> dict:
        text = payload.get("prompt", "")
        payload["prompt"] = EMAIL_RE.sub("[REDACTED]", text)
        return payload

# Register it

# headroom/proxy/extensions.py

from .interceptors.redact_email import RedactEmailInterceptor
register_interceptor(RedactEmailInterceptor())

Because interceptors execute before the transform pipeline, this guarantees that PII never reaches downstream providers.

Summary

The headroom/proxy/server.py file bootstraps the FastAPI app and injects ProxyConfig into the request lifecycle.
Provider handlers in headroom/proxy/handlers/ translate internal requests to provider-specific APIs via mix-ins like OpenAIHandlerMixin.
The transform pipeline in headroom/transforms/pipeline.py applies modular mutations such as smart_crusher to reduce token usage.
Extensions are registered centrally in headroom/proxy/extensions.py through register_handler() or register_interceptor(), keeping core routing logic unchanged.
Interceptors execute before transforms, offering a hook for policy enforcement and request rewriting.

Frequently Asked Questions

What stack does the Headroom proxy server architecture use?

The Headroom proxy server architecture is built on FastAPI, running as an asynchronous HTTP and WebSocket gateway. It uses standard Python async patterns and dependency injection to wire configuration, handlers, and transforms into each request.

How do I add a new LLM provider to Headroom without modifying core routes?

You subclass BaseHandlerMixin inside a new module under headroom/proxy/handlers/, implement prepare_request, call_provider, and postprocess_response, then publish it with register_handler() in headroom/proxy/extensions.py. The existing generic /v1/... routes discover the handler automatically by its registered name.

What is the difference between a transform and an interceptor in Headroom?

An interceptor subclasses BaseInterceptor and runs before the transform pipeline to rewrite or audit the raw request dict, while a transform follows the Transform protocol and operates within headroom/transforms/pipeline.py to mutate payloads before they reach the provider or after the response returns.

Where does Headroom expose metrics and telemetry?

The headroom/proxy/prometheus_metrics.py module exposes a /metrics endpoint and records per-transform compression statistics. Additional telemetry helpers in headroom/proxy/helpers.py support Server-Sent Events parsing, streaming response handling, and rate-limiting utilities.

Have a question about this repo?

These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:

Share the following with your agent to get started:

curl -s "https://instagit.com/install.md"

Add to your MCP client configuration:

{
  "mcpServers": {
    "instagit": {
      "command": "npx",
      "args": ["-y", "instagit@latest"]
    }
  }
}

Ask your agent:

"Use Instagit MCP to understand how chopratejas/headroom works."

Works with

Claude Codex Cursor VS Code OpenClaw Any MCP Client

Maintain an open-source project? Get it listed too →