# Headroom Proxy Server Architecture: How the FastAPI LLM Gateway Works and How to Extend It

> Explore the Headroom proxy server architecture built with FastAPI. Learn how the LLM gateway routes requests and discover methods to extend its functionality with custom components.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: architecture
- Published: 2026-06-09

---

**The Headroom proxy server is a modular FastAPI-based HTTP and WebSocket gateway that routes LLM requests through configurable provider handlers, a transform pipeline, and pluggable interceptors, and it can be extended by registering new components in [`headroom/proxy/extensions.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/extensions.py).**

The `chopratejas/headroom` repository implements a lightweight proxy layer that sits between **LLM clients** and upstream **model providers**. Understanding the Headroom proxy server architecture reveals how requests flow from FastAPI routes through **handler mix-ins** and **transforms** before reaching external APIs. The codebase is intentionally modular, making it straightforward to introduce new providers, custom payload mutations, and policy interceptors without modifying core routing logic.

## Core Components of the Headroom Proxy Server Architecture

At its heart, the system separates routing, transformation, and observability into discrete layers. Each layer is exposed through specific modules under `headroom/proxy/` and `headroom/transforms/`.

### FastAPI Entry Point and Configuration

The [`headroom/proxy/server.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/server.py) module builds the **FastAPI** application, parses **CLI flags**, instantiates the **ProxyConfig** object, and registers HTTP routes. It acts as the central dispatcher that injects configuration into the request lifecycle.

### Provider Handlers

The `headroom/proxy/handlers/` directory contains one module per LLM provider, such as [`headroom/proxy/handlers/openai.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/handlers/openai.py) and [`headroom/proxy/handlers/anthropic.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/handlers/anthropic.py). Each handler implements a mix-in—`OpenAIHandlerMixin`, `AnthropicHandlerMixin`, or similar—that translates Headroom’s internal request format into provider-specific API calls.

### Transform Pipeline

Before a request leaves the proxy and after the response returns, it passes through the pipeline defined in [`headroom/transforms/pipeline.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/pipeline.py). This module orchestrates objects like `smart_crusher` from [`headroom/transforms/smart_crusher.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/smart_crusher.py) that can compress, filter, or enrich payloads to reduce **token compression** overhead.

### Observability and Interceptors

The proxy exposes **Prometheus metrics** via [`headroom/proxy/prometheus_metrics.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/prometheus_metrics.py), which records per-transform compression statistics on a `/metrics` endpoint. Meanwhile, [`headroom/proxy/interceptors/astgrep.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/interceptors/astgrep.py) demonstrates how **AST-based interceptors** can rewrite prompts before they reach the transform pipeline.

## Extending the Headroom Proxy Server Architecture

Headroom uses a **dependency-injection** pattern where extensions receive the active `ProxyConfig` instance and are wired into the FastAPI lifecycle automatically. You can extend the system in three primary ways.

### Add a New LLM Provider Handler

Create a new file in `headroom/proxy/handlers/<provider>.py` and subclass **BaseHandlerMixin**. You must implement **prepare_request**, **call_provider**, and **postprocess_response**. Then register the handler in [`headroom/proxy/extensions.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/extensions.py) using `register_handler("<provider>", YourHandlerMixin)`. The generic `/v1/...` routes will discover the handler by name without requiring custom endpoint definitions.

### Create a Custom Transform

Transforms must follow the **Transform** protocol by implementing **apply_request** and/or **apply_response**. Place your module under `headroom/transforms/` and import it into [`headroom/transforms/pipeline.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/pipeline.py), or load it dynamically through a configuration flag. You can also expose runtime toggles by adding fields to `ProxyConfig` in [`headroom/proxy/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/config.py).

### Register a Request Interceptor

Interceptors run before the transform pipeline, making them ideal for policy enforcement or PII redaction. Subclass `headroom.proxy.interceptors.base.BaseInterceptor`, define an **intercept** method that receives and returns the request dict, and register it in [`headroom/proxy/extensions.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/extensions.py) via `register_interceptor()`.

## Practical Extension Examples

The following examples show how to add a provider, a transform, and an interceptor without touching core routing logic.

### Echo Provider Handler

This dummy handler echoes the request back as a response, which is useful for local testing:

```python

# headroom/proxy/handlers/dummy.py

from .base import BaseHandlerMixin

class DummyHandlerMixin(BaseHandlerMixin):
    """Echoes the incoming payload back unchanged – useful for testing."""
    async def call_provider(self, request_body: dict) -> dict:
        # No external network call – just return the request as the response.

        return {"choices": [{"message": request_body.get("messages", [{}])[0]}]}

# Register the handler

# headroom/proxy/extensions.py

from .handlers.dummy import DummyHandlerMixin
register_handler("dummy", DummyHandlerMixin)

```

After registration, sending a request with an `Authorization` header bearing the provider name `dummy` routes through the new handler.

### Uppercase Prompt Transform

This transform mutates every message content string to uppercase before forwarding it:

```python

# headroom/transforms/uppercase_prompt.py

from .base import Transform

class UppercasePrompt(Transform):
    def apply_request(self, payload: dict) -> dict:
        for msg in payload.get("messages", []):
            if "content" in msg:
                msg["content"] = msg["content"].upper()
        return payload

# Enable it in the pipeline

# headroom/transforms/pipeline.py

from .uppercase_prompt import UppercasePrompt
DEFAULT_TRANSFORMS = [
    UppercasePrompt(),
    # … existing transforms …

]

```

### Email Redaction Interceptor

This interceptor strips email addresses from the prompt field before any transform runs:

```python

# headroom/proxy/interceptors/redact_email.py

import re
from .base import BaseInterceptor

EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")

class RedactEmailInterceptor(BaseInterceptor):
    def intercept(self, payload: dict) -> dict:
        text = payload.get("prompt", "")
        payload["prompt"] = EMAIL_RE.sub("[REDACTED]", text)
        return payload

# Register it

# headroom/proxy/extensions.py

from .interceptors.redact_email import RedactEmailInterceptor
register_interceptor(RedactEmailInterceptor())

```

Because interceptors execute before the transform pipeline, this guarantees that PII never reaches downstream providers.

## Summary

- The [`headroom/proxy/server.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/server.py) file bootstraps the FastAPI app and injects `ProxyConfig` into the request lifecycle.
- Provider handlers in `headroom/proxy/handlers/` translate internal requests to provider-specific APIs via mix-ins like `OpenAIHandlerMixin`.
- The transform pipeline in [`headroom/transforms/pipeline.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/pipeline.py) applies modular mutations such as `smart_crusher` to reduce token usage.
- Extensions are registered centrally in [`headroom/proxy/extensions.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/extensions.py) through `register_handler()` or `register_interceptor()`, keeping core routing logic unchanged.
- Interceptors execute before transforms, offering a hook for policy enforcement and request rewriting.

## Frequently Asked Questions

### What stack does the Headroom proxy server architecture use?

The Headroom proxy server architecture is built on **FastAPI**, running as an asynchronous HTTP and WebSocket gateway. It uses standard Python async patterns and dependency injection to wire configuration, handlers, and transforms into each request.

### How do I add a new LLM provider to Headroom without modifying core routes?

You subclass `BaseHandlerMixin` inside a new module under `headroom/proxy/handlers/`, implement `prepare_request`, `call_provider`, and `postprocess_response`, then publish it with `register_handler()` in [`headroom/proxy/extensions.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/extensions.py). The existing generic `/v1/...` routes discover the handler automatically by its registered name.

### What is the difference between a transform and an interceptor in Headroom?

An **interceptor** subclasses `BaseInterceptor` and runs before the transform pipeline to rewrite or audit the raw request dict, while a **transform** follows the `Transform` protocol and operates within [`headroom/transforms/pipeline.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/pipeline.py) to mutate payloads before they reach the provider or after the response returns.

### Where does Headroom expose metrics and telemetry?

The [`headroom/proxy/prometheus_metrics.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/prometheus_metrics.py) module exposes a `/metrics` endpoint and records per-transform compression statistics. Additional telemetry helpers in [`headroom/proxy/helpers.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/helpers.py) support Server-Sent Events parsing, streaming response handling, and rate-limiting utilities.