# Headroom Audit vs Optimize vs Simulate: Three Operating Modes Explained

> Understand Headroom operating modes: audit observes, optimize compresses, and simulate plans LLM calls. Learn how each mode enhances your requests.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: comparison
- Published: 2026-06-06

---

**Headroom’s `audit` mode observes requests without modifying payloads, `optimize` applies deterministic transforms to compress context, and `simulate` runs a dry-run that returns a transformation plan without calling the LLM.**

The Headroom proxy intercepts LLM requests to reduce token usage and latency. Its runtime behavior is controlled by the `HeadroomMode` enum defined in [`headroom/models/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/models/config.py), which determines whether the system observes traffic, actively optimizes payloads, or simulates changes for cost estimation. Understanding these **three operating modes** is essential for deploying Headroom safely in production environments.

## Audit Mode: Observation Without Modification

In `audit` mode, the proxy observes every request and records what it would change, but **does not modify** the payload sent to the LLM. This mode is ideal for production monitoring, baseline measurement, and safety-first deployments where you need visibility into Headroom's behavior without affecting live traffic.

When running in `audit` mode, the request passes through to the LLM unchanged, but the proxy adds `X-Headroom-*` headers containing audit information about which transforms would have been applied.

```python
from headroom import HeadroomClient, OpenAIProvider
from openai import OpenAI

client = HeadroomClient(
    original_client=OpenAI(),
    provider=OpenAIProvider(),
    default_mode="audit",          # ← observe only
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quantum tunneling"}],
    headroom_mode="audit",        # override per-request (optional)
)
print(resp)   # normal LLM output; X-Headroom-* headers carry the audit info
```
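To inspect the audit information, it can help to separate the `X-Headroom-*` headers from the rest. The following is a minimal sketch that assumes you already have the response headers as a plain mapping; how the SDK exposes them is not covered here, and the header names in the example (`X-Headroom-Mode`, `X-Headroom-Tokens-Before`) are hypothetical placeholders, not confirmed Headroom headers.

```python
from typing import Mapping


def headroom_headers(headers: Mapping[str, str]) -> dict[str, str]:
    """Return only the X-Headroom-* entries, keyed by lower-cased name."""
    prefix = "x-headroom-"
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith(prefix)  # header names are case-insensitive
    }


# Hypothetical header names and values, for illustration only.
example = {
    "Content-Type": "application/json",
    "X-Headroom-Mode": "audit",
    "X-Headroom-Tokens-Before": "1200",
}
audit_info = headroom_headers(example)
print(audit_info)
```

Lower-casing the names keeps the lookup stable regardless of how an HTTP client normalizes header casing.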

## Optimize Mode: Live Compression and Transformation

The `optimize` mode applies safe, deterministic transforms to the request before it reaches the LLM. This is the default for performance-focused deployments and actively compresses context to reduce token costs and latency.

Transforms applied in this mode include **SmartCrusher**, **CacheAligner**, and **RollingWindow**, which compress large JSON payloads, align cache prefixes, and remove low-importance conversation turns. The modified payload is then sent to the LLM, with `X-Headroom-*` headers showing actual token savings.

```python
from headroom import HeadroomClient, OpenAIProvider
from openai import OpenAI

client = HeadroomClient(
    original_client=OpenAI(),
    provider=OpenAIProvider(),
    default_mode="optimize",      # ← enable compression
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Generate a large JSON report"}],
)
print(resp)   # payload is compressed; X-Headroom-* headers show savings
```
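To give a feel for what a window-style transform does, here is a toy sketch. It is not Headroom's RollingWindow implementation: it uses recency as a crude stand-in for importance, counts tokens by splitting on whitespace, and the function name and parameters are invented for illustration.

```python
def rolling_window(messages, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep system messages plus the most recent turns that fit the token budget.

    Toy illustration only: recency stands in for importance, and the default
    token counter is a whitespace split rather than a real tokenizer.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)

    kept = []
    for message in reversed(rest):       # walk from newest to oldest
        cost = count_tokens(message["content"])
        if cost > budget:
            break                        # older turns no longer fit
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))  # restore chronological order


history = [
    {"role": "system", "content": "Be brief"},
    {"role": "user", "content": "first question here"},
    {"role": "assistant", "content": "first answer here"},
    {"role": "user", "content": "second question"},
]
trimmed = rolling_window(history, max_tokens=7)
print([m["content"] for m in trimmed])
```

The sketch is deterministic: the same history and budget always yield the same trimmed list, which is the property the article attributes to transforms in `optimize` mode.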

## Simulate Mode: Dry-Run Cost Estimation

The `simulate` mode does not call the upstream LLM. Instead, it returns a **Plan** object describing which transforms would run and the estimated token savings. This mode is designed for testing, cost estimation, CI pipelines, and any scenario requiring a dry run without incurring LLM usage charges.

The `Plan` object contains `tokens_saved`, `transforms`, and `estimated_savings` properties, allowing you to preview optimization impact before enabling live mode.

```python
# Reuses the `client` constructed in the examples above.
plan = client.chat.completions.simulate(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Generate a large JSON report"}],
)

print(f"Would save {plan.tokens_saved} tokens")
print("Transforms that would run:", plan.transforms)
print("Estimated cost reduction:", plan.estimated_savings)
```
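In a CI pipeline, a plan like this could gate a rollout decision. The sketch below uses a stand-in `Plan` dataclass so it runs without Headroom installed; only the three property names come from the description above, while the field types, the sample values, and the `meets_savings_target` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Stand-in for the simulate result; field types are assumed."""
    tokens_saved: int
    transforms: list[str] = field(default_factory=list)
    estimated_savings: float = 0.0


def meets_savings_target(plan: Plan, min_tokens_saved: int) -> bool:
    """Hypothetical CI gate: at least one transform and enough tokens saved."""
    return bool(plan.transforms) and plan.tokens_saved >= min_tokens_saved


# Made-up sample values, not real measurements.
sample_plan = Plan(
    tokens_saved=850,
    transforms=["SmartCrusher", "CacheAligner"],
    estimated_savings=0.0021,
)
print(meets_savings_target(sample_plan, min_tokens_saved=500))
```

With a real client, the object returned by `simulate()` would take the place of `sample_plan`, provided it exposes the same attributes.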

## Configuring Headroom Modes

You can configure the operating mode at three levels according to the source code in [`headroom/models/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/models/config.py):

1. **SDK Construction**: Set `default_mode="audit"`, `"optimize"`, or `"simulate"` when initializing `HeadroomClient` in [`headroom/client.py`](https://github.com/chopratejas/headroom/blob/main/headroom/client.py).
2. **Per-Request Override**: Pass `headroom_mode="audit"` (or `optimize`/`simulate`) into `client.chat.completions.create()` to override the default for a single request.
3. **Proxy Command-Line**: Use `headroom proxy --no-optimize` to disable optimization entirely, effectively forcing `audit` mode at the infrastructure level.
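
The three levels above can be read as a precedence chain. The following is a minimal sketch of one plausible resolution order, not the logic in `headroom/client.py`: the lower-case enum values, the `resolve_mode` helper, and the rule that `--no-optimize` downgrades only `optimize` to `audit` are all assumptions.

```python
from enum import Enum
from typing import Optional


class HeadroomMode(str, Enum):
    # Member names follow the article; the string values are assumed.
    AUDIT = "audit"
    OPTIMIZE = "optimize"
    SIMULATE = "simulate"


def resolve_mode(
    default_mode: str,
    headroom_mode: Optional[str] = None,
    no_optimize: bool = False,
) -> HeadroomMode:
    """Hypothetical precedence: per-request override, then client default.

    A proxy-level no_optimize flag is modeled as downgrading optimize to audit.
    """
    chosen = headroom_mode if headroom_mode is not None else default_mode
    mode = HeadroomMode(chosen)          # raises ValueError on unknown names
    if no_optimize and mode is HeadroomMode.OPTIMIZE:
        return HeadroomMode.AUDIT
    return mode


print(resolve_mode("audit", headroom_mode="optimize"))
```

Validating the string through the enum means a typo such as `"optimise"` fails fast with a `ValueError` instead of silently falling back to a default.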

## Implementation Details

The mode logic is implemented across several key files in the [`chopratejas/headroom`](https://github.com/chopratejas/headroom) repository:

- **[`headroom/models/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/models/config.py)**: Defines the `HeadroomMode` enum with `AUDIT`, `OPTIMIZE`, and `SIMULATE` values.
- **[`headroom/client.py`](https://github.com/chopratejas/headroom/blob/main/headroom/client.py)**: Implements the Python SDK, including mode switching in `chat.completions.create()` and the `simulate()` method.
- **[`headroom/transforms/`](https://github.com/chopratejas/headroom/tree/main/headroom/transforms)**: Contains individual transform implementations (e.g., [`smart_crusher.py`](https://github.com/chopratejas/headroom/blob/main/smart_crusher.py), [`cache_aligner.py`](https://github.com/chopratejas/headroom/blob/main/cache_aligner.py)) that execute only when `optimize` mode is active.

## Summary

- **Audit mode** observes traffic and logs potential changes without modifying requests, perfect for production monitoring.
- **Optimize mode** applies deterministic transforms like SmartCrusher and CacheAligner to reduce tokens and latency in live traffic.
- **Simulate mode** returns a `Plan` object with cost estimates without calling the LLM, ideal for CI testing and dry-runs.
- Configure modes via the SDK constructor, per-request parameters, or proxy CLI flags; the `HeadroomMode` enum itself is defined in [`headroom/models/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/models/config.py).

## Frequently Asked Questions

### Can I switch between audit and optimize mode without restarting the proxy?

Yes. You can override the default mode on a per-request basis by passing `headroom_mode="audit"` or `headroom_mode="optimize"` to `client.chat.completions.create()`. To force audit mode across all traffic without code changes, start the proxy with the `headroom proxy --no-optimize` CLI flag instead.

### What information does the simulate mode return?

Simulate mode returns a `Plan` object containing `tokens_saved`, `transforms`, and `estimated_savings` properties. This object details exactly which transforms (such as SmartCrusher or CacheAligner) would execute and quantifies the expected token reduction without making an actual LLM API call.

### Does audit mode impact latency?

Audit mode adds minimal latency because it only inspects requests and adds headers without performing compute-intensive transforms. However, it does not provide the token savings or latency reduction benefits of `optimize` mode, which actively compresses payloads before transmission to the LLM.

### Which transforms run in optimize mode?

The `optimize` mode executes deterministic transforms located in [`headroom/transforms/`](https://github.com/chopratejas/headroom/tree/main/headroom/transforms), including SmartCrusher for JSON compression, CacheAligner for prefix optimization, and RollingWindow for conversation history management. These transforms modify the request payload before it reaches the LLM.