# What Is the Difference Between Audit, Optimize, and Simulate Modes in Headroom?

> Understand the distinct functions of Audit, Optimize, and Simulate modes in Headroom. Learn how to observe, transform, and plan LLM requests efficiently

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: deep-dive
- Published: 2026-06-05

---

**Headroom's Audit mode observes requests without modifying them, Optimize mode applies deterministic transforms to reduce tokens, and Simulate mode returns a transformation plan without calling the upstream LLM.**

Headroom is an open-source LLM proxy that optimizes context windows and reduces API costs through intelligent compression. The tool's runtime behavior is governed by the `HeadroomMode` enum defined in [`headroom/models/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/models/config.py), which determines whether the proxy merely observes traffic, actively transforms payloads, or runs a cost-estimation dry-run.

## Understanding the HeadroomMode Enum

The core architecture defines three mutually exclusive operating modes in [`headroom/models/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/models/config.py):

```python
from enum import Enum


class HeadroomMode(str, Enum):
    AUDIT = "audit"        # Observe only, no modifications
    OPTIMIZE = "optimize"  # Apply deterministic transforms
    SIMULATE = "simulate"  # Return transform plan without API call
```

Each mode serves distinct operational requirements, from safe production monitoring to aggressive cost optimization.
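Because the enum subclasses `str`, a mode can be built from a plain string and compared against one. The sketch below illustrates this with a hypothetical `parse_mode` helper; the helper is not part of Headroom and is shown only to make the enum's behavior concrete.

```python
from enum import Enum


class HeadroomMode(str, Enum):
    AUDIT = "audit"
    OPTIMIZE = "optimize"
    SIMULATE = "simulate"


def parse_mode(value: str) -> HeadroomMode:
    """Hypothetical helper: normalize user input into a HeadroomMode."""
    try:
        # Enum lookup by value; raises ValueError for unknown strings.
        return HeadroomMode(value.strip().lower())
    except ValueError:
        valid = ", ".join(m.value for m in HeadroomMode)
        raise ValueError(f"unknown mode {value!r}; expected one of: {valid}") from None


mode = parse_mode("Audit")
print(mode is HeadroomMode.AUDIT)  # True
print(mode == "audit")             # True, because the enum subclasses str
```

Rejecting unknown strings early keeps a typo such as `"optimise"` from silently falling through to an unintended mode.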

## Audit Mode: Observe Without Modifying

**Audit mode** is the safety-first option for production monitoring. In this mode, Headroom inspects every request and records what transforms *would* have been applied, but the payload sent to the LLM remains completely unchanged.

This mode is ideal for baseline measurement and validating compression strategies before enabling them live. The proxy returns standard LLM responses augmented with `X-Headroom-*` headers containing the audit metadata.
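One way to collect the audit metadata is to filter response headers by the `X-Headroom-` prefix. The sketch below assumes the headers are available as a plain dictionary; the specific header names and values are hypothetical placeholders, since the text above only specifies the prefix.

```python
def headroom_headers(headers: dict) -> dict:
    """Return only the X-Headroom-* entries, matching the prefix case-insensitively."""
    prefix = "x-headroom-"
    return {k: v for k, v in headers.items() if k.lower().startswith(prefix)}


# Hypothetical header names and values, for illustration only.
sample = {
    "Content-Type": "application/json",
    "X-Headroom-Mode": "audit",
    "x-headroom-tokens-saved": "0",
}

audit_meta = headroom_headers(sample)
print(audit_meta)
```

HTTP header names are case-insensitive, which is why the comparison lowercases each key before matching.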

### Using Audit Mode

Configure Audit mode at the SDK level or per-request:

```python
from headroom import HeadroomClient, OpenAIProvider
from openai import OpenAI

client = HeadroomClient(
    original_client=OpenAI(),
    provider=OpenAIProvider(),
    default_mode="audit",          # ← observe only
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quantum tunneling"}],
    headroom_mode="audit",         # optional per-request override
)
```

The request proceeds to the LLM unmodified while Headroom logs which transforms would have run.

## Optimize Mode: Apply Production Transforms

**Optimize mode** is the default for performance-focused deployments. When enabled, Headroom applies deterministic transforms such as `SmartCrusher`, `CacheAligner`, and `RollingWindow` to compress the request before it reaches the LLM.

These transforms, implemented in `headroom/transforms/`, reduce token count by removing redundant context, aligning cache prefixes, and compressing large JSON payloads, with the goal of preserving semantic meaning.
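To make "deterministic transform" concrete, here is a toy rolling-window function over a chat message list. It is an illustrative sketch only, not Headroom's `RollingWindow` implementation: the function name, the `max_messages` parameter, and the policy of always keeping system messages are all assumptions.

```python
def rolling_window(messages: list, max_messages: int) -> list:
    """Toy transform: keep all system messages plus the last `max_messages` others.

    Deterministic: the same input always yields the same output.
    System messages are moved to the front of the result.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Guard the zero case, because rest[-0:] would return the whole list.
    recent = rest[-max_messages:] if max_messages > 0 else []
    return system + recent


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "First question"},
    {"role": "assistant", "content": "First answer"},
    {"role": "user", "content": "Second question"},
    {"role": "assistant", "content": "Second answer"},
]

trimmed = rolling_window(history, max_messages=2)
print(len(trimmed))  # 3
```

The function returns a new list and leaves the input untouched, so the original request can still be logged or compared against the transformed one.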

### Using Optimize Mode

```python
from headroom import HeadroomClient, OpenAIProvider
from openai import OpenAI

client = HeadroomClient(
    original_client=OpenAI(),
    provider=OpenAIProvider(),
    default_mode="optimize",      # ← enable compression
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Generate a large JSON report"}],
)
```

The response includes headers showing actual tokens saved. To disable optimization entirely via CLI, use `headroom proxy --no-optimize`, which effectively forces Audit behavior across all traffic.

## Simulate Mode: Dry-Run Cost Estimation

**Simulate mode** provides a complete dry-run of the compression pipeline without incurring LLM costs. Instead of calling the upstream API, Headroom returns a `Plan` object describing which transforms would execute and the estimated savings.

This mode is essential for CI pipelines, cost forecasting, and testing compression strategies against historical traffic. The `Plan` object contains `tokens_saved`, `transforms`, and `estimated_savings` fields.
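For CI use, the three fields named above are enough to gate a build on a minimum savings target. The sketch below uses a stand-in `PlanSketch` dataclass rather than Headroom's real `Plan` class; the field types (`int`, list of strings, `float`) and the `meets_savings_target` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PlanSketch:
    """Stand-in for Headroom's Plan object; field types are assumed."""
    tokens_saved: int
    transforms: list = field(default_factory=list)
    estimated_savings: float = 0.0


def meets_savings_target(plan: PlanSketch, min_tokens_saved: int) -> bool:
    """Hypothetical CI check: pass only if the plan saves enough tokens."""
    return plan.tokens_saved >= min_tokens_saved


# Example values are made up for the demonstration.
plan = PlanSketch(
    tokens_saved=1200,
    transforms=["SmartCrusher", "CacheAligner"],
    estimated_savings=0.003,
)

print(meets_savings_target(plan, min_tokens_saved=1000))  # True
print(meets_savings_target(plan, min_tokens_saved=5000))  # False
```

A check like this could run against recorded traffic so that a regression in compression is caught before deployment, without any upstream API spend.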

### Using Simulate Mode

Invoke via the dedicated `simulate` method in [`headroom/client.py`](https://github.com/chopratejas/headroom/blob/main/headroom/client.py):

```python
# Assumes `client` is a HeadroomClient constructed as in the earlier examples.
plan = client.chat.completions.simulate(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Generate a large JSON report"}],
)

print(f"Would save {plan.tokens_saved} tokens")
print("Transforms that would run:", plan.transforms)
print("Estimated cost reduction:", plan.estimated_savings)
```

No network call to the LLM provider occurs during simulation.

## Configuring Modes at Three Levels

Headroom supports mode configuration at multiple granularity levels:

1. **SDK Construction**: Set `default_mode="audit"`, `"optimize"`, or `"simulate"` when instantiating `HeadroomClient`.

2. **Per-Request Override**: Pass `headroom_mode="audit"` (or any valid mode) into `client.chat.completions.create()` to override the default for a single call.

3. **Proxy Command-Line**: Launch the proxy with `headroom proxy --no-optimize` to disable optimization globally, forcing Audit behavior.
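The precedence implied by these three levels can be sketched as a small resolver. The function `resolve_mode` and its `no_optimize` parameter are hypothetical names. The per-request override and the effect of `--no-optimize` on optimization come from the list above, while leaving a `simulate` request untouched under `--no-optimize` is an assumption.

```python
from typing import Optional


def resolve_mode(
    default_mode: str,
    request_mode: Optional[str] = None,
    no_optimize: bool = False,
) -> str:
    """Hypothetical resolver for the effective mode of a single request."""
    # A per-request value overrides the SDK-level default.
    mode = request_mode if request_mode is not None else default_mode
    # Assumption: --no-optimize downgrades optimize to audit and
    # leaves the other modes as they are.
    if no_optimize and mode == "optimize":
        return "audit"
    return mode


print(resolve_mode("optimize"))                        # optimize
print(resolve_mode("optimize", request_mode="audit"))  # audit
print(resolve_mode("optimize", no_optimize=True))      # audit
```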

## Summary

- **Audit mode** observes traffic and reports potential savings without modifying requests, making it safe for production monitoring.
- **Optimize mode** actively applies transforms like `SmartCrusher` and `CacheAligner` to reduce token counts and latency in live environments.
- **Simulate mode** returns a `Plan` object with cost estimates and transform details without calling the upstream LLM, perfect for testing and CI.
- All three modes are defined in [`headroom/models/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/models/config.py) and can be configured via the SDK, per-request parameters, or CLI flags.

## Frequently Asked Questions

### What happens to my LLM response in Audit mode?

The response returns normally from the upstream LLM without modification. Headroom only injects `X-Headroom-*` headers indicating which transforms would have been applied and the estimated tokens that could have been saved.

### Can I switch modes for individual requests without changing the SDK default?

Yes. Pass the `headroom_mode` parameter directly to `client.chat.completions.create()` to override the client's default mode for that specific request. This allows you to audit specific high-risk requests while keeping optimization enabled by default.

### Which transforms run in Optimize mode?

The specific transforms depend on your configuration, but commonly include `SmartCrusher` for aggressive context compression, `CacheAligner` for prefix optimization, and `RollingWindow` for managing conversation history. These are implemented in the `headroom/transforms/` directory and applied deterministically before the request reaches the LLM.

### Does Simulate mode consume LLM API tokens?

No. Simulate mode performs all transformation logic locally and returns a `Plan` object without making any network call to the upstream LLM provider. This makes it ideal for cost estimation and integration testing without incurring API charges.