# How to Wrap AI Coding Agents with Headroom for Automatic Compression

> Learn how to wrap AI coding agents with Headroom for automatic compression. Headroom is a command-line tool that intercepts agent traffic and compresses requests before they reach the LLM.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: how-to-guide
- Published: 2026-06-03

---

**Headroom provides a single-command wrapper (`headroom wrap <agent>`) that launches a transparent MCP proxy to intercept HTTP/WebSocket traffic from AI coding agents and automatically applies AST-aware compression, token reduction, and tag protection before forwarding requests to upstream LLMs.**

The open-source `chopratejas/headroom` project lets developers wrap AI coding agents so that their requests are compressed automatically, reducing token costs and API latency without modifying agent source code. The wrapper supports Claude, Codex, Copilot, Aider, Cursor, and OpenClaw through a language-agnostic proxy architecture.

## How the Wrap Proxy Intercepts Agent Traffic

When you execute `headroom wrap <agent>`, the CLI performs three core operations defined in [`headroom/cli/wrap.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cli/wrap.py) (lines 2092–2610):

- **Proxy initialization**: The wrapper starts a Headroom MCP proxy on the default port (9999) or reuses an existing persistent deployment. The implementation handles the `--skip-mcp-register` flag and invokes `headroom_retrieve` on compression markers to manage proxy state.

- **Compression pipeline activation**: Every intercepted request feeds into Headroom’s transform pipeline. The [`headroom/transforms/compression_policy.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/compression_policy.py) module determines which transforms are active for the session, orchestrating [`headroom/transforms/code_compressor.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/code_compressor.py) for AST-based reduction and [`headroom/transforms/tag_protector.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/tag_protector.py) for placeholder safeguarding.

- **Environment injection**: The wrapper sets `HEADROOM_STACK=wrap_<agent>` and `HEADROOM_CONTEXT_TOOL` so downstream components recognize the wrapped context. Telemetry helpers like `detect_stack` and `_print_telemetry_notice` (referenced in [`test_telemetry_context.py`](https://github.com/chopratejas/headroom/blob/main/test_telemetry_context.py) lines 27–71) automatically emit compression metrics.
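The proxy-reuse check and environment injection described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `wrap.py`: the helper names `proxy_is_listening` and `build_wrapped_env` are hypothetical, and the value of `HEADROOM_CONTEXT_TOOL` is left unset because this guide does not state what it contains.

```python
import os
import socket
from typing import Dict, Optional


def proxy_is_listening(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.25)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def build_wrapped_env(agent: str, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and tag it with the wrapped stack name."""
    env = dict(os.environ if base_env is None else base_env)
    # The guide names this variable; the helper itself is an assumption.
    env["HEADROOM_STACK"] = f"wrap_{agent}"
    return env
```

A wrapper built this way would call `proxy_is_listening(9999)` first, start a proxy only when it returns `False`, and then pass the result of `build_wrapped_env("claude")` as the child process environment.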

## End-to-End Request Flow

The complete lifecycle of a wrapped request follows this execution path:

1. **Command parsing**: `headroom wrap` identifies the agent type (`wrap_claude`, `wrap_codex`, etc.) and separates wrapper flags from tool arguments using the double-dash (`--`) delimiter.

2. **Proxy selection**: If a proxy already runs on the requested port, the wrapper reuses it; otherwise, it spawns a fresh instance via `headroom proxy`.

3. **Request interception**: The proxy handler in [`headroom/cli/proxy.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cli/proxy.py) unwraps tool-specific envelopes (e.g., Codex’s `response.create` wrapper) and routes the payload to the compression stack.

4. **Transform execution**: The payload traverses the ordered transforms from [`compression_policy.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/compression_policy.py). Each transform may inject compression markers, rewrite AST nodes, or strip redundant tokens. With code-aware compression enabled, this yields up to 30% savings on large code diffs.

5. **Upstream forwarding**: After compression, the modified request travels to the LLM endpoint. Responses optionally undergo reverse compression (e.g., boilerplate removal) before returning to the agent.

6. **Metrics persistence**: The system updates `ToolPattern.total_compressions` counters, exposing statistics via `headroom toin-publish` and the observability UI ([`headroom/cli/observability.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cli/observability.py)).
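Steps 4 and 6 amount to running a payload through an ordered list of transforms and counting which ones actually shrank it. The sketch below illustrates that idea under stated assumptions: the `Pipeline` class and the two toy transforms are hypothetical and stand in for Headroom's real transforms, and the counter only mirrors the `total_compressions` name used in this guide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Transform = Callable[[str], str]


@dataclass
class Pipeline:
    """Apply named transforms in order and count the ones that reduced size."""
    transforms: List[Tuple[str, Transform]]
    total_compressions: Dict[str, int] = field(default_factory=dict)

    def run(self, payload: str) -> str:
        for name, transform in self.transforms:
            result = transform(payload)
            if len(result) < len(payload):
                self.total_compressions[name] = self.total_compressions.get(name, 0) + 1
            payload = result
        return payload


def strip_trailing_whitespace(text: str) -> str:
    """Toy transform: remove whitespace at the end of every line."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


def collapse_blank_lines(text: str) -> str:
    """Toy transform: keep at most one blank line in a row."""
    lines = text.split("\n")
    kept = [
        line for i, line in enumerate(lines)
        if line.strip() or (i > 0 and lines[i - 1].strip())
    ]
    return "\n".join(kept)


pipeline = Pipeline([
    ("strip_trailing_whitespace", strip_trailing_whitespace),
    ("collapse_blank_lines", collapse_blank_lines),
])
compressed = pipeline.run("a  \n\n\n\nb\n")
```

Order matters here: a later transform sees the output of the earlier one, which is why a policy module that fixes the sequence is useful.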

## Wrapping Claude Code

To wrap Claude Code with automatic compression, run:

```bash
headroom wrap claude
```

This command sets `HEADROOM_STACK=wrap_claude`, initializes the MCP proxy on port 9999, and launches the Claude binary with `--proxy http://127.0.0.1:9999`. The agent operates normally while Headroom compresses all upstream traffic transparently.

## Using Custom LLM Backends

You can redirect wrapped agents to alternative providers. For example, to wrap GitHub Copilot with a Groq backend:

```bash
headroom wrap copilot \
    --backend anyllm \
    --anyllm-provider groq \
    -- --model gpt-4o
```

The `--backend anyllm` flag translates Copilot’s native calls to OpenAI-compatible `/v1` endpoints. Arguments after `--` pass directly to the agent (e.g., `--model gpt-4o`).
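The split at the double-dash delimiter can be illustrated with a short Python function. This is a sketch of the general convention, assuming the first `--` ends the wrapper flags; `split_wrapper_args` is a hypothetical name, not a function taken from `wrap.py`.

```python
from typing import List, Tuple


def split_wrapper_args(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Split argv at the first "--" into (wrapper flags, agent arguments)."""
    if "--" in argv:
        idx = argv.index("--")
        # Everything after the first delimiter is passed through untouched,
        # including any later "--" tokens.
        return argv[:idx], argv[idx + 1:]
    return list(argv), []


wrapper_flags, agent_args = split_wrapper_args(
    ["--backend", "anyllm", "--anyllm-provider", "groq", "--", "--model", "gpt-4o"]
)
```

With the Copilot command above, `wrapper_flags` holds the four Headroom options and `agent_args` holds `["--model", "gpt-4o"]`.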

## Enabling AST-Based Code Compression

For projects with substantial codebases, enable AST-aware compression to maximize token savings:

```bash
pip install "headroom-ai[code]"
headroom wrap claude --code-graph
```

The optional `code_compressor` transform (activated via `--code-graph` in [`compression_policy.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/compression_policy.py)) parses Abstract Syntax Trees to eliminate redundant whitespace and comments while preserving semantic structure. The [`tag_protector.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/tag_protector.py) module safeguards special placeholders during this process so that tool-specific markers are not corrupted.
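To make the two ideas concrete, here is a minimal sketch using only the Python standard library. It is not Headroom's implementation: `compress_python` relies on an `ast.parse` / `ast.unparse` round trip (Python 3.9+), which drops comments and normalizes whitespace, and the tag names in `TAG_RE` as well as the `@@HR_TAG_n@@` placeholder format are assumptions chosen for illustration.

```python
import ast
import re
from typing import List, Tuple

# Hypothetical marker format; the real tag_protector may recognise other tags.
TAG_RE = re.compile(r"<(system-reminder|tool_call)>.*?</\1>", re.DOTALL)


def compress_python(source: str) -> str:
    """Round-trip through the AST: comments and redundant whitespace vanish."""
    return ast.unparse(ast.parse(source))


def protect_tags(text: str) -> Tuple[str, List[str]]:
    """Swap each protected span for a placeholder and remember the original."""
    saved: List[str] = []

    def stash(match: "re.Match[str]") -> str:
        saved.append(match.group(0))
        return f"@@HR_TAG_{len(saved) - 1}@@"

    return TAG_RE.sub(stash, text), saved


def restore_tags(text: str, saved: List[str]) -> str:
    """Put the original spans back after lossy transforms have run."""
    for i, original in enumerate(saved):
        text = text.replace(f"@@HR_TAG_{i}@@", original)
    return text


source = "def add(a,  b):\n    # sum two values\n    return a  +  b   # trailing\n"
smaller = compress_python(source)

message = "keep   <tool_call>  x   y </tool_call>   this"
masked, saved = protect_tags(message)
squeezed = " ".join(masked.split())  # a lossy whitespace transform
restored = restore_tags(squeezed, saved)
```

The whitespace inside the `<tool_call>` span survives because the lossy step only ever sees the placeholder. Note that an AST round trip keeps docstrings, since they are part of the program's runtime behavior.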

## Sharing Memory Across Agents

Headroom supports persistent, cross-agent memory through the MemoryWrapper ([`headroom/memory/wrapper.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/wrapper.py)):

```bash
headroom wrap claude --memory
headroom wrap codex --memory
```

Both commands connect to the same hierarchical memory store, allowing context reuse between Claude Code and Codex without data duplication.

## Inspecting Compression Telemetry

After a session, view detailed compression statistics:

```bash
headroom toin-publish --min-observations 1
```

This displays a table of `ToolPattern.total_compressions` per transform stage, quantifying token savings achieved by the automatic compression pipeline.
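The effect of a `--min-observations` threshold can be pictured with a small Python model. Everything here is a hypothetical stand-in: only the `ToolPattern.total_compressions` name comes from this guide, while the `observations` field, the `record` method, and `select_for_publish` are assumptions made to show the filtering idea.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ToolPattern:
    """Illustrative stand-in; only total_compressions is named in this guide."""
    name: str
    observations: int = 0
    total_compressions: int = 0

    def record(self, compressed: bool) -> None:
        """Count one observed request and whether it was compressed."""
        self.observations += 1
        if compressed:
            self.total_compressions += 1


def select_for_publish(patterns: Iterable[ToolPattern], min_observations: int = 1) -> List[ToolPattern]:
    """Keep only patterns that have been observed often enough to report."""
    return [p for p in patterns if p.observations >= min_observations]


code = ToolPattern("code_compressor")
code.record(compressed=True)
code.record(compressed=False)
idle = ToolPattern("tag_protector")
reportable = select_for_publish([code, idle], min_observations=1)
```

A higher threshold would hide patterns that were seen only a handful of times, which keeps the published table from being dominated by one-off requests.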

## Summary

- **Single-command wrapping**: Use `headroom wrap <agent>` to instantiate a transparent MCP proxy that intercepts all agent traffic.
- **Automatic compression**: The transform pipeline in [`headroom/transforms/compression_policy.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/compression_policy.py) processes every request through AST-aware compression and tag protection.
- **Zero code changes**: The wrapper is language-agnostic and requires no modifications to Claude, Codex, Copilot, Aider, Cursor, or OpenClaw.
- **Optional enhancements**: Install `headroom-ai[code]` for code-graph compression or use `--memory` for cross-agent context sharing.
- **Built-in observability**: Compression metrics automatically populate via `ToolPattern.total_compressions` and export through `headroom toin-publish`.

## Frequently Asked Questions

### What AI coding agents are supported by Headroom wrap?

Headroom supports any agent communicating over HTTP or WebSocket, including Claude Code, OpenAI Codex, GitHub Copilot, Aider, Cursor, and OpenClaw. The [`headroom/cli/wrap.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cli/wrap.py) module implements specific handlers for each tool (e.g., `wrap_claude`, `wrap_codex`) while maintaining a language-agnostic proxy core that intercepts requests via [`headroom/cli/proxy.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cli/proxy.py).

### How does Headroom compression reduce token usage without breaking functionality?

The system applies **AST-aware transforms** via [`headroom/transforms/code_compressor.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/code_compressor.py) to remove syntactic noise (whitespace, comments) while preserving code semantics. The [`tag_protector.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/tag_protector.py) module safeguards tool-specific placeholders, and the [`compression_policy.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/compression_policy.py) orchestrator ensures transforms execute in the correct order. With code-aware compression enabled, this architecture achieves up to 30% token reduction on large code diffs without altering functional behavior.

### Can I use Headroom wrap with custom LLM backends?

Yes. The `--backend anyllm` flag redirects agent traffic to any OpenAI-compatible endpoint. You can specify providers like Groq, Together, or private deployments using `--anyllm-provider <name>`. The proxy in [`headroom/cli/proxy.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cli/proxy.py) handles protocol translation automatically, requiring no changes to the agent’s configuration beyond the initial wrap command.

### How do I monitor compression savings when using headroom wrap?

Compression events automatically increment `ToolPattern.total_compressions` counters stored in the tool pattern registry. Run `headroom toin-publish --min-observations 1` to view per-transform statistics, or inspect the observability UI via [`headroom/cli/observability.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cli/observability.py). The wrapper also emits telemetry notices through `_print_telemetry_notice` when environment variables like `HEADROOM_STACK` are detected.