How to Wrap AI Coding Agents with Headroom for Automatic Compression

Headroom provides a single-command wrapper, headroom wrap <agent>, that launches a transparent MCP proxy. The proxy intercepts HTTP and WebSocket traffic from AI coding agents and applies AST-aware compression, token reduction, and tag protection before forwarding each request to the upstream LLM.

The open-source chopratejas/headroom project lets developers place AI coding agents behind this proxy without modifying agent source code, cutting token usage (and with it API cost and latency) by up to 30% on large code diffs when code-aware compression is enabled. The wrapper supports Claude, Codex, Copilot, Aider, Cursor, and OpenClaw through a language-agnostic proxy architecture.

How the Wrap Proxy Intercepts Agent Traffic

When you execute headroom wrap <agent>, the CLI performs three core operations defined in headroom/cli/wrap.py (lines 2092–2610):

  • Proxy initialization: The wrapper starts a Headroom MCP proxy on the default port (9999) or reuses an existing persistent deployment. The implementation handles the --skip-mcp-register flag and invokes headroom_retrieve on compression markers to manage proxy state.

  • Compression pipeline activation: Every intercepted request feeds into Headroom’s transform pipeline. The headroom/transforms/compression_policy.py module determines which transforms are active for the session, orchestrating headroom/transforms/code_compressor.py for AST-based reduction and headroom/transforms/tag_protector.py for placeholder safeguarding.

  • Environment injection: The wrapper sets HEADROOM_STACK=wrap_<agent> and HEADROOM_CONTEXT_TOOL so downstream components recognize the wrapped context. Telemetry helpers like detect_stack and _print_telemetry_notice (referenced in test_telemetry_context.py lines 27–71) automatically emit compression metrics.
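
The environment-injection step can be pictured with a short Python sketch. Only the variable names HEADROOM_STACK and HEADROOM_CONTEXT_TOOL come from the description above; the function build_wrapped_env and the value passed for HEADROOM_CONTEXT_TOOL are hypothetical, and this is not the code in wrap.py.

```python
import os


def build_wrapped_env(agent, context_tool, base_env=None):
    """Return a copy of the environment with the wrap markers added.

    Hypothetical helper: the real wrapper may build its environment differently.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HEADROOM_STACK"] = f"wrap_{agent}"
    # The format of this value is an assumption; the article only names the variable.
    env["HEADROOM_CONTEXT_TOOL"] = context_tool
    return env


base = {"PATH": "/usr/bin"}
env = build_wrapped_env("claude", context_tool="claude", base_env=base)
print(env["HEADROOM_STACK"])  # wrap_claude
```

Copying the environment rather than mutating it keeps the markers scoped to the child process that runs the agent.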

End-to-End Request Flow

The complete lifecycle of a wrapped request follows this execution path:

  1. Command parsing: headroom wrap identifies the agent type (wrap_claude, wrap_codex, etc.) and separates wrapper flags from tool arguments using the double-dash (--) delimiter.

  2. Proxy selection: If a proxy already runs on the requested port, the wrapper reuses it; otherwise, it spawns a fresh instance via headroom proxy.

  3. Request interception: The proxy handler in headroom/cli/proxy.py unwraps tool-specific envelopes (e.g., Codex’s response.create wrapper) and routes the payload to the compression stack.

  4. Transform execution: The payload traverses the ordered transforms from compression_policy.py. Each transform may inject compression markers, rewrite AST nodes, or strip redundant tokens. With code-aware compression enabled, this yields up to 30% savings on large code diffs.

  5. Upstream forwarding: After compression, the modified request travels to the LLM endpoint. Responses optionally undergo reverse compression (e.g., boilerplate removal) before returning to the agent.

  6. Metrics persistence: The system updates ToolPattern.total_compressions counters, exposing statistics via headroom toin-publish and the observability UI (headroom/cli/observability.py).
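
Steps 4 and 6 above amount to running an ordered list of transforms and counting which ones actually shrank the payload. The sketch below illustrates that idea only: the Pipeline class, the two whitespace transforms, and the use of character counts as a stand-in for tokens are all assumptions, not Headroom's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Transform = Callable[[str], str]


def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of every line."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


def collapse_blank_lines(text: str) -> str:
    """Reduce runs of consecutive empty lines to a single empty line."""
    out: List[str] = []
    for line in text.split("\n"):
        if line == "" and out and out[-1] == "":
            continue
        out.append(line)
    return "\n".join(out)


@dataclass
class Pipeline:
    """Hypothetical ordered pipeline with a per-transform compression counter."""

    transforms: List[Tuple[str, Transform]]
    compressions: Dict[str, int] = field(default_factory=dict)

    def run(self, payload: str) -> str:
        for name, transform in self.transforms:
            result = transform(payload)
            if len(result) < len(payload):  # count only transforms that saved something
                self.compressions[name] = self.compressions.get(name, 0) + 1
            payload = result
        return payload


pipeline = Pipeline([
    ("strip_trailing_whitespace", strip_trailing_whitespace),
    ("collapse_blank_lines", collapse_blank_lines),
])
before = "def f():   \n\n\n\n    return 1   \n"
after = pipeline.run(before)
print(f"{len(before)} -> {len(after)} characters")
print(pipeline.compressions)
```

Order matters here: stripping trailing whitespace first turns whitespace-only lines into empty ones, which the second transform can then collapse.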

Wrapping Claude Code

To wrap Claude Code with automatic compression, run:

headroom wrap claude

This command sets HEADROOM_STACK=wrap_claude, initializes the MCP proxy on port 9999, and launches the Claude binary with --proxy http://127.0.0.1:9999. The agent operates normally while Headroom compresses all upstream traffic transparently.
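
The reuse-or-spawn decision from the request flow depends on whether something is already listening on the proxy port. One common way to check is a short TCP connection attempt, sketched below. The helper names are hypothetical, and a successful connection only shows that some process holds the port, not that it is a Headroom proxy.

```python
import socket


def proxy_is_listening(host="127.0.0.1", port=9999, timeout=0.25):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_proxy_action(port=9999):
    """Hypothetical decision helper: reuse a running proxy or spawn a new one."""
    return "reuse" if proxy_is_listening(port=port) else "spawn"


print(choose_proxy_action())
```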

Using Custom LLM Backends

You can redirect wrapped agents to alternative providers. For example, to wrap GitHub Copilot with a Groq backend:

headroom wrap copilot \
    --backend anyllm \
    --anyllm-provider groq \
    -- --model gpt-4o

The --backend anyllm flag translates Copilot’s native calls to OpenAI-compatible /v1 endpoints. Arguments after -- pass directly to the agent (e.g., --model gpt-4o).
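
Splitting on the double-dash delimiter is a standard CLI convention. A minimal sketch of the idea follows; split_wrapper_args is a hypothetical name, and the actual parsing in wrap.py may differ.

```python
def split_wrapper_args(argv):
    """Split argv at the first "--" into (wrapper_args, agent_args)."""
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return list(argv), []


wrapper_args, agent_args = split_wrapper_args(
    ["--backend", "anyllm", "--anyllm-provider", "groq", "--", "--model", "gpt-4o"]
)
print(wrapper_args)  # ['--backend', 'anyllm', '--anyllm-provider', 'groq']
print(agent_args)    # ['--model', 'gpt-4o']
```

Only the first -- acts as the delimiter, so an agent that itself accepts -- still receives any later occurrences unchanged.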

Enabling AST-Based Code Compression

For projects with substantial codebases, enable AST-aware compression to maximize token savings:

pip install "headroom-ai[code]"
headroom wrap claude --code-graph

The optional code_compressor transform, which compression_policy.py activates when --code-graph is passed, parses code into an Abstract Syntax Tree and drops redundant whitespace and comments while preserving semantic structure. The tag_protector.py module safeguards special placeholders during this process so that tool-specific markers are not corrupted.
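
To see why an AST round trip removes comments and layout without changing behavior, consider this sketch built on Python's standard ast module (Python 3.9 or later for ast.unparse). It handles Python source only and is an illustration of the general technique, not the logic in code_compressor.py.

```python
import ast


def strip_comments_and_layout(source: str) -> str:
    """Round-trip Python source through its AST.

    Comments and non-essential whitespace are not part of the tree,
    so they disappear when the tree is unparsed. Docstrings are nodes
    and therefore survive.
    """
    return ast.unparse(ast.parse(source))


source = '''
x = 1      # set the counter



def add(a, b):
    # add two numbers
    return a + b
'''

compact = strip_comments_and_layout(source)

# Both versions parse to the same tree, so the program's meaning is unchanged.
assert ast.dump(ast.parse(compact)) == ast.dump(ast.parse(source))
print(f"{len(source)} -> {len(compact)} characters")
print(compact)
```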

Sharing Memory Across Agents

Headroom supports persistent, cross-agent memory through the MemoryWrapper (headroom/memory/wrapper.py):

headroom wrap claude --memory
headroom wrap codex --memory

Both commands connect to the same hierarchical memory store, allowing context reuse between Claude Code and Codex without data duplication.
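
The idea of two agents reading and writing one store can be illustrated with a small SQLite-backed sketch. Everything here is hypothetical: the SharedMemory class, the slash-separated key hierarchy, and the SQLite backing are assumptions chosen for illustration and say nothing about how MemoryWrapper stores data.

```python
import os
import sqlite3
import tempfile


class SharedMemory:
    """Hypothetical key-value store that several sessions open by file path."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memory (key TEXT PRIMARY KEY, value TEXT)"
        )
        self.db.commit()

    def put(self, key, value):
        self.db.execute("INSERT OR REPLACE INTO memory VALUES (?, ?)", (key, value))
        self.db.commit()

    def get_prefix(self, prefix):
        """Return every entry whose key starts with prefix (one level of the hierarchy)."""
        rows = self.db.execute(
            "SELECT key, value FROM memory "
            "WHERE substr(key, 1, length(?)) = ? ORDER BY key",
            (prefix, prefix),
        )
        return dict(rows.fetchall())


path = os.path.join(tempfile.mkdtemp(), "memory.db")
claude_session = SharedMemory(path)  # stands in for the Claude-side wrapper
codex_session = SharedMemory(path)   # stands in for the Codex-side wrapper

claude_session.put("project/auth/decision", "use JWT")
print(codex_session.get_prefix("project/auth/"))
```

Because both sessions open the same file, a note written by one is visible to the other without being copied.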

Inspecting Compression Telemetry

After a session, view detailed compression statistics:

headroom toin-publish --min-observations 1

This displays a table of ToolPattern.total_compressions per transform stage, quantifying token savings achieved by the automatic compression pipeline.
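
As a rough picture of what such a report involves, the sketch below filters counters by a minimum threshold and prints them as a table. PatternStats is a hypothetical stand-in rather than Headroom's ToolPattern class, the sample numbers are made up, and treating --min-observations as a threshold on total_compressions is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PatternStats:
    """Hypothetical stand-in; only the total_compressions name comes from the article."""

    name: str
    total_compressions: int


def format_report(patterns, min_observations=1):
    """Render counters at or above the threshold, largest first."""
    rows = [p for p in patterns if p.total_compressions >= min_observations]
    rows.sort(key=lambda p: p.total_compressions, reverse=True)
    width = max([len("pattern")] + [len(p.name) for p in rows])
    lines = [f"{'pattern':<{width}}  total_compressions"]
    lines += [f"{p.name:<{width}}  {p.total_compressions}" for p in rows]
    return "\n".join(lines)


# Made-up sample data for illustration only.
sample = [
    PatternStats("tag_protector", 3),
    PatternStats("code_compressor", 12),
    PatternStats("unused_transform", 0),
]
print(format_report(sample, min_observations=1))
```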

Summary

  • Single-command wrapping: Use headroom wrap <agent> to instantiate a transparent MCP proxy that intercepts all agent traffic.
  • Automatic compression: The transform pipeline in headroom/transforms/compression_policy.py processes every request through AST-aware compression and tag protection.
  • Zero code changes: The wrapper is language-agnostic and requires no modifications to Claude, Codex, Copilot, Aider, Cursor, or OpenClaw.
  • Optional enhancements: Install headroom-ai[code] for code-graph compression or use --memory for cross-agent context sharing.
  • Built-in observability: Compression metrics automatically populate via ToolPattern.total_compressions and export through headroom toin-publish.

Frequently Asked Questions

What AI coding agents are supported by Headroom wrap?

Headroom supports any agent communicating over HTTP or WebSocket, including Claude Code, OpenAI Codex, GitHub Copilot, Aider, Cursor, and OpenClaw. The headroom/cli/wrap.py module implements specific handlers for each tool (e.g., wrap_claude, wrap_codex) while maintaining a language-agnostic proxy core that intercepts requests via headroom/cli/proxy.py.

How does Headroom compression reduce token usage without breaking functionality?

The system applies AST-aware transforms via headroom/transforms/code_compressor.py to remove syntactic noise (whitespace, comments) while preserving code semantics. The tag_protector.py module safeguards tool-specific placeholders, and the compression_policy.py orchestrator ensures transforms execute in the correct order. This architecture achieves up to 30% token reduction on diffs without altering functional behavior.
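
The protect-then-restore pattern behind tag protection can be sketched as follows. The XML-style tag syntax, the NUL-delimited sentinel, and the squeeze_spaces transform are assumptions made for the example; what tag_protector.py actually matches is not specified here.

```python
import re

# Assumed tag syntax: <name>...</name> with a matching closing tag.
TAG_RE = re.compile(r"<([A-Za-z_][\w-]*)>.*?</\1>", re.DOTALL)
SENTINEL_RE = re.compile(r"\x00(\d+)\x00")


def protect(text):
    """Swap each tagged span for a numbered sentinel and remember the original."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"\x00{len(saved) - 1}\x00"

    return TAG_RE.sub(stash, text), saved


def restore(text, saved):
    """Put the original tagged spans back in place of their sentinels."""
    return SENTINEL_RE.sub(lambda m: saved[int(m.group(1))], text)


def squeeze_spaces(text):
    """Example lossy transform: collapse runs of spaces or tabs."""
    return re.sub(r"[ \t]{2,}", " ", text)


def compress_with_protection(text):
    masked, saved = protect(text)
    return restore(squeeze_spaces(masked), saved)


message = "Fix   the   bug <tool_call>run   tests</tool_call>   now"
print(compress_with_protection(message))
# Fix the bug <tool_call>run   tests</tool_call> now
```

The text outside the tag is compressed, while the spacing inside the protected span is returned byte for byte.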

Can I use Headroom wrap with custom LLM backends?

Yes. The --backend anyllm flag redirects agent traffic to any OpenAI-compatible endpoint. You can specify providers like Groq, Together, or private deployments using --anyllm-provider <name>. The proxy in headroom/cli/proxy.py handles protocol translation automatically, requiring no changes to the agent’s configuration beyond the initial wrap command.

How do I monitor compression savings when using headroom wrap?

Compression events automatically increment ToolPattern.total_compressions counters stored in the tool pattern registry. Run headroom toin-publish --min-observations 1 to view per-transform statistics, or inspect the observability UI via headroom/cli/observability.py. The wrapper also emits telemetry notices through _print_telemetry_notice when environment variables like HEADROOM_STACK are detected.
