How to Wrap AI Coding Agents with Headroom for Automatic Compression
Headroom provides a single-command wrapper (headroom wrap <agent>) that launches a transparent MCP proxy to intercept HTTP/WebSocket traffic from AI coding agents and automatically applies AST-aware compression, token reduction, and tag protection before forwarding requests to upstream LLMs.
The chopratejas/headroom open-source project lets developers wrap AI coding agents for automatic compression, reducing token costs and API latency without modifying agent source code. The wrapper supports Claude, Codex, Copilot, Aider, Cursor, and OpenClaw through a language-agnostic proxy architecture.
How the Wrap Proxy Intercepts Agent Traffic
When you execute headroom wrap <agent>, the CLI performs three core operations defined in headroom/cli/wrap.py (lines 2092–2610):
- Proxy initialization: The wrapper starts a Headroom MCP proxy on the default port (9999) or reuses an existing persistent deployment. The implementation handles the --skip-mcp-register flag and invokes headroom_retrieve on compression markers to manage proxy state.
- Compression pipeline activation: Every intercepted request feeds into Headroom’s transform pipeline. The headroom/transforms/compression_policy.py module determines which transforms are active for the session, orchestrating headroom/transforms/code_compressor.py for AST-based reduction and headroom/transforms/tag_protector.py for placeholder safeguarding.
- Environment injection: The wrapper sets HEADROOM_STACK=wrap_<agent> and HEADROOM_CONTEXT_TOOL so downstream components recognize the wrapped context. Telemetry helpers like detect_stack and _print_telemetry_notice (referenced in test_telemetry_context.py lines 27–71) automatically emit compression metrics.
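To make the environment-injection step concrete, here is a minimal Python sketch of how a wrapper could tag a child process. It is not taken from headroom/cli/wrap.py: the helper names build_wrapped_env and launch_agent are hypothetical, and only the HEADROOM_STACK=wrap_<agent> naming comes from the description above. HEADROOM_CONTEXT_TOOL is left out because its value is not stated.

```python
import os
import subprocess
from typing import List, Mapping, Optional


def build_wrapped_env(agent: str, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Copy an environment and tag it with the wrap stack name.

    Hypothetical helper: only the HEADROOM_STACK=wrap_<agent> convention
    comes from the article; everything else is an assumption.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HEADROOM_STACK"] = f"wrap_{agent}"
    return env


def launch_agent(command: List[str], agent: str) -> int:
    """Run the agent as a child process with the tagged environment."""
    completed = subprocess.run(command, env=build_wrapped_env(agent))
    return completed.returncode
```

Copying the environment rather than mutating os.environ keeps the marker scoped to the child process, so the parent shell is unaffected once the agent exits.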
End-to-End Request Flow
The complete lifecycle of a wrapped request follows this execution path:
- Command parsing: headroom wrap identifies the agent type (wrap_claude, wrap_codex, etc.) and separates wrapper flags from tool arguments using the double-dash (--) delimiter.
- Proxy selection: If a proxy already runs on the requested port, the wrapper reuses it; otherwise, it spawns a fresh instance via headroom proxy.
- Request interception: The proxy handler in headroom/cli/proxy.py unwraps tool-specific envelopes (e.g., Codex’s response.create wrapper) and routes the payload to the compression stack.
- Transform execution: The payload traverses the ordered transforms from compression_policy.py. Each transform may inject compression markers, rewrite AST nodes, or strip redundant tokens, yielding up to 30% savings on large code diffs when code-aware compression is enabled.
- Upstream forwarding: After compression, the modified request travels to the LLM endpoint. Responses optionally undergo reverse compression (e.g., boilerplate removal) before returning to the agent.
- Metrics persistence: The system updates ToolPattern.total_compressions counters, exposing statistics via headroom toin-publish and the observability UI (headroom/cli/observability.py).
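The first two steps of this flow (splitting arguments on the double dash and deciding whether a proxy is already listening) can be sketched in a few lines of Python. This is an illustration under assumptions, not Headroom's code: split_wrapper_args and port_in_use are hypothetical names, and a plain TCP connect is only one plausible way to detect a running proxy.

```python
import socket
from typing import List, Tuple


def split_wrapper_args(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Split argv at the first "--": wrapper flags before it, agent args after it."""
    if "--" in argv:
        cut = argv.index("--")
        return argv[:cut], argv[cut + 1:]
    return list(argv), []


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.2) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0
```

For example, split_wrapper_args(["--backend", "anyllm", "--", "--model", "gpt-4o"]) separates the two wrapper tokens from the two agent tokens. A caller could then reuse the proxy when port_in_use(9999) is true and start a new one otherwise.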
Wrapping Claude Code
To wrap Claude Code with automatic compression, run:
headroom wrap claude
This command sets HEADROOM_STACK=wrap_claude, initializes the MCP proxy on port 9999, and launches the Claude binary with --proxy http://127.0.0.1:9999. The agent operates normally while Headroom compresses all upstream traffic transparently.
Using Custom LLM Backends
You can redirect wrapped agents to alternative providers. For example, to wrap GitHub Copilot with a Groq backend:
headroom wrap copilot \
--backend anyllm \
--anyllm-provider groq \
-- --model gpt-4o
The --backend anyllm flag translates Copilot’s native calls to OpenAI-compatible /v1 endpoints. Arguments after -- pass directly to the agent (e.g., --model gpt-4o).
Enabling AST-Based Code Compression
For projects with substantial codebases, enable AST-aware compression to maximize token savings:
pip install "headroom-ai[code]"
headroom wrap claude --code-graph
The optional code_compressor transform (activated via --code-graph in compression_policy.py) parses Abstract Syntax Trees to eliminate redundant whitespace and comments while preserving semantic structure. During this process, the tag_protector.py module safeguards special placeholders so that tool-specific markers are not corrupted.
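The two ideas in this section can be illustrated with the Python standard library alone. The sketch below is not the code in code_compressor.py or tag_protector.py. It assumes Python 3.9+ (for ast.unparse), handles Python source only, and uses a made-up <name>...</name> marker syntax for the protected spans; all function names are hypothetical.

```python
import ast
import re

# Hypothetical marker syntax: <name>...</name> spans that must survive untouched.
TAG_PATTERN = re.compile(r"<(\w+)>.*?</\1>", re.DOTALL)
PLACEHOLDER = re.compile(r"\x00TAG(\d+)\x00")


def strip_comments(source: str) -> str:
    """Round-trip Python source through its AST.

    Comments and extra blank lines are not part of the tree, so they
    disappear on unparse. Docstrings are kept because they are AST nodes.
    """
    return ast.unparse(ast.parse(source))


def protect_tags(text: str):
    """Swap each tagged span for an opaque placeholder; return text and saved spans."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"\x00TAG{len(saved) - 1}\x00"

    return TAG_PATTERN.sub(stash, text), saved


def restore_tags(text: str, saved) -> str:
    """Put the original tagged spans back in place of their placeholders."""
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], text)
```

The intended order is protect, transform, restore: any lossy rewrite runs only on the text between placeholders, so the content inside a protected span comes back byte for byte.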
Sharing Memory Across Agents
Headroom supports persistent, cross-agent memory through the MemoryWrapper (headroom/memory/wrapper.py):
headroom wrap claude --memory
headroom wrap codex --memory
Both commands connect to the same hierarchical memory store, allowing context reuse between Claude Code and Codex without data duplication.
Inspecting Compression Telemetry
After a session, view detailed compression statistics:
headroom toin-publish --min-observations 1
This displays a table of ToolPattern.total_compressions per transform stage, quantifying token savings achieved by the automatic compression pipeline.
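As a rough picture of what per-stage accounting involves, the following Python sketch runs an ordered list of transforms and records how much each one removes. It makes several assumptions: character counts stand in for tokens, the two stages are toy examples rather than Headroom transforms, and run_pipeline is a hypothetical name unrelated to ToolPattern.

```python
import re
from typing import Callable, Dict, List, Tuple

Transform = Callable[[str], str]


def run_pipeline(text: str, stages: List[Tuple[str, Transform]]) -> Tuple[str, Dict[str, int]]:
    """Apply transforms in order and record characters removed by each stage."""
    saved: Dict[str, int] = {}
    for name, transform in stages:
        before = len(text)
        text = transform(text)
        saved[name] = before - len(text)
    return text, saved


# Toy stages for illustration only.
STAGES: List[Tuple[str, Transform]] = [
    ("strip_trailing_ws", lambda s: re.sub(r"[ \t]+$", "", s, flags=re.MULTILINE)),
    ("collapse_blank_lines", lambda s: re.sub(r"\n{3,}", "\n\n", s)),
]


def savings_percent(original: int, compressed: int) -> float:
    """Relative reduction as a percentage of the original size."""
    return 0.0 if original == 0 else 100.0 * (original - compressed) / original
```

Because each stage sees the output of the previous one, the per-stage numbers sum to the total reduction, which is what makes a per-transform table meaningful.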
Summary
- Single-command wrapping: Use headroom wrap <agent> to instantiate a transparent MCP proxy that intercepts all agent traffic.
- Automatic compression: The transform pipeline in headroom/transforms/compression_policy.py processes every request through AST-aware compression and tag protection.
- Zero code changes: The wrapper is language-agnostic and requires no modifications to Claude, Codex, Copilot, Aider, Cursor, or OpenClaw.
- Optional enhancements: Install headroom-ai[code] for code-graph compression or use --memory for cross-agent context sharing.
- Built-in observability: Compression metrics automatically populate via ToolPattern.total_compressions and export through headroom toin-publish.
Frequently Asked Questions
What AI coding agents are supported by Headroom wrap?
Headroom supports any agent communicating over HTTP or WebSocket, including Claude Code, OpenAI Codex, GitHub Copilot, Aider, Cursor, and OpenClaw. The headroom/cli/wrap.py module implements specific handlers for each tool (e.g., wrap_claude, wrap_codex) while maintaining a language-agnostic proxy core that intercepts requests via headroom/cli/proxy.py.
How does Headroom compression reduce token usage without breaking functionality?
The system applies AST-aware transforms via headroom/transforms/code_compressor.py to remove syntactic noise (whitespace, comments) while preserving code semantics. The tag_protector.py module safeguards tool-specific placeholders, and the compression_policy.py orchestrator ensures transforms execute in the correct order. This architecture achieves up to 30% token reduction on diffs without altering functional behavior.
Can I use Headroom wrap with custom LLM backends?
Yes. The --backend anyllm flag redirects agent traffic to any OpenAI-compatible endpoint. You can specify providers like Groq, Together, or private deployments using --anyllm-provider <name>. The proxy in headroom/cli/proxy.py handles protocol translation automatically, requiring no changes to the agent’s configuration beyond the initial wrap command.
How do I monitor compression savings when using headroom wrap?
Compression events automatically increment ToolPattern.total_compressions counters stored in the tool pattern registry. Run headroom toin-publish --min-observations 1 to view per-transform statistics, or inspect the observability UI via headroom/cli/observability.py. The wrapper also emits telemetry notices through _print_telemetry_notice when environment variables like HEADROOM_STACK are detected.