How to Set Up Headroom MCP Tools: headroom_compress, headroom_retrieve, and headroom_stats
Headroom’s MCP server exposes three JSON-RPC tools that let LLMs compress large payloads, retrieve them by cryptographic hash, and monitor token savings through a lightweight HTTP endpoint running locally on port 8787.
The chopratejas/headroom repository provides a Model Context Protocol (MCP) server that reduces context window pressure for LLMs. Setting up the Headroom MCP tools involves installing the package, registering the server with your assistant, and invoking the three tools (headroom_compress, headroom_retrieve, and headroom_stats) via standard HTTP POST requests.
Install the Headroom MCP Package
The MCP functionality ships as an optional extra. You can install only the tools or bundle them with the Headroom proxy sidecar.
Minimal Installation (MCP Only)
Install the core MCP server and CLI utilities:
pip install "headroom-ai[mcp]"
Full Stack Installation (with Proxy)
If you plan to run the proxy alongside the MCP server (enabling fallback retrieval from the proxy store), use:
pip install "headroom-ai[proxy]"
Register the Tools with Your LLM Client
Before the LLM can invoke the tools, you must register the server configuration. Headroom includes a convenience command that writes the necessary JSON configuration for Claude Code:
headroom mcp install
This command, implemented in headroom/cli.py, creates or updates ~/.claude/mcp.json, pointing the client to the local server endpoint.
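To confirm that registration worked, you can inspect the file the command writes. The snippet below is a minimal sketch: the path comes from the paragraph above, while the helper names (load_mcp_config, mentions_headroom) are hypothetical. Because the exact schema of the file is not documented here, the check only looks for the word "headroom" anywhere in the parsed JSON.

```python
import json
from pathlib import Path
from typing import Optional

# Path taken from the text above; adjust it if your client stores its config elsewhere.
DEFAULT_CONFIG = Path.home() / ".claude" / "mcp.json"


def load_mcp_config(path: Path = DEFAULT_CONFIG) -> Optional[dict]:
    """Return the parsed config, or None if the file is missing or not valid JSON."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def mentions_headroom(config: Optional[dict]) -> bool:
    """Crude check (schema is an unknown here): does 'headroom' appear anywhere?"""
    return config is not None and "headroom" in json.dumps(config).lower()
```

Calling mentions_headroom(load_mcp_config()) after running headroom mcp install gives a quick yes/no answer without depending on the file's internal layout.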
Start the MCP Server
Launch the stand-alone server to expose the three endpoints. By default, the server listens on http://127.0.0.1:8787:
# Stand-alone MCP (lightweight)
headroom mcp serve
Alternatively, run the full stack with the proxy:
# Terminal 1: Start the proxy (also on 8787 by default)
headroom proxy &
# Terminal 2: Start the MCP server
headroom mcp serve
The server maintains a local compression store with an approximate TTL of one hour, managed internally by the storage layer referenced in headroom/mcp_server.py.
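The idea of a store with a one-hour TTL can be illustrated with a small in-memory class. This is a conceptual sketch only, not Headroom's storage layer: the class name TTLStore, its methods, and the lazy-eviction strategy are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLStore:
    """Toy in-memory store whose entries expire ttl_seconds after being written."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._items: Dict[str, Tuple[float, str]] = {}

    def put(self, key: str, value: str) -> None:
        """Store value under key and record when it should expire."""
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[str]:
        """Return the value, or None if the key is unknown or has expired."""
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # lazily evict expired entries on read
            return None
        return value
```

The practical consequence for callers is the same whatever the real implementation looks like: a hash returned by headroom_compress should be treated as short-lived, and a retrieval more than about an hour later may miss.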
Using the Three Headroom MCP Tools
All tools communicate via HTTP POST with JSON payloads. The following examples assume the server is running at http://127.0.0.1:8787.
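Since every call shares the same envelope (a tool name plus a parameters object), a small wrapper can cut down on repetition. The sketch below uses only the standard library; build_tool_request and call_tool are hypothetical helper names, and the envelope shape is inferred from the examples in this article rather than from a published schema.

```python
import json
import urllib.request
from typing import Any, Dict

BASE = "http://127.0.0.1:8787"  # default address given in the text above


def build_tool_request(tool: str, **parameters: Any) -> Dict[str, Any]:
    """Build the {"tool": ..., "parameters": {...}} envelope used in this article."""
    return {"tool": tool, "parameters": parameters}


def call_tool(tool: str, base: str = BASE, timeout: float = 30.0,
              **parameters: Any) -> Dict[str, Any]:
    """POST the envelope as JSON and decode the JSON reply."""
    body = json.dumps(build_tool_request(tool, **parameters)).encode("utf-8")
    request = urllib.request.Request(
        base,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, call_tool("headroom_stats") or call_tool("headroom_compress", content="...") would send the same payloads as the requests-based examples in the sections that follow.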
headroom_compress
Shrink any text payload—files, logs, JSON, or search results—and receive a hash for later retrieval:
import requests
BASE = "http://127.0.0.1:8787"  # default address of the local MCP server
# Envelope: the tool name plus a parameters object
payload = {
    "tool": "headroom_compress",
    "parameters": {
        "content": "Very long text … (5,000 lines of grep results)"
    },
}
resp = requests.post(BASE, json=payload).json()
# resp contains:
# {
#     "compressed": "[key matches with context…]",
#     "hash": "a1b2c3d4e5f6…",
#     "original_tokens": 12000,
#     "compressed_tokens": 3200,
#     "savings_percent": 73.3,
#     "transforms": ["router:search:0.27"]
# }
Under the hood, headroom/transforms/content_router.py routes the content to the appropriate compressor—such as headroom/transforms/kompress_compressor.py for ML-based compression or headroom/transforms/search_compressor.py for structured search results. Token counts are calculated using utilities in headroom/tokenizers/, and markers are generated via headroom/utils.py (create_marker, create_tool_digest_marker).
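The savings_percent value in the sample response is consistent with the usual relative-reduction formula applied to the two token counts. The function below is a sketch of that arithmetic, assuming the server rounds to one decimal place; it is not taken from the Headroom source.

```python
def savings_percent(original_tokens: int, compressed_tokens: int) -> float:
    """Percentage of tokens removed, rounded to one decimal place."""
    if original_tokens <= 0:
        return 0.0  # avoid dividing by zero on empty input
    saved = original_tokens - compressed_tokens
    return round(100.0 * saved / original_tokens, 1)


# Using the numbers from the sample response: (12000 - 3200) / 12000 = 0.7333...
print(savings_percent(12000, 3200))  # 73.3
```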
headroom_retrieve
Fetch the original uncompressed content using the hash returned by the compress step:
retrieve_req = {
    "tool": "headroom_retrieve",
    "parameters": {
        "hash": resp["hash"],
        # optional: "query": "specific search term"  # returns only matches
    },
}
retrieved = requests.post(BASE, json=retrieve_req).json()
# → {"original_content": "... full uncompressed text …", "source": "local"}
If the proxy is also running and the local store misses, headroom_retrieve automatically falls back to the proxy store.
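The fallback order described above (local store first, proxy store second) can be sketched with two plain mappings standing in for the real stores. The function name retrieve_with_fallback is hypothetical, and the "proxy" value for the source field is an assumption: the sample response only shows "local".

```python
from typing import Dict, Mapping, Optional


def retrieve_with_fallback(
    hash_key: str,
    local: Mapping[str, str],
    proxy: Optional[Mapping[str, str]] = None,
) -> Optional[Dict[str, str]]:
    """Look in the local store first, then in the proxy store if one is available."""
    if hash_key in local:
        return {"original_content": local[hash_key], "source": "local"}
    if proxy is not None and hash_key in proxy:
        # "proxy" as a source label is assumed, not confirmed by the article
        return {"original_content": proxy[hash_key], "source": "proxy"}
    return None  # miss in every available store
```

Passing proxy=None models the stand-alone setup, where a local miss is simply a miss.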
headroom_stats
Query session-wide metrics without parameters:
stats_req = {"tool": "headroom_stats", "parameters": {}}
stats = requests.post(BASE, json=stats_req).json()
The response includes:
- compressions: Total number of compressions performed
- retrievals: Number of successful retrievals
- tokens_saved: Cumulative tokens reduced
- savings_percent: Overall compression efficiency
- estimated_cost_saved_usd: Approximate API cost avoided
- recent_events: Timeline of compression activity
- proxy: Metrics from the proxy (if connected)
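For logging or dashboards, the numeric fields listed above can be condensed into a single line. This is a minimal sketch: summarize_stats is a hypothetical helper, the field names come from the list above, and missing fields default to zero because the exact response shape is not guaranteed here.

```python
from typing import Any, Mapping


def summarize_stats(stats: Mapping[str, Any]) -> str:
    """Format the headline numbers of a headroom_stats response as one line."""
    compressions = stats.get("compressions", 0)
    retrievals = stats.get("retrievals", 0)
    tokens_saved = stats.get("tokens_saved", 0)
    savings = stats.get("savings_percent", 0.0)
    cost = stats.get("estimated_cost_saved_usd", 0.0)
    return (
        f"compressions={compressions} retrievals={retrievals} "
        f"tokens_saved={tokens_saved:,} savings={savings:.1f}% "
        f"cost_saved=${cost:.2f}"
    )
```

Given the stats dictionary from the request above, print(summarize_stats(stats)) would emit one compact line per poll.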
Key Implementation Files
Understanding the source architecture helps debug custom deployments:
| File | Role |
|---|---|
| wiki/mcp.md | Official user-facing documentation for the MCP architecture and CLI commands. |
| headroom/cli.py | Implements headroom mcp install, serve, status, and uninstall commands. |
| headroom/mcp_server.py | HTTP server exposing the three JSON-RPC endpoints and managing the local compression store. |
| headroom/utils.py | Core utilities for marker generation (create_marker, create_tool_digest_marker) used by the compression pipeline. |
| headroom/transforms/content_router.py | Decision logic routing content to the appropriate compressor (search, log, text, etc.). |
| headroom/transforms/kompress_compressor.py | ML-based compressor for general text payloads. |
| headroom/transforms/search_compressor.py | Specialized compressor for large grep or search results. |
| headroom/tokenizers/* | Token-counting helpers (e.g., tiktoken_counter.py, mistral.py) calculating original_tokens and compressed_tokens. |
Summary
- Install the tools via pip install "headroom-ai[mcp]" or include [proxy] for the full stack.
- Register the server with headroom mcp install to configure Claude Code or compatible assistants.
- Launch the server using headroom mcp serve to expose endpoints at http://127.0.0.1:8787.
- Compress payloads with headroom_compress to receive a hash, token counts, and routing metadata.
- Retrieve data with headroom_retrieve using the stored hash, with optional query filtering.
- Monitor usage and cost savings through headroom_stats, which aggregates metrics from both the local store and proxy.
Frequently Asked Questions
Do I need to run the Headroom proxy to use the MCP tools?
No. The three MCP tools operate independently of the proxy. If the proxy happens to be running, headroom_retrieve automatically falls back to the proxy store whenever a hash is not found in the local MCP cache.
How long does compressed data persist?
The local compression store keeps entries for approximately one hour before they expire automatically. This prevents unbounded memory growth while leaving enough time for multi-turn LLM conversations to retrieve context.
What determines which compression algorithm runs?
The headroom_compress endpoint uses headroom/transforms/content_router.py to analyze incoming content and select the optimal pipeline. For example, large grep results route to headroom/transforms/search_compressor.py, while general text routes to headroom/transforms/kompress_compressor.py. The transforms field in the response (e.g., ["router:search:0.27"]) indicates which path was taken.
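If you want to log which path was taken, each entry in the transforms field can be split into its parts. The sketch below assumes a colon-separated layout of stage, route, and an optional numeric value, inferred only from the single example "router:search:0.27"; the article does not define the format or the meaning of the number, so treat parse_transform and its field names as hypothetical.

```python
from typing import NamedTuple, Optional


class TransformTag(NamedTuple):
    stage: str              # e.g. "router"
    route: str              # e.g. "search"
    value: Optional[float]  # trailing number, meaning unspecified in the article


def parse_transform(tag: str) -> TransformTag:
    """Split a tag such as 'router:search:0.27' into its colon-separated parts."""
    parts = tag.split(":")
    stage = parts[0]
    route = parts[1] if len(parts) > 1 else ""
    try:
        value = float(parts[2]) if len(parts) > 2 else None
    except ValueError:
        value = None  # third field present but not numeric
    return TransformTag(stage, route, value)
```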
Can I retrieve a subset of the compressed content?
Yes. When calling headroom_retrieve, include an optional query parameter in the JSON payload. The server will return only segments of the original content matching your query, reducing bandwidth without requiring you to fetch and filter the entire payload locally.
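The effect of query filtering can be pictured as a line-level match over the stored content. The function below is an illustrative sketch only: the article does not say how the server segments or matches content, so the case-insensitive substring match per line (and the name filter_by_query) is an assumption.

```python
def filter_by_query(content: str, query: str) -> str:
    """Keep only the lines of content that contain query, ignoring case."""
    needle = query.casefold()
    matching = [line for line in content.splitlines() if needle in line.casefold()]
    return "\n".join(matching)
```

Doing this on the server side means only the matching lines cross the wire, which is the bandwidth saving the answer above refers to.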