How to Set Up Headroom MCP Tools: headroom_compress, headroom_retrieve, and headroom_stats

Headroom’s MCP server exposes three JSON-RPC tools that let LLMs compress large payloads, retrieve the originals by cryptographic hash, and monitor token savings through a lightweight HTTP endpoint running locally on port 8787.

The chopratejas/headroom repository provides a Model Context Protocol (MCP) server that reduces context-window pressure for LLMs. Setting up the Headroom MCP tools involves installing the package, registering the server with your assistant, and invoking the three tools (headroom_compress, headroom_retrieve, and headroom_stats) via standard HTTP POST requests.

Install the Headroom MCP Package

The MCP functionality ships as an optional extra. You can install only the tools or bundle them with the Headroom proxy sidecar.

Minimal Installation (MCP Only)

Install the core MCP server and CLI utilities:

pip install "headroom-ai[mcp]"

Full Stack Installation (with Proxy)

If you plan to run the proxy alongside the MCP server (enabling fallback retrieval from the proxy store), use:

pip install "headroom-ai[proxy]"
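To confirm which version actually landed in your environment, the standard library's importlib.metadata is enough. This small sketch assumes the distribution name is headroom-ai, as in the pip commands above; installed_version is a hypothetical helper, not part of Headroom.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "headroom-ai") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "headroom-ai is not installed")
```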

Register the Tools with Your LLM Client

Before the LLM can invoke the tools, you must register the server configuration. Headroom includes a convenience command that writes the necessary JSON configuration for Claude Code:

headroom mcp install

This command, implemented in headroom/cli.py, creates or updates ~/.claude/mcp.json, pointing the client to the local server endpoint.
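To see what the install step wrote, you can load the file and list its top-level entries. This sketch deliberately assumes nothing about the schema beyond the file holding a JSON object; the path comes from the text above, and read_mcp_config is a hypothetical helper.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".claude" / "mcp.json"


def read_mcp_config(path: Path = CONFIG_PATH) -> dict:
    """Load the MCP client config, returning {} if the file does not exist yet."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object in {path}, got {type(data).__name__}")
    return data


if __name__ == "__main__":
    config = read_mcp_config()
    # Print top-level keys only; their exact layout depends on your client.
    print(sorted(config) or "no MCP config found")
```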

Start the MCP Server

Launch the stand-alone server to expose the three endpoints. By default, the server listens on http://127.0.0.1:8787:


# Stand-alone MCP (lightweight)
headroom mcp serve

Alternatively, run the full stack with the proxy:


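# Terminal 1: start the proxy (also defaults to port 8787)
headroom proxy

# Terminal 2: start the MCP server
headroom mcp serve

As written, both processes default to port 8787, and two processes cannot listen on the same port at once, so make sure they are configured on different ports before running them together.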

The server maintains a local compression store with an approximate TTL of one hour, managed internally by the storage layer referenced in headroom/mcp_server.py.
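Before launching either process, you can check whether something is already listening on 127.0.0.1:8787. A minimal sketch using only the socket module; the host and port come from the text, and port_in_use is a hypothetical helper.

```python
import socket


def port_in_use(host: str = "127.0.0.1", port: int = 8787, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused or timed out: nothing is accepting connections there.
        return False


if __name__ == "__main__":
    state = "already in use" if port_in_use() else "free"
    print(f"127.0.0.1:8787 is {state}")
```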

Using the Three Headroom MCP Tools

All tools communicate via HTTP POST with JSON payloads. The following examples assume the server is running at http://127.0.0.1:8787.
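Because every call shares the same envelope, a small wrapper keeps client code short. This sketch uses the standard library's urllib instead of requests and assumes the {"tool": ..., "parameters": ...} request shape and base URL used throughout this article; call_tool is a hypothetical helper, not part of Headroom.

```python
import json
import urllib.request

BASE = "http://127.0.0.1:8787"


def call_tool(tool: str, parameters: dict, base: str = BASE, timeout: float = 30.0) -> dict:
    """POST a {"tool", "parameters"} envelope and decode the JSON reply."""
    body = json.dumps({"tool": tool, "parameters": parameters}).encode("utf-8")
    req = urllib.request.Request(
        base,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would look like stats = call_tool("headroom_stats", {}).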

headroom_compress

Shrink any text payload—files, logs, JSON, or search results—and receive a hash for later retrieval:

import requests

BASE = "http://127.0.0.1:8787"

payload = {
    "tool": "headroom_compress",
    "parameters": {
        "content": "Very long text … (5,000 lines of grep results)"
    }
}
http_resp = requests.post(BASE, json=payload, timeout=30)
http_resp.raise_for_status()  # fail loudly on HTTP errors before decoding JSON
resp = http_resp.json()

# resp contains:
# {
#   "compressed": "[key matches with context…]",
#   "hash": "a1b2c3d4e5f6…",
#   "original_tokens": 12000,
#   "compressed_tokens": 3200,
#   "savings_percent": 73.3,
#   "transforms": ["router:search:0.27"]
# }
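The numbers in the sample response are internally consistent if savings_percent is the token reduction relative to the original count. Assuming that definition (not confirmed against the source), the arithmetic is:

```python
def savings_percent(original_tokens: int, compressed_tokens: int) -> float:
    """Percentage of tokens removed, rounded to one decimal place."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return round((1 - compressed_tokens / original_tokens) * 100, 1)


# Values from the sample response above: 12000 -> 3200 tokens.
print(savings_percent(12000, 3200))  # 73.3
```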

Under the hood, headroom/transforms/content_router.py routes the content to the appropriate compressor—such as headroom/transforms/kompress_compressor.py for ML-based compression or headroom/transforms/search_compressor.py for structured search results. Token counts are calculated using utilities in headroom/tokenizers/, and markers are generated via headroom/utils.py (create_marker, create_tool_digest_marker).
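The actual decision rules in content_router.py are not described here, so the following is a purely illustrative heuristic for routing by content shape: if most lines look like grep -n output (path:line:text), the content goes to a search route, otherwise to a general text route. The route names and the 50% threshold are hypothetical.

```python
import re

# Matches grep -n style output such as "src/app.py:42:    return x".
GREP_LINE = re.compile(r"^[^:\s][^:]*:\d+:")


def choose_route(content: str, threshold: float = 0.5) -> str:
    """Hypothetical heuristic: 'search' if most non-blank lines look like grep hits."""
    lines = [ln for ln in content.splitlines() if ln.strip()]
    if not lines:
        return "text"
    hits = sum(1 for ln in lines if GREP_LINE.match(ln))
    return "search" if hits / len(lines) >= threshold else "text"


print(choose_route("a.py:1:import os\nb.py:7:def f():"))  # search
print(choose_route("Plain prose about compression."))     # text
```

A real router would weigh many more signals (JSON structure, log timestamps, size); the point here is only that routing is a cheap classification step that runs before any expensive compressor.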

headroom_retrieve

Fetch the original uncompressed content using the hash returned by the compress step:

retrieve_req = {
    "tool": "headroom_retrieve",
    "parameters": {
        "hash": resp["hash"],
        # optional: "query": "specific search term",  # returns only matching segments
    }
}
http_resp = requests.post(BASE, json=retrieve_req, timeout=30)
http_resp.raise_for_status()
retrieved = http_resp.json()
# → {"original_content": "… full uncompressed text …", "source": "local"}

If the proxy is also running and the local store misses, headroom_retrieve automatically falls back to the proxy store.
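The fallback order can be sketched as a two-tier lookup. Here plain dicts stand in for the local MCP store and the proxy store; the real storage classes are not shown in this article, and the "proxy" source label is an assumption (only "local" appears in the sample response).

```python
from typing import Mapping, Optional


def retrieve(
    hash_key: str,
    local: Mapping[str, str],
    proxy: Optional[Mapping[str, str]] = None,
) -> dict:
    """Look up the local store first, then the proxy store if one is available."""
    if hash_key in local:
        return {"original_content": local[hash_key], "source": "local"}
    if proxy is not None and hash_key in proxy:
        # "proxy" as a source label is an assumption for this sketch.
        return {"original_content": proxy[hash_key], "source": "proxy"}
    raise KeyError(f"hash {hash_key!r} not found in any store")
```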

headroom_stats

Query session-wide metrics without parameters:

stats_req = {"tool": "headroom_stats", "parameters": {}}
stats = requests.post(BASE, json=stats_req).json()

The response includes:

  • compressions: Total number of compressions performed
  • retrievals: Number of successful retrievals
  • tokens_saved: Cumulative tokens reduced
  • savings_percent: Overall compression efficiency
  • estimated_cost_saved_usd: Approximate API cost avoided
  • recent_events: Timeline of compression activity
  • proxy: Metrics from the proxy (if connected)
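A small formatter can turn the stats response into a one-line summary. It reads only the field names listed above, treats missing fields as zero and the proxy block as optional; summarize_stats is a hypothetical helper, and the sample values below are made up for illustration.

```python
def summarize_stats(stats: dict) -> str:
    """Build a one-line summary from a headroom_stats-style response."""
    parts = [
        f"{stats.get('compressions', 0)} compressions",
        f"{stats.get('retrievals', 0)} retrievals",
        f"{stats.get('tokens_saved', 0):,} tokens saved",
        f"{stats.get('savings_percent', 0.0):.1f}% overall",
        f"~${stats.get('estimated_cost_saved_usd', 0.0):.2f} saved",
    ]
    if stats.get("proxy"):
        parts.append("proxy connected")
    return ", ".join(parts)


# Made-up values, purely to show the output shape.
print(summarize_stats({"compressions": 3, "tokens_saved": 26400, "savings_percent": 71.2}))
```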

Key Implementation Files

Understanding the source architecture helps debug custom deployments:

  • wiki/mcp.md: Official user-facing documentation for the MCP architecture and CLI commands.
  • headroom/cli.py: Implements the headroom mcp install, serve, status, and uninstall commands.
  • headroom/mcp_server.py: HTTP server exposing the three JSON-RPC endpoints and managing the local compression store.
  • headroom/utils.py: Core utilities for marker generation (create_marker, create_tool_digest_marker) used by the compression pipeline.
  • headroom/transforms/content_router.py: Decision logic routing content to the appropriate compressor (search, log, text, etc.).
  • headroom/transforms/kompress_compressor.py: ML-based compressor for general text payloads.
  • headroom/transforms/search_compressor.py: Specialized compressor for large grep or search results.
  • headroom/tokenizers/*: Token-counting helpers (e.g., tiktoken_counter.py, mistral.py) that calculate original_tokens and compressed_tokens.

Summary

  • Install the tools via pip install "headroom-ai[mcp]" or include [proxy] for the full stack.
  • Register the server with headroom mcp install to configure Claude Code or compatible assistants.
  • Launch the server using headroom mcp serve to expose endpoints at http://127.0.0.1:8787.
  • Compress payloads with headroom_compress to receive a hash, token counts, and routing metadata.
  • Retrieve data with headroom_retrieve using the stored hash, with optional query filtering.
  • Monitor usage and cost savings through headroom_stats, which aggregates metrics from the local store and, when connected, the proxy.

Frequently Asked Questions

Do I need to run the Headroom proxy to use the MCP tools?

No. The three MCP tools operate independently of the proxy. However, if the proxy is running when you call headroom_retrieve, the tool will automatically fall back to the proxy store if the hash is not found in the local MCP cache.

How long does compressed data persist?

The local compression store maintains entries for approximately one hour (TTL ≈ 1 hour) before automatic expiration. This prevents unbounded memory growth while allowing sufficient time for multi-turn LLM conversations to retrieve context.
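The expiry behavior can be illustrated with a minimal in-memory TTL cache. This is not Headroom's storage layer: the TTLStore class, the lazy expire-on-read strategy, and the injectable clock are choices made for this sketch, with the default TTL set to the one hour described above.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLStore:
    """Hypothetical in-memory store whose entries expire after ttl seconds."""

    def __init__(self, ttl: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def put(self, key: str, value: str) -> None:
        self._data[key] = (self._clock() + self.ttl, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop it lazily on read.
            del self._data[key]
            return None
        return value

    def purge(self) -> int:
        """Remove every expired entry and return how many were removed."""
        now = self._clock()
        expired = [k for k, (exp, _) in self._data.items() if now >= exp]
        for k in expired:
            del self._data[k]
        return len(expired)
```

Lazy expiry keeps reads cheap, but keys that are never read again only disappear when purge runs, so a periodic purge is what actually bounds memory.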

What determines which compression algorithm runs?

The headroom_compress endpoint uses headroom/transforms/content_router.py to analyze incoming content and select a suitable compressor. For example, large grep results route to headroom/transforms/search_compressor.py, while general text routes to headroom/transforms/kompress_compressor.py. The transforms field in the response (e.g., ["router:search:0.27"]) indicates which path was taken.
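The transforms entry appears to follow a colon-separated stage:route:value pattern. Reading the trailing number as the compressed-to-original token ratio is an assumption, though it matches the sample response (3200 / 12000 ≈ 0.27). A hypothetical parser:

```python
from typing import NamedTuple


class Transform(NamedTuple):
    stage: str
    route: str
    ratio: float


def parse_transform(tag: str) -> Transform:
    """Split a tag like 'router:search:0.27' into its three parts."""
    stage, route, value = tag.split(":", 2)
    return Transform(stage, route, float(value))


t = parse_transform("router:search:0.27")
print(t.route, t.ratio, round(3200 / 12000, 2))  # search 0.27 0.27
```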

Can I retrieve a subset of the compressed content?

Yes. When calling headroom_retrieve, include an optional query parameter in the JSON payload. The server will return only segments of the original content matching your query, reducing bandwidth without requiring you to fetch and filter the entire payload locally.
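One plausible way to implement query filtering is to return only the lines that contain the query, optionally with neighboring lines for context. The server's actual matching rules are not documented here, so the case-insensitive substring match, the context parameter, and the filter_lines name are all assumptions of this sketch.

```python
from typing import List


def filter_lines(text: str, query: str, context: int = 0) -> List[str]:
    """Return lines containing query (case-insensitive) plus `context` neighbors on each side."""
    lines = text.splitlines()
    needle = query.lower()
    keep = set()
    for i, line in enumerate(lines):
        if needle in line.lower():
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    # Preserve original order and avoid duplicates when context windows overlap.
    return [lines[i] for i in sorted(keep)]
```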
