Headroom MCP Server Tools Explained: compress, retrieve, and stats

June 9, 2026 · chopratejas/headroom

The Headroom MCP server tools (headroomcompress, headroomretrieve, and headroomstats) are Rust command-line utilities that call the Headroom proxy server's HTTP endpoints to compress LLM message payloads, retrieve original content by ID, and report operational metrics.

The chopratejas/headroom repository distributes these tools as part of the headroom-proxy crate, providing scriptable access to the compression pipeline without requiring language-specific SDKs. Defined as binary targets in crates/headroom-proxy/Cargo.toml, these utilities interact with the server's Axum-based HTTP interface to perform token reduction and cache management operations.

Overview of the Three MCP Server Tools

Each utility serves a distinct purpose in the Headroom ecosystem:

headroomcompress – Sends JSON payloads to the /v1/compress endpoint to execute the full compression pipeline (SmartCrusher → ContentRouter → CacheAligner → optional Kompress).
headroomretrieve – Queries stored Compressed Content Records (CCRs) via /v1/retrieve/<compressed_id> to recover original message arrays.
headroomstats – Fetches diagnostic metrics from /v1/stats, including token savings, cache-hit ratios, and error rates.

All three default to localhost:8000 and mirror the HTTP API used by the TypeScript and Python SDKs.
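
The tool-to-endpoint mapping above can be sketched as a small lookup table. This is an illustration only, not code from the repository: the helper name endpoint_for is hypothetical, the http:// scheme is an assumption (the article only gives localhost:8000), and the HTTP methods are the ones described in the per-tool sections of this article.

```python
from urllib.parse import quote

# Default host and port come from the article; the http:// scheme is an assumption.
DEFAULT_BASE_URL = "http://localhost:8000"

# Tool name -> (HTTP method, path template), as described in the article.
ENDPOINTS = {
    "headroomcompress": ("POST", "/v1/compress"),
    "headroomretrieve": ("GET", "/v1/retrieve/{compressed_id}"),
    "headroomstats": ("GET", "/v1/stats"),
}


def endpoint_for(tool: str, base_url: str = DEFAULT_BASE_URL,
                 compressed_id: str = "") -> tuple[str, str]:
    """Return (method, full URL) for a tool. Hypothetical helper, not part of Headroom."""
    method, template = ENDPOINTS[tool]
    # Percent-encode the ID so it stays a single path segment.
    path = template.format(compressed_id=quote(compressed_id, safe=""))
    return method, base_url.rstrip("/") + path


if __name__ == "__main__":
    for name in ENDPOINTS:
        print(name, endpoint_for(name, compressed_id="c9f1a8d2"))
```

Keeping the mapping in one table makes it easy to see that the three binaries differ only in method and path.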

How headroomcompress Functions

Located in crates/headroom-proxy/src/bin/headroomcompress.rs, this binary POSTs a JSON body containing model and messages fields to /v1/compress. The server processes the request through the Headroom pipeline and returns a JSON document with:

messages – The compressed message array.
tokens_before and tokens_after – Integer counts for comparison.
compression_ratio – Floating-point value calculated as tokens_after / tokens_before, so lower values mean stronger compression.
compressed_id – A stable string identifier for later retrieval.
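
For readers who want to see the request shape outside the binary, here is a minimal Python sketch of the same call, assuming the server accepts a plain JSON POST with a Content-Type: application/json header (the article does not state the required headers). The function names build_compress_request and missing_fields are hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request

# Response fields listed in the article.
RESPONSE_FIELDS = ("messages", "tokens_before", "tokens_after",
                   "compression_ratio", "compressed_id")


def build_compress_request(model: str, messages: list[dict],
                           base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/compress. Illustrative helper only."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/compress",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def missing_fields(response: dict) -> list[str]:
    """Return the documented response fields that are absent from a decoded reply."""
    return [name for name in RESPONSE_FIELDS if name not in response]


if __name__ == "__main__":
    req = build_compress_request("gpt-4o", [{"role": "user", "content": "hi"}])
    print(req.get_method(), req.full_url)
    # To actually send it against a running server:
    #   with urllib.request.urlopen(req) as resp:
    #       reply = json.load(resp)
    #       print(missing_fields(reply))
```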

How headroomretrieve Functions

Implemented in crates/headroom-proxy/src/bin/headroomretrieve.rs, this tool performs HTTP GET requests to /v1/retrieve/<compressed_id>. The server, defined in crates/headroom-proxy/src/server.rs, looks up the CCR and streams the original uncompressed content back to the client. This is essential for debugging what the compressor removed or re-hydrating cached content when the original source is unavailable.
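
To make the debugging use case concrete, the sketch below compares a retrieved original message array with its compressed counterpart and reports how many characters each message lost. It assumes both arrays keep the same order and that every content field is a plain string, neither of which the article guarantees. The function summarize_removed is a hypothetical helper, not part of Headroom.

```python
from itertools import zip_longest


def summarize_removed(original: list[dict], compressed: list[dict]) -> list[dict]:
    """Per-message character counts before and after compression.

    Messages are paired by position; a message missing on one side counts as length 0.
    """
    rows = []
    for index, (before, after) in enumerate(zip_longest(original, compressed)):
        before_len = len(before["content"]) if before else 0
        after_len = len(after["content"]) if after else 0
        rows.append({
            "index": index,
            "role": (before or after)["role"],
            "chars_before": before_len,
            "chars_after": after_len,
            "chars_removed": before_len - after_len,
        })
    return rows


if __name__ == "__main__":
    original = [{"role": "system", "content": "You are a helpful assistant."}]
    compressed = [{"role": "system", "content": "Helpful assistant."}]
    for row in summarize_removed(original, compressed):
        print(row)
```

Character counts are only a rough proxy for token counts, but they are enough to spot which messages the compressor shortened most.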

How headroomstats Functions

The headroomstats binary (source in crates/headroom-proxy/src/bin/headroomstats.rs) retrieves operational statistics via GET /v1/stats. The response includes aggregate metrics such as total_requests, total_tokens_before, total_tokens_after, cache_hits, and error_rate, enabling production monitoring and benchmarking of the compression service.

Practical Usage Examples

Compressing a Chat Payload

Create an input file and run the compressor:

cat > input.json <<'EOF'
{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user",   "content": "Explain the difference between recursion and iteration."}
  ]
}
EOF

headroomcompress -i input.json -o compressed.json

The output compressed.json contains:

{
  "messages": [ {"role": "system", "content": "..."}, {"role": "user", "content": "..."} ],
  "tokens_before": 58,
  "tokens_after": 32,
  "compression_ratio": 0.55,
  "compressed_id": "c9f1a8d2"
}
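
As a quick sanity check on this sample, the reported ratio can be recomputed from the two token counts. The sketch below assumes the server rounds compression_ratio to two decimals, which is inferred from the sample rather than documented. The function check_ratio and its tolerance are hypothetical.

```python
def check_ratio(response: dict, tolerance: float = 0.005) -> float:
    """Recompute tokens_after / tokens_before and compare it with the reported ratio."""
    before = response["tokens_before"]
    if before <= 0:
        raise ValueError("tokens_before must be positive")
    expected = response["tokens_after"] / before
    if abs(expected - response["compression_ratio"]) > tolerance:
        raise ValueError(
            f"reported ratio {response['compression_ratio']} does not match {expected:.4f}"
        )
    return expected


if __name__ == "__main__":
    sample = {"tokens_before": 58, "tokens_after": 32, "compression_ratio": 0.55}
    ratio = check_ratio(sample)
    # prints: ratio=0.5517, reduction=44.8%
    print(f"ratio={ratio:.4f}, reduction={1 - ratio:.1%}")
```

In other words, a compression_ratio of 0.55 means the payload shrank by roughly 45 percent, from 58 tokens to 32.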

Retrieving Original Content

Use the ID returned by the compression step:

headroomretrieve -id c9f1a8d2 -o original.json

The file original.json now contains the original payload exactly as it was before compression.

Checking Server Statistics

Execute the stats command to view aggregate metrics:

headroomstats

Typical output:

{
  "total_requests": 215,
  "total_tokens_before": 12342,
  "total_tokens_after": 7891,
  "cache_hits": 48,
  "error_rate": 0.0
}
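
The raw counters become easier to read once they are turned into derived figures. The sketch below computes tokens saved, the overall savings fraction, and a cache-hit ratio from a stats response. It assumes cache_hits is counted per request, so that dividing by total_requests is meaningful; the article does not confirm this. The function summarize_stats is a hypothetical helper.

```python
def summarize_stats(stats: dict) -> dict:
    """Derive headline figures from a /v1/stats response (illustrative only)."""
    before = stats["total_tokens_before"]
    after = stats["total_tokens_after"]
    requests = stats["total_requests"]
    saved = before - after
    return {
        "tokens_saved": saved,
        # Fraction of input tokens removed across all requests.
        "savings_fraction": saved / before if before else 0.0,
        # Assumes one possible cache hit per request.
        "cache_hit_ratio": stats["cache_hits"] / requests if requests else 0.0,
    }


if __name__ == "__main__":
    sample = {
        "total_requests": 215,
        "total_tokens_before": 12342,
        "total_tokens_after": 7891,
        "cache_hits": 48,
        "error_rate": 0.0,
    }
    print(summarize_stats(sample))
```

For the sample output above, this gives 4451 tokens saved, which is about 36 percent of the input tokens.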

Source Code Architecture

The MCP server tools are thin Rust wrappers around the HTTP API implemented in crates/headroom-proxy/src/server.rs. The Cargo.toml defines three [[bin]] entries that compile to standalone executables, allowing direct interaction with the Axum-based server without SDK dependencies.

Summary

headroomcompress interfaces with /v1/compress to reduce token counts through the SmartCrusher → ContentRouter → CacheAligner → optional Kompress pipeline.
headroomretrieve recovers original content from CCRs using /v1/retrieve/<compressed_id>.
headroomstats exposes operational health via /v1/stats for monitoring request volumes and efficiency gains.
All tools are defined in crates/headroom-proxy/Cargo.toml and communicate over HTTP to localhost:8000 by default.

Frequently Asked Questions

What is the difference between the MCP server tools and the Headroom SDKs?

The MCP server tools are lightweight, language-agnostic command-line binaries that directly invoke the HTTP endpoints defined in crates/headroom-proxy/src/server.rs. The TypeScript and Python SDKs provide higher-level abstractions and language-specific integrations, but ultimately call the same /v1/compress, /v1/retrieve, and /v1/stats endpoints.

How does headroomcompress calculate compression ratios?

The server calculates compression_ratio by dividing tokens_after by tokens_before once the input has passed through the full pipeline (SmartCrusher → ContentRouter → CacheAligner → optional Kompress). This floating-point value is returned in the JSON response alongside the integer token counts.

Can I use these tools with a remote Headroom server?

Yes. While the tools default to localhost:8000, they can be configured to target any reachable Headroom proxy instance by specifying the server URL and authentication credentials using the same scheme supported by the Python and TypeScript SDKs.

What data structure does headroomretrieve return?

The tool returns the original message array or raw source file that was previously compressed, streaming it from the server's stored Compressed Content Record (CCR). This structure contains the complete pre-compression payload associated with the compressed_id.
