Headroom MCP Server Tools: A Complete Guide to headroomcompress, headroomretrieve, and headroomstats

Headroom provides three command-line MCP server tools—headroomcompress, headroomretrieve, and headroomstats—that interface with the local Message Compression Proxy to compress LLM payloads, retrieve original content by ID, and expose operational metrics via HTTP endpoints.

The headroom-proxy crate in the chopratejas/headroom repository ships these binaries to interact with the compression service without requiring language-specific SDKs. Defined in the [[bin]] sections of crates/headroom-proxy/Cargo.toml, these utilities act as thin HTTP clients that communicate with the MCP server running on localhost:8000 by default.

headroomcompress: Compressing LLM Message Payloads

The headroomcompress utility submits JSON payloads to the MCP server’s /v1/compress endpoint, triggering the full Headroom compression pipeline.

How the Compression Pipeline Works

When you invoke headroomcompress, the tool POSTs a JSON body containing the model name and messages array to /v1/compress. The server processes the request through the SmartCrusher → ContentRouter → CacheAligner → optional Kompress pipeline as implemented in crates/headroom-proxy/src/server.rs.
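For readers who want to script against the endpoint directly rather than through the binary, the request described above might look roughly like the following Python sketch. It only builds the request and does not send it. The JSON content type and the helper name build_compress_request are assumptions made for illustration; the host and port are the defaults stated in this article.

```python
import json
import urllib.request

# Default address stated in the article; adjust if your server differs.
BASE_URL = "http://localhost:8000"


def build_compress_request(model, messages, base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/compress.

    Hypothetical helper for illustration, not part of Headroom itself.
    """
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/compress",
        data=body,
        # Assumption: the endpoint expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_compress_request(
    "gpt-4o",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the difference between recursion and iteration."},
    ],
)
print(request.get_method(), request.full_url)

# Sending the request needs a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Building the request separately from sending it keeps the payload easy to inspect before anything reaches the server.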

The tool returns a JSON document containing:

  • messages – the compressed message list
  • tokens_before and tokens_after – token counts for comparison
  • compression_ratio – calculated as tokens_after / tokens_before
  • compressed_id – a stable identifier for later retrieval

Command-Line Usage


# Prepare an OpenAI-style conversation payload

cat > input.json <<'EOF'
{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the difference between recursion and iteration."}
  ]
}
EOF

# Execute compression (defaults to localhost:8000)

headroomcompress -i input.json -o compressed.json

Response Format

compressed.json contains the compression results:

{
  "messages": [{"role": "system", "content": "..."}],
  "tokens_before": 58,
  "tokens_after": 32,
  "compression_ratio": 0.55,
  "compressed_id": "c9f1a8d2"
}
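As a quick sanity check, the fields in this response can be recomputed by hand. The sketch below parses the sample above and derives the savings from tokens_before and tokens_after. The summarize helper is hypothetical, and the idea that the reported compression_ratio is rounded to two decimals is an assumption based only on this one sample.

```python
import json

# Sample response copied from the compressed.json example in this article.
raw = """
{
  "messages": [{"role": "system", "content": "..."}],
  "tokens_before": 58,
  "tokens_after": 32,
  "compression_ratio": 0.55,
  "compressed_id": "c9f1a8d2"
}
"""


def summarize(result):
    """Recompute the ratio and derive savings from a compress response."""
    before = result["tokens_before"]
    after = result["tokens_after"]
    # Guard against division by zero for an empty payload.
    ratio = after / before if before else 0.0
    return {
        "ratio": ratio,  # tokens_after / tokens_before, lower is better
        "tokens_saved": before - after,
        "percent_saved": (1 - ratio) * 100 if before else 0.0,
    }


result = json.loads(raw)
summary = summarize(result)

# Assumption: the server rounds the reported ratio to two decimals.
assert round(summary["ratio"], 2) == result["compression_ratio"]
print(summary["tokens_saved"], f"{summary['percent_saved']:.1f}%")
```

Because the ratio is defined as tokens_after / tokens_before, a smaller value means stronger compression; here 26 of 58 tokens were removed.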

headroomretrieve: Recovering Original Content

The headroomretrieve tool queries the MCP server for the original (uncompressed) content associated with a previously compressed message ID.

Retrieving Compressed Content Records (CCR)

This utility performs a GET request to /v1/retrieve/<compressed_id>. The server looks up the stored CCR (Compressed Content Record) in its internal database and streams the raw source data back. This is useful for debugging what the compressor removed or re-hydrating cached content when the original source is unavailable.
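If you are calling the endpoint from a script, the retrieval URL can be assembled as in this minimal sketch. The retrieve_url helper is hypothetical, and percent-encoding the ID is a defensive assumption rather than documented server behavior.

```python
from urllib.parse import quote

# Default address stated in the article.
BASE_URL = "http://localhost:8000"


def retrieve_url(compressed_id, base_url=BASE_URL):
    """Return the GET URL for a compressed_id (hypothetical helper)."""
    if not compressed_id:
        raise ValueError("compressed_id must not be empty")
    # safe="" also encodes "/" so an odd ID cannot alter the URL path.
    return f"{base_url}/v1/retrieve/{quote(compressed_id, safe='')}"


print(retrieve_url("c9f1a8d2"))
```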


# Retrieve the original payload using the ID from headroomcompress

headroomretrieve -id c9f1a8d2 -o original.json

The original.json output contains the exact payload as it was submitted, before compression was applied.

headroomstats: Monitoring Compression Metrics

The headroomstats utility retrieves operational statistics from the MCP server to monitor health and efficiency.

Server Health and Efficiency Metrics

By sending a GET request to /v1/stats, the tool returns a JSON diagnostic object tracking:

  • total_requests – cumulative request count
  • total_tokens_before – tokens ingested before compression
  • total_tokens_after – tokens after compression
  • cache_hits – number of responses served from the cache
  • error_rate – current error percentage

headroomstats

Typical output:

{
  "total_requests": 215,
  "total_tokens_before": 12342,
  "total_tokens_after": 7891,
  "cache_hits": 48,
  "error_rate": 0.0
}
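The raw counters are easier to read once a few derived figures are computed from them. The sketch below does this for the sample output above. The derived metric names, and the choice to define a cache hit rate as cache_hits / total_requests, are assumptions for illustration; the server itself reports only the raw fields.

```python
# Sample values copied from the typical output shown in this article.
stats = {
    "total_requests": 215,
    "total_tokens_before": 12342,
    "total_tokens_after": 7891,
    "cache_hits": 48,
    "error_rate": 0.0,
}


def derive_metrics(stats):
    """Derive aggregate figures from a /v1/stats response (hypothetical helper)."""
    before = stats["total_tokens_before"]
    after = stats["total_tokens_after"]
    requests = stats["total_requests"]
    return {
        "tokens_saved": before - after,
        # Same definition as the per-request ratio: after / before.
        "overall_ratio": after / before if before else 0.0,
        # Assumption: hit rate is cache hits divided by total requests.
        "cache_hit_rate": stats["cache_hits"] / requests if requests else 0.0,
    }


metrics = derive_metrics(stats)
print(
    f"saved={metrics['tokens_saved']} "
    f"ratio={metrics['overall_ratio']:.3f} "
    f"hit_rate={metrics['cache_hit_rate']:.3f}"
)
```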

Implementation Details and Source Files

All three tools are implemented as standalone binaries in the crates/headroom-proxy/src/bin/ directory.

The shared HTTP server routing logic resides in crates/headroom-proxy/src/server.rs, which uses Axum to handle the three endpoints. The tools respect the same authentication scheme used by the Python and TypeScript SDKs, making them interoperable with existing Headroom deployments.

Summary

  • headroomcompress sends LLM message payloads to /v1/compress, where the server runs the SmartCrusher → ContentRouter → CacheAligner pipeline and returns token savings plus a unique compressed_id
  • headroomretrieve recovers original content from CCR (Compressed Content Records) via GET /v1/retrieve/<compressed_id>, enabling content inspection and re-hydration
  • headroomstats exposes operational metrics including cache hits, token counts before and after compression, and error rates via GET /v1/stats
  • All three binaries are built from the headroom-proxy crate and communicate over HTTP, providing scriptable access to the MCP server without SDK dependencies

Frequently Asked Questions

What is the difference between headroomcompress and the higher-level SDKs?

The headroomcompress binary is a thin HTTP wrapper around the same /v1/compress endpoint used by the Python and TypeScript SDKs. While the SDKs provide language-native interfaces and automatic retry logic, headroomcompress offers a lightweight, dependency-free alternative for shell scripts and debugging scenarios where installing a full SDK is unnecessary.

How does headroomretrieve access the original message content?

The tool queries the MCP server's internal Compressed Content Record (CCR) storage using the compressed_id returned during compression. According to crates/headroom-proxy/src/server.rs, the server maintains a mapping between compressed IDs and original payloads, allowing headroomretrieve to stream the raw data back without re-processing the compression pipeline.
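To make the idea of an ID-to-payload mapping concrete, here is a toy in-memory model in Python. It is not the server's implementation, which lives in server.rs and is not reproduced here. The class name ToyRecordStore and the choice to derive the ID from a truncated SHA-256 hash are purely illustrative assumptions.

```python
import hashlib
import json


class ToyRecordStore:
    """Hypothetical in-memory stand-in for an ID -> original payload mapping."""

    def __init__(self):
        self._records = {}

    def put(self, payload):
        # Assumption: a stable ID comes from hashing a canonical JSON encoding,
        # so the same payload always maps to the same ID.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        record_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
        self._records[record_id] = payload
        return record_id

    def get(self, record_id):
        # Plain lookup: no compression work is repeated on retrieval.
        return self._records[record_id]


store = ToyRecordStore()
original = {"model": "gpt-4o", "messages": [{"role": "user", "content": "hello"}]}
record_id = store.put(original)
print(len(record_id), store.get(record_id) == original)
```

The point of the sketch is only that retrieval is a lookup keyed by the ID, which is why no pipeline stage needs to run again.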

What authentication do the MCP server tools require?

The command-line utilities respect the same authentication middleware configured for the MCP server. When authentication is enabled, requests to /v1/compress, /v1/retrieve, and /v1/stats must include valid credentials in the HTTP headers, consistent with the security model used by the Python and TypeScript SDKs.

Where are these tools installed from?

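The binaries are compiled from the headroom-proxy crate defined in crates/headroom-proxy/Cargo.toml under [[bin]] sections. When you build the proxy with cargo build --release, Cargo writes headroomcompress, headroomretrieve, and headroomstats to the workspace's target/release/ directory; it does not place them on your system path. To make them available by name, either copy them to a directory on your PATH or install the crate with cargo install, which places the executables in Cargo's bin directory (~/.cargo/bin by default). The same binaries can be used in local or containerized deployments.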
