Headroom MCP Server Tools: A Complete Guide to headroomcompress, headroomretrieve, and headroomstats
Headroom provides three command-line MCP server tools—headroomcompress, headroomretrieve, and headroomstats—that interface with the local Message Compression Proxy to compress LLM payloads, retrieve original content by ID, and expose operational metrics via HTTP endpoints.
The headroom-proxy crate in the chopratejas/headroom repository ships these binaries to interact with the compression service without requiring language-specific SDKs. Defined in the [[bin]] sections of crates/headroom-proxy/Cargo.toml, these utilities act as thin HTTP clients that communicate with the MCP server running on localhost:8000 by default.
headroomcompress: Compressing LLM Message Payloads
The headroomcompress utility submits JSON payloads to the MCP server’s /v1/compress endpoint, triggering the full Headroom compression pipeline.
How the Compression Pipeline Works
When you invoke headroomcompress, the tool POSTs a JSON body containing the model name and messages array to /v1/compress. The server processes the request through the SmartCrusher → ContentRouter → CacheAligner → optional Kompress pipeline as implemented in crates/headroom-proxy/src/server.rs.
The tool returns a JSON document containing:
- messages: the compressed message list
- tokens_before and tokens_after: token counts for comparison
- compression_ratio: calculated as tokens_after / tokens_before
- compressed_id: a stable identifier for later retrieval
Command-Line Usage
# Prepare an OpenAI-style conversation payload
cat > input.json <<'EOF'
{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the difference between recursion and iteration."}
  ]
}
EOF
# Execute compression (defaults to localhost:8000)
headroomcompress -i input.json -o compressed.json
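For readers who want to see the HTTP request behind this command, the following is a minimal Python sketch of the POST described above. It only builds the request and does not send it. The helper name build_compress_request and the Content-Type header are assumptions made for illustration; the endpoint path, default address, and payload shape come from the text.

```python
import json
import urllib.request


def build_compress_request(payload, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the /v1/compress endpoint.

    Hypothetical helper: the Content-Type header is an assumption based on
    the JSON body, not something confirmed by the Headroom source.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/compress",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same payload as input.json above.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the difference between recursion and iteration."},
    ],
}

request = build_compress_request(payload)

# To send it against a running server, something like this should work:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping request construction separate from sending makes the shape of the call easy to inspect without a server running.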
Response Format
compressed.json contains the compression results:
{
  "messages": [{"role": "system", "content": "..."}],
  "tokens_before": 58,
  "tokens_after": 32,
  "compression_ratio": 0.55,
  "compressed_id": "c9f1a8d2"
}
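The reported ratio can be checked against the token counts. The sketch below recomputes compression_ratio as tokens_after / tokens_before from the sample response and derives the tokens saved. The function name summarize_compression and the extra percent_saved field are illustrative additions, not part of the tool's output.

```python
import json


def summarize_compression(result):
    """Recompute the ratio and savings from a headroomcompress-style response."""
    before = result["tokens_before"]
    after = result["tokens_after"]
    if before <= 0:
        raise ValueError("tokens_before must be positive")
    ratio = after / before
    return {
        "ratio": round(ratio, 2),
        "tokens_saved": before - after,
        "percent_saved": round((1 - ratio) * 100, 1),
    }


# Sample values copied from the response shown above.
sample = json.loads("""
{
  "tokens_before": 58,
  "tokens_after": 32,
  "compression_ratio": 0.55,
  "compressed_id": "c9f1a8d2"
}
""")

summary = summarize_compression(sample)
print(summary)
```

A lower ratio means stronger compression: here 32 / 58 rounds to 0.55, so roughly 45% of the tokens were removed.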
headroomretrieve: Recovering Original Content
The headroomretrieve tool queries the MCP server for the original (uncompressed) content associated with a previously compressed message ID.
Retrieving Compressed Content Records (CCR)
This utility performs a GET request to /v1/retrieve/<compressed_id>. The server looks up the stored CCR (Compressed Content Record) in its internal database and streams the raw source data back. This is useful for debugging what the compressor removed or re-hydrating cached content when the original source is unavailable.
# Retrieve the original payload using the ID from headroomcompress
headroomretrieve -id c9f1a8d2 -o original.json
The original.json output contains the exact payload that was submitted for compression, as it stood before the pipeline modified it.
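The lookup itself is a plain GET with the ID in the path. The following sketch builds that request in Python without sending it. The helper name build_retrieve_request is hypothetical, and percent-encoding the ID is a defensive assumption rather than documented behavior.

```python
import urllib.parse
import urllib.request


def build_retrieve_request(compressed_id, base_url="http://localhost:8000"):
    """Build (but do not send) a GET request for /v1/retrieve/<compressed_id>."""
    # Percent-encode the ID so unusual characters cannot alter the URL path.
    safe_id = urllib.parse.quote(compressed_id, safe="")
    return urllib.request.Request(
        url=f"{base_url}/v1/retrieve/{safe_id}",
        method="GET",
    )


retrieve_request = build_retrieve_request("c9f1a8d2")
print(retrieve_request.full_url)
```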
headroomstats: Monitoring Compression Metrics
The headroomstats utility retrieves operational statistics from the MCP server to monitor health and efficiency.
Server Health and Efficiency Metrics
By sending a GET request to /v1/stats, the tool returns a JSON diagnostic object tracking:
- total_requests: cumulative request count
- total_tokens_before: tokens ingested before compression
- total_tokens_after: tokens after compression
- cache_hits: number of served cached responses
- error_rate: current error percentage
headroomstats
Typical output:
{
  "total_requests": 215,
  "total_tokens_before": 12342,
  "total_tokens_after": 7891,
  "cache_hits": 48,
  "error_rate": 0.0
}
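These raw counters are easier to read as derived figures. The sketch below computes aggregate token savings and a cache hit rate from the sample output. The function name summarize_stats is hypothetical, and defining the hit rate as cache_hits / total_requests is an assumption about how the two counters relate.

```python
def summarize_stats(stats):
    """Derive aggregate savings and a cache hit rate from a /v1/stats-style object."""
    before = stats["total_tokens_before"]
    after = stats["total_tokens_after"]
    requests = stats["total_requests"]
    return {
        "tokens_saved": before - after,
        # Same definition as the per-request ratio: tokens after / tokens before.
        "overall_ratio": after / before if before else None,
        # Assumption: every cache hit corresponds to exactly one counted request.
        "cache_hit_rate": stats["cache_hits"] / requests if requests else None,
    }


# Sample values copied from the typical output above.
stats = {
    "total_requests": 215,
    "total_tokens_before": 12342,
    "total_tokens_after": 7891,
    "cache_hits": 48,
    "error_rate": 0.0,
}

report = summarize_stats(stats)
print(report)
```

With the sample numbers, this works out to 4451 tokens saved, an overall ratio of about 0.64, and a cache hit rate of about 22%.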
Implementation Details and Source Files
All three tools are implemented as standalone binaries in the crates/headroom-proxy/src/bin/ directory:
- headroomcompress.rs: implements CLI logic for POST requests to /v1/compress
- headroomretrieve.rs: implements CLI logic for GET requests to /v1/retrieve
- headroomstats.rs: implements CLI logic for GET requests to /v1/stats
The shared HTTP server routing logic resides in crates/headroom-proxy/src/server.rs, which uses Axum to handle the three endpoints. The tools respect the same authentication scheme used by the Python and TypeScript SDKs, making them interoperable with existing Headroom deployments.
Summary
- headroomcompress sends LLM message payloads to /v1/compress, executing the SmartCrusher → ContentRouter → CacheAligner pipeline (optionally followed by Kompress) and returning token savings plus a unique compressed_id
- headroomretrieve recovers original content from CCR (Compressed Content Records) via GET /v1/retrieve/<compressed_id>, enabling content inspection and re-hydration
- headroomstats exposes operational metrics including cache hits, token counts before and after compression, and error rates via GET /v1/stats
- All three binaries are built from the headroom-proxy crate and communicate over HTTP, providing scriptable access to the MCP server without SDK dependencies
Frequently Asked Questions
What is the difference between headroomcompress and the higher-level SDKs?
The headroomcompress binary is a thin HTTP wrapper around the same /v1/compress endpoint used by the Python and TypeScript SDKs. While the SDKs provide language-native interfaces and automatic retry logic, headroomcompress offers a lightweight, dependency-free alternative for shell scripts and debugging scenarios where installing a full SDK is unnecessary.
How does headroomretrieve access the original message content?
The tool queries the MCP server's internal Compressed Content Record (CCR) storage using the compressed_id returned during compression. According to crates/headroom-proxy/src/server.rs, the server maintains a mapping between compressed IDs and original payloads, allowing headroomretrieve to stream the raw data back without re-processing the compression pipeline.
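To make the ID-to-payload mapping concrete, here is a toy in-memory model in Python. It is not the server's implementation: the class name ToyCCRStore, the use of a truncated SHA-256 digest as the ID, and the dictionary storage are all assumptions chosen only to illustrate a stable identifier that maps back to the original payload.

```python
import hashlib
import json


class ToyCCRStore:
    """Illustrative stand-in for a compressed-ID to original-payload mapping."""

    def __init__(self):
        self._records = {}

    def put(self, original_payload):
        """Store the original payload and return a stable 8-character ID."""
        # Canonical JSON (sorted keys) so identical payloads get identical IDs.
        canonical = json.dumps(original_payload, sort_keys=True).encode("utf-8")
        compressed_id = hashlib.sha256(canonical).hexdigest()[:8]
        self._records[compressed_id] = original_payload
        return compressed_id

    def get(self, compressed_id):
        """Return the stored original; raises KeyError for unknown IDs."""
        return self._records[compressed_id]


store = ToyCCRStore()
original = {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}
record_id = store.put(original)
restored = store.get(record_id)
```

Because retrieval is a direct lookup, returning the original requires no decompression step, which matches the behavior described above.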
What authentication do the MCP server tools require?
The command-line utilities respect the same authentication middleware configured for the MCP server. When authentication is enabled, requests to /v1/compress, /v1/retrieve, and /v1/stats must include valid credentials in the HTTP headers, consistent with the security model used by the Python and TypeScript SDKs.
Where are these tools installed from?
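The binaries are compiled from the headroom-proxy crate defined in crates/headroom-proxy/Cargo.toml under [[bin]] sections. When you build the proxy with cargo build --release, Cargo places headroomcompress, headroomretrieve, and headroomstats in the target/release/ directory; it does not add them to your system path. To make them available on your path, copy them from target/release/ or use cargo install, which by default installs binaries into Cargo's bin directory (typically ~/.cargo/bin). Either approach works for local or containerized deployments.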