Headroom MCP Server Tools Explained: compress, retrieve, and stats
The Headroom MCP server tools—headroomcompress, headroomretrieve, and headroomstats—are Rust-based command-line utilities that interface with the Headroom proxy server via HTTP endpoints to compress LLM message payloads, retrieve original content by ID, and expose operational metrics.
The chopratejas/headroom repository distributes these tools as part of the headroom-proxy crate, providing scriptable access to the compression pipeline without requiring language-specific SDKs. Defined as binary targets in crates/headroom-proxy/Cargo.toml, these utilities interact with the server's Axum-based HTTP interface to perform token reduction and cache management operations.
Overview of the Three MCP Server Tools
Each utility serves a distinct purpose in the Headroom ecosystem:
- headroomcompress – Sends JSON payloads to the /v1/compress endpoint to execute the full compression pipeline (SmartCrusher → ContentRouter → CacheAligner → optional Kompress).
- headroomretrieve – Queries stored Compressed Content Records (CCRs) via /v1/retrieve/<compressed_id> to recover original message arrays.
- headroomstats – Fetches diagnostic metrics from /v1/stats, including token savings, cache-hit ratios, and error rates.
All three default to localhost:8000 and mirror the HTTP API used by the TypeScript and Python SDKs.
How headroomcompress Functions
Located in crates/headroom-proxy/src/bin/headroomcompress.rs, this binary POSTs a JSON body containing model and messages fields to /v1/compress. The server processes the request through the Headroom pipeline and returns a JSON document with:
- messages – The compressed message array.
- tokens_before and tokens_after – Integer counts for comparison.
- compression_ratio – Floating-point value calculated as tokens_after / tokens_before.
- compressed_id – A stable string identifier for later retrieval.
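The request shape described above can be sketched in Python using only the standard library. This is an illustration of the HTTP call, not the Rust binary's actual implementation: the helper name build_compress_request is hypothetical, and the base URL simply reuses the localhost:8000 default mentioned in this article.

```python
import json
import urllib.request

# Default host and port as stated in this article; adjust for your deployment.
DEFAULT_BASE_URL = "http://localhost:8000"


def build_compress_request(model, messages, base_url=DEFAULT_BASE_URL):
    """Build (but do not send) a POST request for /v1/compress.

    Hypothetical helper for illustration only.
    """
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/compress",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_compress_request(
    "gpt-4o",
    [{"role": "user", "content": "Explain recursion."}],
)

# To send it against a running proxy:
#   with urllib.request.urlopen(request) as resp:
#       result = json.load(resp)
```

Separating request construction from sending keeps the payload easy to inspect before it reaches the server.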
How headroomretrieve Functions
Implemented in crates/headroom-proxy/src/bin/headroomretrieve.rs, this tool performs HTTP GET requests to /v1/retrieve/<compressed_id>. The server, defined in crates/headroom-proxy/src/server.rs, looks up the CCR and streams the original uncompressed content back to the client. This is essential for debugging what the compressor removed or re-hydrating cached content when the original source is unavailable.
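As a rough sketch of the lookup URL, the snippet below builds the GET target for a given ID. The helper name build_retrieve_url is hypothetical, and percent-encoding the ID is a defensive assumption rather than documented server behavior.

```python
from urllib.parse import quote


def build_retrieve_url(compressed_id, base_url="http://localhost:8000"):
    """Return the GET URL for a stored CCR (hypothetical helper)."""
    if not compressed_id:
        raise ValueError("compressed_id must be a non-empty string")
    # Percent-encode the ID so unexpected characters cannot alter the path.
    encoded_id = quote(compressed_id, safe="")
    return f"{base_url.rstrip('/')}/v1/retrieve/{encoded_id}"


url = build_retrieve_url("c9f1a8d2")
# The URL can then be fetched with urllib.request.urlopen(url)
# against a running proxy.
```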
How headroomstats Functions
The headroomstats binary (source in crates/headroom-proxy/src/bin/headroomstats.rs) retrieves operational statistics via GET /v1/stats. The response includes aggregate metrics such as total_requests, total_tokens_before, total_tokens_after, cache_hits, and error_rate, enabling production monitoring and benchmarking of the compression service.
Practical Usage Examples
Compressing a Chat Payload
Create an input file and run the compressor:
cat > input.json <<'EOF'
{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the difference between recursion and iteration."}
  ]
}
EOF
headroomcompress -i input.json -o compressed.json
The output compressed.json contains:
{
  "messages": [ {"role": "system", "content": "..."}, {"role": "user", "content": "..."} ],
  "tokens_before": 58,
  "tokens_after": 32,
  "compression_ratio": 0.55,
  "compressed_id": "c9f1a8d2"
}
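To sanity-check the sample figures, the ratio can be recomputed from the two token counts using the tokens_after / tokens_before formula described earlier. The function below is an illustrative helper, not part of Headroom; note that a lower ratio means stronger compression.

```python
def compression_ratio(tokens_before, tokens_after):
    """Recompute the ratio as tokens_after / tokens_before."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    return tokens_after / tokens_before


ratio = compression_ratio(58, 32)
saved = 1 - ratio  # fraction of tokens removed
print(f"ratio={ratio:.2f}, saved={saved:.0%}")
# prints: ratio=0.55, saved=45%
```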
Retrieving Original Content
Use the ID returned by the compression step:
headroomretrieve -id c9f1a8d2 -o original.json
The file original.json now contains the original payload exactly as it was submitted, before any compression was applied.
Checking Server Statistics
Execute the stats command to view aggregate metrics:
headroomstats
Typical output:
{
  "total_requests": 215,
  "total_tokens_before": 12342,
  "total_tokens_after": 7891,
  "cache_hits": 48,
  "error_rate": 0.0
}
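The raw counters become easier to read once they are turned into derived figures. The sketch below assumes the field names shown in the sample output; defining the cache-hit ratio as cache_hits / total_requests is an assumption made for illustration, and summarize_stats is a hypothetical helper.

```python
def summarize_stats(stats):
    """Derive headline figures from a /v1/stats-style dictionary."""
    before = stats["total_tokens_before"]
    after = stats["total_tokens_after"]
    requests = stats["total_requests"]
    return {
        "tokens_saved": before - after,
        # Guard against division by zero on a freshly started server.
        "savings_fraction": (before - after) / before if before else 0.0,
        # Assumed definition: hits divided by total requests.
        "cache_hit_ratio": stats["cache_hits"] / requests if requests else 0.0,
    }


sample = {
    "total_requests": 215,
    "total_tokens_before": 12342,
    "total_tokens_after": 7891,
    "cache_hits": 48,
    "error_rate": 0.0,
}
summary = summarize_stats(sample)
print(summary["tokens_saved"])  # prints: 4451
```

For the sample above, roughly 36% of tokens were removed overall, and about 22% of requests hit the cache under the assumed definition.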
Source Code Architecture
The MCP server tools are thin Rust wrappers around the HTTP API implemented in crates/headroom-proxy/src/server.rs. The Cargo.toml defines three [[bin]] entries that compile to standalone executables, allowing direct interaction with the Axum-based server without SDK dependencies.
Summary
- headroomcompress interfaces with /v1/compress to reduce token counts through the SmartCrusher → ContentRouter → CacheAligner pipeline.
- headroomretrieve recovers original content from CCRs using /v1/retrieve/<compressed_id>.
- headroomstats exposes operational health via /v1/stats for monitoring request volumes and efficiency gains.
- All tools are defined in crates/headroom-proxy/Cargo.toml and communicate over HTTP to localhost:8000 by default.
Frequently Asked Questions
What is the difference between the MCP server tools and the Headroom SDKs?
The MCP server tools are lightweight, language-agnostic command-line binaries that directly invoke the HTTP endpoints defined in crates/headroom-proxy/src/server.rs. The TypeScript and Python SDKs provide higher-level abstractions and language-specific integrations, but ultimately call the same /v1/compress, /v1/retrieve, and /v1/stats endpoints.
How does headroomcompress calculate compression ratios?
The server calculates the compression_ratio by dividing tokens_after by tokens_before once the input has passed through the compression pipeline (SmartCrusher → ContentRouter → CacheAligner → optional Kompress). This floating-point value is returned in the JSON response alongside the integer token counts, so a lower ratio indicates stronger compression.
Can I use these tools with a remote Headroom server?
Yes. While the tools default to localhost:8000, they can be configured to target any reachable Headroom proxy instance by specifying the server URL and authentication credentials using the same scheme supported by the Python and TypeScript SDKs.
What data structure does headroomretrieve return?
The tool returns the original message array or raw source file that was previously compressed, streaming it from the server's stored Compressed Content Record (CCR). This structure contains the complete pre-compression payload associated with the compressed_id.