# Headroom MCP Server Tools: A Complete Guide to headroomcompress, headroomretrieve, and headroomstats

> Master Headroom MCP server tools: headroomcompress, headroomretrieve, and headroomstats. Compress LLM payloads, retrieve content, and view metrics with this comprehensive guide.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: deep-dive
- Published: 2026-06-07

---

**Headroom provides three command-line MCP server tools (`headroomcompress`, `headroomretrieve`, and `headroomstats`) that talk to the local Message Compression Proxy over HTTP to compress LLM payloads, retrieve original content by ID, and report operational metrics.**

The `headroom-proxy` crate in the [chopratejas/headroom](https://github.com/chopratejas/headroom) repository ships these binaries to interact with the compression service without requiring language-specific SDKs. Defined in the `[[bin]]` sections of [`crates/headroom-proxy/Cargo.toml`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/Cargo.toml), these utilities act as thin HTTP clients that communicate with the MCP server running on `localhost:8000` by default.

## headroomcompress: Compressing LLM Message Payloads

The `headroomcompress` utility submits JSON payloads to the MCP server’s `/v1/compress` endpoint, triggering the full Headroom compression pipeline.

### How the Compression Pipeline Works

When you invoke `headroomcompress`, the tool POSTs a JSON body containing the model name and messages array to `/v1/compress`. The server processes the request through the **SmartCrusher** → **ContentRouter** → **CacheAligner** → optional **Kompress** pipeline as implemented in [`crates/headroom-proxy/src/server.rs`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/src/server.rs).
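The request described above can be approximated in a few lines of Python. This is a minimal sketch, not the binary's actual code: the `build_compress_request` helper and the `Content-Type` header are assumptions made for illustration, while the endpoint path and default address come from this guide. The request is built but not sent, so the snippet runs without a server.

```python
import json
import urllib.request

# Default address as stated in this guide; override for other deployments.
DEFAULT_BASE_URL = "http://localhost:8000"


def build_compress_request(payload, base_url=DEFAULT_BASE_URL):
    """Build (but do not send) a POST to /v1/compress. Hypothetical helper."""
    if "model" not in payload or "messages" not in payload:
        raise ValueError("payload needs 'model' and 'messages'")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/compress",
        data=body,
        # Assumed header; the guide does not list required headers.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Explain recursion."}],
}
request = build_compress_request(payload)
print(request.get_method(), request.full_url)
# prints: POST http://localhost:8000/v1/compress
```

With a server running, passing `request` to `urllib.request.urlopen` would submit it; any authentication headers your deployment requires would need to be added first.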

The tool returns a JSON document containing:
- `messages` – the compressed message list
- `tokens_before` and `tokens_after` – token counts for comparison
- `compression_ratio` – calculated as `tokens_after / tokens_before`
- `compressed_id` – a stable identifier for later retrieval
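To make the relationship between these fields concrete, the sketch below derives the ratio and savings from the two token counts. The `summarize_compression` helper and the `tokens_saved` and `percent_saved` names are illustrative assumptions, not part of the tool's output; the token counts 58 and 32 are sample values.

```python
def summarize_compression(response):
    """Derive savings figures from a /v1/compress response dict (hypothetical helper)."""
    before = response["tokens_before"]
    after = response["tokens_after"]
    if before <= 0:
        raise ValueError("tokens_before must be positive")
    # Same definition as the guide: tokens_after / tokens_before.
    ratio = after / before
    return {
        "compression_ratio": round(ratio, 2),
        "tokens_saved": before - after,
        "percent_saved": round((1 - ratio) * 100, 1),
    }


summary = summarize_compression({"tokens_before": 58, "tokens_after": 32})
print(summary)
# prints: {'compression_ratio': 0.55, 'tokens_saved': 26, 'percent_saved': 44.8}
```

Because the ratio is defined as after over before, a lower value means stronger compression: 0.55 corresponds to roughly 45% fewer tokens.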

### Command-Line Usage

```bash
# Prepare an OpenAI-style conversation payload
cat > input.json <<'EOF'
{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the difference between recursion and iteration."}
  ]
}
EOF

# Execute compression (defaults to localhost:8000)
headroomcompress -i input.json -o compressed.json
```

### Response Format

The output file `compressed.json` contains the compression results:

```json
{
  "messages": [{"role": "system", "content": "..."}],
  "tokens_before": 58,
  "tokens_after": 32,
  "compression_ratio": 0.55,
  "compressed_id": "c9f1a8d2"
}
```

## headroomretrieve: Recovering Original Content

The `headroomretrieve` tool queries the MCP server for the original (uncompressed) content associated with a previously compressed message ID.

### Retrieving Compressed Content Records (CCR)

This utility performs a GET request to `/v1/retrieve/<compressed_id>`. The server looks up the stored **CCR (Compressed Content Record)** in its internal database and streams the raw source data back. This is useful for debugging what the compressor removed or re-hydrating cached content when the original source is unavailable.

```bash
# Retrieve the original payload using the ID from headroomcompress
headroomretrieve -id c9f1a8d2 -o original.json
```

The output file `original.json` contains the exact payload that was submitted for compression, as it was before the pipeline modified it.
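The retrieval call can be sketched the same way in Python. The `build_retrieve_request` helper is hypothetical and the percent-encoding step is a defensive assumption rather than documented behavior; only the `/v1/retrieve/<compressed_id>` path and default address come from this guide. The request is constructed without being sent.

```python
import urllib.parse
import urllib.request


def build_retrieve_request(compressed_id, base_url="http://localhost:8000"):
    """Build (but do not send) a GET to /v1/retrieve/<compressed_id>."""
    if not compressed_id:
        raise ValueError("compressed_id must be non-empty")
    # Percent-encode the ID so unusual characters cannot alter the path.
    safe_id = urllib.parse.quote(compressed_id, safe="")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/retrieve/{safe_id}",
        method="GET",
    )


request = build_retrieve_request("c9f1a8d2")
print(request.get_method(), request.full_url)
# prints: GET http://localhost:8000/v1/retrieve/c9f1a8d2
```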

## headroomstats: Monitoring Compression Metrics

The `headroomstats` utility retrieves operational statistics from the MCP server to monitor health and efficiency.

### Server Health and Efficiency Metrics

By sending a GET request to `/v1/stats`, the tool returns a JSON diagnostic object tracking:

- `total_requests` – cumulative request count
- `total_tokens_before` – tokens ingested before compression
- `total_tokens_after` – tokens after compression
- `cache_hits` – number of responses served from the cache
- `error_rate` – current error percentage

```bash
headroomstats
```

Typical output:

```json
{
  "total_requests": 215,
  "total_tokens_before": 12342,
  "total_tokens_after": 7891,
  "cache_hits": 48,
  "error_rate": 0.0
}
```
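The raw counters are easier to read once reduced to a few derived figures. The sketch below is one way to do that in Python; `summarize_stats` and its output keys are illustrative names, and treating `cache_hits / total_requests` as a hit rate assumes both counters cover the same set of requests, which the guide does not confirm.

```python
def summarize_stats(stats):
    """Derive aggregate figures from a /v1/stats response dict (hypothetical helper)."""
    before = stats["total_tokens_before"]
    after = stats["total_tokens_after"]
    requests = stats["total_requests"]
    return {
        "tokens_saved": before - after,
        # Guard against division by zero on a freshly started server.
        "overall_ratio": round(after / before, 3) if before else None,
        "cache_hit_rate": round(stats["cache_hits"] / requests, 3) if requests else None,
    }


sample = {
    "total_requests": 215,
    "total_tokens_before": 12342,
    "total_tokens_after": 7891,
    "cache_hits": 48,
    "error_rate": 0.0,
}
report = summarize_stats(sample)
print(report)
# prints: {'tokens_saved': 4451, 'overall_ratio': 0.639, 'cache_hit_rate': 0.223}
```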

## Implementation Details and Source Files

All three tools are implemented as standalone binaries in the `crates/headroom-proxy/src/bin/` directory:

- **[`headroomcompress.rs`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/src/bin/headroomcompress.rs)** – Implements CLI logic for POST requests to `/v1/compress`
- **[`headroomretrieve.rs`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/src/bin/headroomretrieve.rs)** – Implements CLI logic for GET requests to `/v1/retrieve`
- **[`headroomstats.rs`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/src/bin/headroomstats.rs)** – Implements CLI logic for GET requests to `/v1/stats`

The shared HTTP server routing logic resides in **[`crates/headroom-proxy/src/server.rs`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/src/server.rs)**, which uses Axum to handle the three endpoints. The tools respect the same authentication scheme used by the Python and TypeScript SDKs, making them interoperable with existing Headroom deployments.

## Summary

- **`headroomcompress`** sends LLM message payloads to `/v1/compress`, executing the SmartCrusher → ContentRouter → CacheAligner → optional Kompress pipeline and returning before/after token counts plus a stable `compressed_id`
- **`headroomretrieve`** recovers original content from **CCR (Compressed Content Records)** via GET `/v1/retrieve/<compressed_id>`, enabling content inspection and re-hydration
- **`headroomstats`** reports operational metrics, including request counts, token totals before and after compression, cache hits, and the error rate, via GET `/v1/stats`
- All three binaries are built from the `headroom-proxy` crate and communicate over HTTP, providing scriptable access to the MCP server without SDK dependencies

## Frequently Asked Questions

### What is the difference between headroomcompress and the higher-level SDKs?

The `headroomcompress` binary is a thin HTTP wrapper around the same `/v1/compress` endpoint used by the Python and TypeScript SDKs. While the SDKs provide language-native interfaces and automatic retry logic, `headroomcompress` offers a lightweight, dependency-free alternative for shell scripts and debugging scenarios where installing a full SDK is unnecessary.

### How does headroomretrieve access the original message content?

The tool queries the MCP server's internal **Compressed Content Record (CCR)** storage using the `compressed_id` returned during compression. According to [`crates/headroom-proxy/src/server.rs`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/src/server.rs), the server maintains a mapping between compressed IDs and original payloads, allowing `headroomretrieve` to stream the raw data back without re-processing the compression pipeline.

### What authentication do the MCP server tools require?

The command-line utilities respect the same authentication middleware configured for the MCP server. When authentication is enabled, requests to `/v1/compress`, `/v1/retrieve`, and `/v1/stats` must include valid credentials in the HTTP headers, consistent with the security model used by the Python and TypeScript SDKs.

### Where are these tools installed from?

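The binaries are compiled from the `headroom-proxy` crate defined in [`crates/headroom-proxy/Cargo.toml`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/Cargo.toml) under `[[bin]]` sections. Building the proxy with `cargo build --release` places `headroomcompress`, `headroomretrieve`, and `headroomstats` in the workspace's `target/release/` directory; it does not add them to your `PATH`. To make them available system-wide, copy them to a directory on your `PATH` or run `cargo install --path crates/headroom-proxy`, which installs the binaries into Cargo's bin directory (`~/.cargo/bin` by default). Either route works for local or containerized deployments.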