# Headroom MCP Server Tools Explained: compress, retrieve, and stats

> Discover Headroom MCP server tools: compress LLM payloads, retrieve content by ID, and access operational metrics with these Rust-based utilities.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: deep-dive
- Published: 2026-06-09

---

**The Headroom MCP server tools** (`headroomcompress`, `headroomretrieve`, and `headroomstats`) are Rust-based command-line utilities that talk to the Headroom proxy server over HTTP to compress LLM message payloads, retrieve original content by ID, and expose operational metrics.

The `chopratejas/headroom` repository distributes these tools as part of the `headroom-proxy` crate, providing scriptable access to the compression pipeline without requiring language-specific SDKs. Defined as binary targets in [`crates/headroom-proxy/Cargo.toml`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/Cargo.toml), these utilities interact with the server's Axum-based HTTP interface to perform token reduction and cache management operations.

## Overview of the Three MCP Server Tools

Each utility serves a distinct purpose in the Headroom ecosystem:

- **`headroomcompress`** – Sends JSON payloads to the `/v1/compress` endpoint to execute the full compression pipeline (SmartCrusher → ContentRouter → CacheAligner → optional Kompress).
- **`headroomretrieve`** – Queries stored **Compressed Content Records (CCRs)** via `/v1/retrieve/<compressed_id>` to recover original message arrays.
- **`headroomstats`** – Fetches diagnostic metrics from `/v1/stats`, including token savings, cache-hit ratios, and error rates.

All three default to `localhost:8000` and mirror the HTTP API used by the TypeScript and Python SDKs.
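The tool-to-endpoint mapping listed above can be sketched as a small shell helper. This is only an illustration of the routes named in this article: `HEADROOM_URL` and `headroom_endpoint` are hypothetical names invented for the sketch, not options or functions provided by Headroom.

```bash
# Base URL of the proxy. HEADROOM_URL is a hypothetical variable used only in this sketch.
HEADROOM_URL="${HEADROOM_URL:-http://localhost:8000}"

# Print the endpoint URL a given tool talks to.
# Usage: headroom_endpoint <tool> [compressed_id]
headroom_endpoint() {
  case "$1" in
    headroomcompress) printf '%s/v1/compress\n' "$HEADROOM_URL" ;;
    headroomretrieve) printf '%s/v1/retrieve/%s\n' "$HEADROOM_URL" "$2" ;;
    headroomstats)    printf '%s/v1/stats\n' "$HEADROOM_URL" ;;
    *) echo "unknown tool: $1" >&2; return 1 ;;
  esac
}

headroom_endpoint headroomcompress             # http://localhost:8000/v1/compress (with the default base URL)
headroom_endpoint headroomretrieve c9f1a8d2    # http://localhost:8000/v1/retrieve/c9f1a8d2
headroom_endpoint headroomstats                # http://localhost:8000/v1/stats
```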

## How headroomcompress Functions

Located in [`crates/headroom-proxy/src/bin/headroomcompress.rs`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/src/bin/headroomcompress.rs), this binary POSTs a JSON body containing `model` and `messages` fields to `/v1/compress`. The server processes the request through the Headroom pipeline and returns a JSON document with:

- `messages` – The compressed message array.
- `tokens_before` and `tokens_after` – Integer counts for comparison.
- `compression_ratio` – Floating-point value calculated as `tokens_after / tokens_before`.
- `compressed_id` – A stable string identifier for later retrieval.
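To make the ratio concrete, the arithmetic can be reproduced locally from the two token counts. The helper names below are hypothetical, and the rounding to two decimals is a choice made for this sketch, not necessarily what the server does.

```bash
# compression_ratio = tokens_after / tokens_before, printed with two decimals.
# Usage: compression_ratio <tokens_before> <tokens_after>
compression_ratio() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { if (before == 0) exit 1; printf "%.2f\n", after / before }'
}

# Share of tokens removed, as a percentage with one decimal.
tokens_saved_pct() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { if (before == 0) exit 1; printf "%.1f\n", (1 - after / before) * 100 }'
}

compression_ratio 58 32   # prints 0.55
tokens_saved_pct 58 32    # prints 44.8
```

Because the ratio is `tokens_after / tokens_before`, a lower value means more tokens were removed; a ratio of 1.00 would mean no reduction at all.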

## How headroomretrieve Functions

Implemented in [`crates/headroom-proxy/src/bin/headroomretrieve.rs`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/src/bin/headroomretrieve.rs), this tool performs HTTP GET requests to `/v1/retrieve/<compressed_id>`. The server, defined in [`crates/headroom-proxy/src/server.rs`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/src/server.rs), looks up the CCR and streams the original uncompressed content back to the client. This is essential for debugging what the compressor removed or re-hydrating cached content when the original source is unavailable.

## How headroomstats Functions

The `headroomstats` binary (source in [`crates/headroom-proxy/src/bin/headroomstats.rs`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/src/bin/headroomstats.rs)) retrieves operational statistics via GET `/v1/stats`. The response includes aggregate metrics such as `total_requests`, `total_tokens_before`, `total_tokens_after`, `cache_hits`, and `error_rate`, enabling production monitoring and benchmarking of the compression service.

## Practical Usage Examples

### Compressing a Chat Payload

Create an input file and run the compressor:

```bash
cat > input.json <<'EOF'
{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user",   "content": "Explain the difference between recursion and iteration."}
  ]
}
EOF

headroomcompress -i input.json -o compressed.json
```
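Since the tool is described as a wrapper around `/v1/compress`, roughly the same request can be sketched with `curl`. This assumes the endpoint accepts an unauthenticated JSON POST on the default address; `HEADROOM_URL` and `compress_via_http` are hypothetical names used only for illustration.

```bash
# Base URL of the proxy. HEADROOM_URL is a hypothetical variable used only in this sketch.
HEADROOM_URL="${HEADROOM_URL:-http://localhost:8000}"

# POST a JSON file containing "model" and "messages" to /v1/compress
# and write the server's JSON response to a second file.
# Usage: compress_via_http <input.json> <output.json>
compress_via_http() {
  curl -sS -X POST "$HEADROOM_URL/v1/compress" \
    -H 'Content-Type: application/json' \
    --data-binary "@$1" \
    -o "$2"
}

# Example (requires a running proxy):
#   compress_via_http input.json compressed.json
```

`--data-binary` sends the file byte for byte, which avoids the newline stripping that plain `--data` applies to file input.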

The output file `compressed.json` contains:

```json
{
  "messages": [ {"role": "system", "content": "..."}, {"role": "user", "content": "..."} ],
  "tokens_before": 58,
  "tokens_after": 32,
  "compression_ratio": 0.55,
  "compressed_id": "c9f1a8d2"
}
```

### Retrieving Original Content

Use the ID returned by the compression step:

```bash
headroomretrieve -id c9f1a8d2 -o original.json
```
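The same lookup can be sketched directly against the HTTP endpoint. This is an illustration under assumptions: that `/v1/retrieve/<compressed_id>` answers a plain GET without authentication, and that an unknown ID produces a non-2xx status. `HEADROOM_URL` and `retrieve_via_http` are hypothetical names.

```bash
# Base URL of the proxy. HEADROOM_URL is a hypothetical variable used only in this sketch.
HEADROOM_URL="${HEADROOM_URL:-http://localhost:8000}"

# Fetch the content stored under a compressed_id and write it to a file.
# Usage: retrieve_via_http <compressed_id> <output-file>
retrieve_via_http() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: retrieve_via_http <compressed_id> <output-file>" >&2
    return 2
  fi
  # --fail: exit non-zero on HTTP 4xx/5xx instead of saving the error body.
  curl -sS --fail -o "$2" "$HEADROOM_URL/v1/retrieve/$1"
}

# Example (requires a running proxy that still holds this ID):
#   retrieve_via_http c9f1a8d2 original.json
```

Checking the exit status matters here: without `--fail`, an error page could be written to the output file and mistaken for the original payload.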

The file `original.json` now contains the original payload exactly as it was submitted, before compression was applied.

### Checking Server Statistics

Execute the stats command to view aggregate metrics:

```bash
headroomstats
```

Typical output:

```json
{
  "total_requests": 215,
  "total_tokens_before": 12342,
  "total_tokens_after": 7891,
  "cache_hits": 48,
  "error_rate": 0.0
}
```
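The raw counters become easier to read once they are turned into rates. The sketch below derives them from the sample values shown above; treating `cache_hits / total_requests` as the cache-hit rate is an assumption about how the counter is defined, and the variable names simply mirror the JSON keys.

```bash
# Numbers copied from the sample /v1/stats output above.
total_requests=215
total_tokens_before=12342
total_tokens_after=7891
cache_hits=48

# Absolute number of tokens removed across all requests.
tokens_saved=$((total_tokens_before - total_tokens_after))

# Percentage of tokens removed, and share of requests counted as cache hits
# (assumes cache_hits is counted per request).
savings_pct=$(awk -v b="$total_tokens_before" -v a="$total_tokens_after" \
  'BEGIN { printf "%.1f", (1 - a / b) * 100 }')
hit_pct=$(awk -v h="$cache_hits" -v r="$total_requests" \
  'BEGIN { printf "%.1f", h / r * 100 }')

echo "tokens saved: $tokens_saved ($savings_pct%)"   # tokens saved: 4451 (36.1%)
echo "cache hit rate: $hit_pct%"                     # cache hit rate: 22.3%
```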

## Source Code Architecture

The MCP server tools are thin Rust wrappers around the HTTP API implemented in [`crates/headroom-proxy/src/server.rs`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/src/server.rs). The crate manifest, [`crates/headroom-proxy/Cargo.toml`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/Cargo.toml), defines three `[[bin]]` entries that compile to standalone executables, allowing direct interaction with the Axum-based server without SDK dependencies.

## Summary

- **headroomcompress** interfaces with `/v1/compress` to reduce token counts through the SmartCrusher → ContentRouter → CacheAligner → optional Kompress pipeline.
- **headroomretrieve** recovers original content from CCRs using `/v1/retrieve/<compressed_id>`.
- **headroomstats** exposes operational health via `/v1/stats` for monitoring request volumes and efficiency gains.
- All tools are defined in [`crates/headroom-proxy/Cargo.toml`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/Cargo.toml) and communicate over HTTP to `localhost:8000` by default.

## Frequently Asked Questions

### What is the difference between the MCP server tools and the Headroom SDKs?

The MCP server tools are lightweight, language-agnostic command-line binaries that directly invoke the HTTP endpoints defined in [`crates/headroom-proxy/src/server.rs`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/src/server.rs). The TypeScript and Python SDKs provide higher-level abstractions and language-specific integrations, but ultimately call the same `/v1/compress`, `/v1/retrieve`, and `/v1/stats` endpoints.

### How does headroomcompress calculate compression ratios?

The server calculates `compression_ratio` by dividing `tokens_after` by `tokens_before` once the input has passed through the full pipeline (SmartCrusher → ContentRouter → CacheAligner → optional Kompress), so a lower value means a larger reduction. This floating-point value is returned in the JSON response alongside the integer token counts.

### Can I use these tools with a remote Headroom server?

Yes. While the tools default to `localhost:8000`, they can be configured to target any reachable Headroom proxy instance by specifying the server URL and authentication credentials using the same scheme supported by the Python and TypeScript SDKs.

### What data structure does headroomretrieve return?

The tool returns the original message array or raw source file that was previously compressed, streaming it from the server's stored **Compressed Content Record (CCR)**. This structure contains the complete pre-compression payload associated with the `compressed_id`.