# How to Set Up Headroom MCP Tools: headroom_compress, headroom_retrieve, and headroom_stats

> Learn to set up Headroom MCP tools: headroom_compress, headroom_retrieve, and headroom_stats. Compress large LLM payloads, retrieve them by hash, and monitor token savings locally.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: how-to-guide
- Published: 2026-06-06

---

**Headroom’s MCP server exposes three JSON-RPC tools that let LLMs compress large payloads, retrieve them by cryptographic hash, and monitor token savings through a lightweight HTTP endpoint running locally on port 8787.**

The `chopratejas/headroom` repository provides a Model Context Protocol (MCP) implementation that reduces context window pressure for LLMs. Setting up the Headroom MCP tools involves installing the package, registering the server with your assistant, and invoking the three endpoints (**headroom_compress**, **headroom_retrieve**, and **headroom_stats**) via standard HTTP POST requests.

## Install the Headroom MCP Package

The MCP functionality ships as an optional extra. You can install only the tools or bundle them with the Headroom proxy sidecar.

### Minimal Installation (MCP Only)

Install the core MCP server and CLI utilities:

```bash
pip install "headroom-ai[mcp]"
```

### Full Stack Installation (with Proxy)

If you plan to run the proxy alongside the MCP server (enabling fallback retrieval from the proxy store), use:

```bash
pip install "headroom-ai[proxy]"
```
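To confirm that either install succeeded, a quick check from Python can help. This is a minimal sketch: it assumes the package is importable as `headroom` (inferred from the `headroom/` source paths cited in this guide) and that the CLI entry point is named `headroom`, as in the commands used later. The helper names are illustrative.

```python
import importlib.util
import shutil


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


def cli_on_path(command: str) -> bool:
    """Return True if the named executable is found on PATH."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    # "headroom" as the import name is an assumption based on the repo layout.
    print("module importable:", is_importable("headroom"))
    print("CLI on PATH:", cli_on_path("headroom"))
```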

## Register the Tools with Your LLM Client

Before the LLM can invoke the tools, you must register the server configuration. Headroom includes a convenience command that writes the necessary JSON configuration for Claude Code:

```bash
headroom mcp install
```

This command, implemented in [`headroom/cli.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cli.py), creates or updates `~/.claude/mcp.json`, pointing the client to the local server endpoint.
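To see what the command wrote, you can read the file back. The sketch below assumes only that the file at the path named above is valid JSON; it makes no assumption about the keys inside it, and `load_client_config` is an illustrative helper, not part of Headroom.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_client_config(path: Path) -> Optional[Any]:
    """Return the parsed JSON config, or None if the file does not exist."""
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    config_path = Path.home() / ".claude" / "mcp.json"
    config = load_client_config(config_path)
    if config is None:
        print(f"{config_path} not found; run `headroom mcp install` first")
    else:
        print(json.dumps(config, indent=2))
```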

## Start the MCP Server

Launch the stand-alone server to expose the three endpoints. By default, the server listens on `http://127.0.0.1:8787`:

```bash
# Stand-alone MCP (lightweight)
headroom mcp serve
```
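Before sending requests, it can be useful to check that something is actually listening on the default address. The following sketch uses only the standard library; it confirms that a TCP connection succeeds, not that the listener is Headroom. `is_listening` is an illustrative name.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("server reachable:", is_listening("127.0.0.1", 8787))
```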

Alternatively, run the full stack with the proxy:

```bash
# Terminal 1: Start the proxy (also on 8787 by default)
headroom proxy &

# Terminal 2: Start the MCP server
headroom mcp serve
```

The server maintains a local **compression store** with an approximate TTL of one hour, managed internally by the storage layer referenced in [`headroom/mcp_server.py`](https://github.com/chopratejas/headroom/blob/main/headroom/mcp_server.py).
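To make the idea of a hash-keyed store with a time-to-live concrete, here is a minimal in-memory sketch. It is not Headroom's implementation: the class name, the use of SHA-256, the 16-character key, and the injected clock are all assumptions chosen for illustration.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class TTLStore:
    """In-memory store whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps key -> (expiry time, original content).
        self._items: Dict[str, Tuple[float, str]] = {}

    def put(self, content: str) -> str:
        """Store content and return a short hex key derived from it."""
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
        self._items[key] = (self._clock() + self._ttl, content)
        return key

    def get(self, key: str) -> Optional[str]:
        """Return the content, or None if the key is missing or expired."""
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, content = entry
        if self._clock() >= expires_at:
            del self._items[key]  # Evict lazily on access.
            return None
        return content
```

Passing the clock in as a parameter keeps expiry testable without sleeping; a real store would likely also evict in the background rather than only on access.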

## Using the Three Headroom MCP Tools

All tools communicate via HTTP POST with JSON payloads. The following examples assume the server is running at `http://127.0.0.1:8787`.
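Each example in this section sends the same envelope: a `tool` name and a `parameters` object. A small helper can build that body so the three calls stay consistent. `build_tool_request` is a hypothetical convenience function, not part of the Headroom package.

```python
from typing import Any, Dict


def build_tool_request(tool: str, **parameters: Any) -> Dict[str, Any]:
    """Build the {"tool": ..., "parameters": {...}} body used in this guide."""
    return {"tool": tool, "parameters": dict(parameters)}


if __name__ == "__main__":
    print(build_tool_request("headroom_compress", content="long text"))
```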

### headroom_compress

Shrink any text payload—files, logs, JSON, or search results—and receive a hash for later retrieval:

```python
import requests

BASE = "http://127.0.0.1:8787"

# Request body: the tool name plus its parameters.
payload = {
    "tool": "headroom_compress",
    "parameters": {
        "content": "Very long text … (5,000 lines of grep results)"
    }
}
resp = requests.post(BASE, json=payload).json()

# resp contains:
# {
#   "compressed": "[key matches with context…]",
#   "hash": "a1b2c3d4e5f6…",
#   "original_tokens": 12000,
#   "compressed_tokens": 3200,
#   "savings_percent": 73.3,
#   "transforms": ["router:search:0.27"]
# }
```

Under the hood, [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py) routes the content to the appropriate compressor—such as [`headroom/transforms/kompress_compressor.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/kompress_compressor.py) for ML-based compression or [`headroom/transforms/search_compressor.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/search_compressor.py) for structured search results. Token counts are calculated using utilities in `headroom/tokenizers/`, and markers are generated via [`headroom/utils.py`](https://github.com/chopratejas/headroom/blob/main/headroom/utils.py) (`create_marker`, `create_tool_digest_marker`).
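The `savings_percent` value in the sample response is consistent with the usual definition: tokens removed divided by original tokens. The sketch below reproduces that arithmetic; the one-decimal rounding and the zero-token guard are assumptions, not confirmed Headroom behavior.

```python
def savings_percent(original_tokens: int, compressed_tokens: int) -> float:
    """Percentage of tokens removed, rounded to one decimal place."""
    if original_tokens <= 0:
        return 0.0  # Avoid division by zero for empty input.
    saved = original_tokens - compressed_tokens
    return round(100.0 * saved / original_tokens, 1)


# Using the numbers from the sample response: (12000 - 3200) / 12000.
print(savings_percent(12000, 3200))  # 73.3
```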

### headroom_retrieve

Fetch the original uncompressed content using the hash returned by the compress step:

```python
# Reuses `requests`, `BASE`, and `resp` from the headroom_compress example.
retrieve_req = {
    "tool": "headroom_retrieve",
    "parameters": {
        "hash": resp["hash"],
        # optional: "query": "specific search term"  # returns only matches
    }
}
retrieved = requests.post(BASE, json=retrieve_req).json()

# → {"original_content": "... full uncompressed text …", "source": "local"}
```

If the proxy is also running and the local store misses, `headroom_retrieve` automatically falls back to the proxy store.
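The fallback order can be sketched with two plain mappings standing in for the stores. This is an illustration of the lookup order only: the function name is hypothetical, and the `"proxy"` value for `source` is an assumption, since the sample response above only shows `"local"`.

```python
from typing import Dict, Mapping, Optional


def retrieve_with_fallback(
    key: str,
    local_store: Mapping[str, str],
    proxy_store: Optional[Mapping[str, str]] = None,
) -> Optional[Dict[str, str]]:
    """Look up key locally first, then in the proxy store if one is given."""
    if key in local_store:
        return {"original_content": local_store[key], "source": "local"}
    # Only consult the proxy when it is available and the local store missed.
    if proxy_store is not None and key in proxy_store:
        return {"original_content": proxy_store[key], "source": "proxy"}
    return None
```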

### headroom_stats

Query session-wide metrics without parameters:

```python
# Reuses `requests` and `BASE` from the headroom_compress example.
stats_req = {"tool": "headroom_stats", "parameters": {}}
stats = requests.post(BASE, json=stats_req).json()
```

The response includes:

- **compressions**: Total number of compressions performed
- **retrievals**: Number of successful retrievals
- **tokens_saved**: Cumulative tokens reduced
- **savings_percent**: Overall compression efficiency
- **estimated_cost_saved_usd**: Approximate API cost avoided
- **recent_events**: Timeline of compression activity
- **proxy**: Metrics from the proxy (if connected)
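A response with these fields can be condensed into a one-line summary for logs or a status bar. The sketch below assumes the field names listed above and tolerates missing keys; `summarize_stats` is an illustrative helper, and the sample values are made up rather than measured.

```python
from typing import Any, Mapping


def summarize_stats(stats: Mapping[str, Any]) -> str:
    """Format a one-line summary, tolerating missing fields."""
    compressions = stats.get("compressions", 0)
    tokens_saved = stats.get("tokens_saved", 0)
    percent = stats.get("savings_percent", 0.0)
    cost = stats.get("estimated_cost_saved_usd", 0.0)
    return (f"{compressions} compressions, {tokens_saved:,} tokens saved "
            f"({percent:.1f}%), ~${cost:.2f} avoided")


# Illustrative values only, not real measurements.
sample = {"compressions": 4, "tokens_saved": 8800,
          "savings_percent": 73.3, "estimated_cost_saved_usd": 0.03}
print(summarize_stats(sample))
```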

## Key Implementation Files

Understanding the source architecture helps debug custom deployments:

| File | Role |
|------|------|
| [`wiki/mcp.md`](https://github.com/chopratejas/headroom/blob/main/wiki/mcp.md) | Official user-facing documentation for the MCP architecture and CLI commands. |
| [`headroom/cli.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cli.py) | Implements `headroom mcp install`, `serve`, `status`, and `uninstall` commands. |
| [`headroom/mcp_server.py`](https://github.com/chopratejas/headroom/blob/main/headroom/mcp_server.py) | HTTP server exposing the three JSON-RPC endpoints and managing the local compression store. |
| [`headroom/utils.py`](https://github.com/chopratejas/headroom/blob/main/headroom/utils.py) | Core utilities for marker generation (`create_marker`, `create_tool_digest_marker`) used by the compression pipeline. |
| [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py) | Decision logic routing content to the appropriate compressor (search, log, text, etc.). |
| [`headroom/transforms/kompress_compressor.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/kompress_compressor.py) | ML-based compressor for general text payloads. |
| [`headroom/transforms/search_compressor.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/search_compressor.py) | Specialized compressor for large grep or search results. |
| `headroom/tokenizers/*` | Token-counting helpers (e.g., [`tiktoken_counter.py`](https://github.com/chopratejas/headroom/blob/main/headroom/tokenizers/tiktoken_counter.py), [`mistral.py`](https://github.com/chopratejas/headroom/blob/main/headroom/tokenizers/mistral.py)) that calculate `original_tokens` and `compressed_tokens`. |

## Summary

- **Install** the tools via `pip install "headroom-ai[mcp]"` or include `[proxy]` for the full stack.
- **Register** the server with `headroom mcp install` to configure Claude Code or compatible assistants.
- **Launch** the server using `headroom mcp serve` to expose endpoints at `http://127.0.0.1:8787`.
- **Compress** payloads with `headroom_compress` to receive a hash, token counts, and routing metadata.
- **Retrieve** data with `headroom_retrieve` using the stored hash, with optional query filtering.
- **Monitor** usage and cost savings through `headroom_stats`, which reports metrics from the local store and, if connected, the proxy.

## Frequently Asked Questions

### Do I need to run the Headroom proxy to use the MCP tools?

No. The three MCP tools operate independently of the proxy. When the proxy is running, however, `headroom_retrieve` automatically falls back to the proxy store for any hash it cannot find in the local MCP cache.

### How long does compressed data persist?

The local compression store keeps entries for approximately one hour before they expire automatically. This prevents unbounded memory growth while leaving enough time for multi-turn LLM conversations to retrieve context.

### What determines which compression algorithm runs?

The `headroom_compress` endpoint uses [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py) to analyze incoming content and select the optimal pipeline. For example, large grep results route to [`headroom/transforms/search_compressor.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/search_compressor.py), while general text routes to [`headroom/transforms/kompress_compressor.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/kompress_compressor.py). The `transforms` field in the response (e.g., `["router:search:0.27"]`) indicates which path was taken.
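As a rough illustration of content-based routing, the sketch below classifies input as grep-style search output or general text and dispatches to a matching compressor. The regular expression, the 50% threshold, and the function names are hypothetical; the actual logic in `content_router.py` may differ substantially.

```python
import re
from typing import Callable, Dict

# Hypothetical heuristic: grep-style lines look like "path:lineno:text".
_GREP_LINE = re.compile(r"^[^\s:]+:\d+:")


def detect_content_type(content: str) -> str:
    """Return "search" if most non-empty lines look like grep output."""
    lines = [ln for ln in content.splitlines() if ln.strip()]
    if not lines:
        return "text"
    hits = sum(1 for ln in lines if _GREP_LINE.match(ln))
    return "search" if hits / len(lines) > 0.5 else "text"


def route(content: str, compressors: Dict[str, Callable[[str], str]]) -> str:
    """Dispatch content to the compressor registered for its detected type."""
    return compressors[detect_content_type(content)](content)
```

A caller would register one callable per content type, for example `{"search": compress_search, "text": compress_text}`, where both names are placeholders for real compressors.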

### Can I retrieve a subset of the compressed content?

Yes. When calling `headroom_retrieve`, include an optional `query` parameter in the JSON payload. The server will return only segments of the original content matching your query, reducing bandwidth without requiring you to fetch and filter the entire payload locally.
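One simple way to picture query filtering is line-level substring matching. The sketch below is an assumption about the general idea, not a description of how Headroom matches segments; `filter_by_query` is an illustrative name.

```python
from typing import List


def filter_by_query(content: str, query: str) -> List[str]:
    """Return the lines of content containing query, case-insensitively."""
    needle = query.casefold()
    return [ln for ln in content.splitlines() if needle in ln.casefold()]


log = "INFO start\nERROR disk full\nINFO done\nerror: retrying"
print(filter_by_query(log, "error"))
```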