# Headroom CCR Reversible Compression: How the headroom_retrieve Tool Restores LLM Context

> Learn how Headroom CCR reversible compression restores LLM context with the headroom_retrieve tool. Dropped content is replaced with a compact marker and fetched back only when the model asks for it.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: deep-dive
- Published: 2026-06-05

---

**CCR (Compress‑Cache‑Retrieve) is Headroom’s reversible compression mechanism that stores original content in a deterministic, in‑memory cache keyed by a hash, replaces it with a lightweight marker in the LLM prompt, and restores the full payload on demand when the model emits a `headroom_retrieve` tool call intercepted by the Headroom proxy.**

The `chopratejas/headroom` repository implements an aggressive context‑window optimization strategy called **CCR (reversible compression)** that keeps dropped content recoverable for the lifetime of the session. When internal transforms such as SmartCrusher or IntelligentContext drop content to save tokens, the system does not discard the text permanently; instead, it preserves the full payload in a reversible cache and surfaces a retrieval handle to the model. Understanding this workflow is essential for developers integrating Headroom into LLM pipelines that require lossless compression with transparent retrieval.

## What Is CCR (Reversible Compression)?

CCR stands for **Compress‑Cache‑Retrieve**. It is the architectural mechanism, described in [`wiki/ccr.md`](https://github.com/chopratejas/headroom/blob/main/wiki/ccr.md), that makes Headroom’s aggressive text compression fully reversible. Instead of permanently erasing content to stay inside the model’s context window, the system performs two steps: it stores the original text in a local cache, then inserts a compact placeholder into the prompt.

### The Compress and Cache Phase

When a transform decides that a piece of content can be dropped, the full‑size payload is written to a **local in‑memory CCR cache** and keyed by a deterministic hash known as the **CCR key**. This happens before the prompt ever reaches the LLM, so the original data remains available for the lifetime of the session.
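The caching step can be pictured with a small sketch. The class name `InMemoryCCRStore`, the SHA-256 digest, and the `ccr_` prefix with a 12-character suffix are illustrative assumptions, not Headroom's actual implementation; the point is only that a key derived from the content makes the lookup deterministic.

```python
import hashlib
from typing import Optional


class InMemoryCCRStore:
    """Hypothetical stand-in for the CCR cache: maps a content hash to the original text."""

    def __init__(self) -> None:
        self._entries = {}

    def put(self, content: str) -> str:
        # Derive the key from the content itself, so identical payloads
        # always map to the same key (assumed key format).
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        key = "ccr_" + digest[:12]
        self._entries[key] = content
        return key

    def get(self, key: str) -> Optional[str]:
        # Returns None when the key is unknown, e.g. after the session ended.
        return self._entries.get(key)


store = InMemoryCCRStore()
log_text = "2026-01-01 ERROR auth failed\n" * 3
key = store.put(log_text)
same_key = store.put(log_text)  # storing the same text again yields the same key
```

Because the cache lives in process memory in this sketch, its entries disappear when the process exits, which matches the session-scoped lifetime described above.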

### The Marker Insertion Phase

The dropped content is replaced by a small marker containing the CCR key. As implemented in [`headroom/transforms/compression_summary.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/compression_summary.py), the system builds a summary that tells the LLM the original data exists and can be fetched on demand. The marker effectively acts as a deferred‑load pointer inside the prompt, giving the model the option to retrieve the full text only if its reasoning requires it.
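A minimal sketch of marker insertion follows. It reuses the `{{CCR:...}}` marker shape that appears in this article's examples; the helper names `build_marker` and `find_ccr_keys` and the wording of the summary text are hypothetical and are not taken from `compression_summary.py`.

```python
import re

# Assumed marker syntax, modeled on the `{{CCR:ccr_5f4a2b}}` example in this article.
MARKER_PATTERN = re.compile(r"\{\{CCR:(ccr_[0-9a-zA-Z]+)\}\}")


def build_marker(ccr_key: str, original: str) -> str:
    """Build a short placeholder that says what was dropped and how to get it back."""
    line_count = len(original.splitlines())
    summary = f"[{line_count} lines omitted; call headroom_retrieve with hash={ccr_key}]"
    return summary + " {{CCR:" + ccr_key + "}}"


def find_ccr_keys(prompt: str) -> list:
    """Return every CCR key referenced by a marker in the prompt, in order of appearance."""
    return MARKER_PATTERN.findall(prompt)


dropped = "line one\nline two\nline three"
prompt = "Summarize this log: " + build_marker("ccr_5f4a2b", dropped)
print(prompt)
print(find_ccr_keys(prompt))
```

The marker costs a handful of tokens regardless of how large the dropped payload was, which is what makes it behave like a deferred-load pointer.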

## How the headroom_retrieve Tool Works

The `headroom_retrieve` tool is automatically injected into the model’s toolset by the Headroom proxy, with its schema declared in [`headroom/tools.json`](https://github.com/chopratejas/headroom/blob/main/headroom/tools.json). Runtime interception and fulfillment are handled by the `CCRResponseHandler` class in [`headroom/ccr/response_handler.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/response_handler.py). Because the proxy resolves the call itself, the client application makes no additional HTTP requests for retrieval.

### Tool Schema and Parameters

The tool definition follows a strict JSON schema that accepts a required `hash` and an optional `query` parameter:

```json
{
  "name": "headroom_retrieve",
  "description": "Retrieve the original uncompressed content that was stored in CCR.",
  "parameters": {
    "type": "object",
    "properties": {
      "hash": { "type": "string", "description": "CCR key of the stored content" },
      "query": { "type": "string", "description": "Optional search term for BM25‑style lookup", "default": "" }
    },
    "required": ["hash"]
  }
}
```

When the LLM emits a tool‑use block such as:

```json
{
  "type": "tool_use",
  "id": "toolu_1",
  "name": "headroom_retrieve",
  "input": { "hash": "ccr_abc123", "query": "error handling" }
}
```

the proxy intercepts the call locally.

### CCRResponseHandler Execution Flow

The `CCRResponseHandler`, located at [`headroom/ccr/response_handler.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/response_handler.py), watches the LLM output for a `headroom_retrieve` tool use block. It looks up the original payload in the CCR store using the supplied `hash`, and if a `query` is provided, it performs a **BM25‑style lookup** to return only the matching lines or sections. The handler then packages the result as a `tool_response`:

```json
{
  "type": "tool_response",
  "tool_use_id": "toolu_1",
  "content": { "type": "object", "content": "...original text..." }
}
```

This response is re‑injected directly into the conversation flow, allowing the model to continue generating an answer with the full or filtered original content. Because the proxy resolves the call internally, the operation is completely transparent to the client.
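The interception flow described in this section can be sketched as a single function. This is not the `CCRResponseHandler` code: the function name `handle_tool_use`, the plain `dict` used as the store, and the wording of the miss message are assumptions for illustration, and the response shape simply mirrors the JSON shown above.

```python
from typing import Optional


def handle_tool_use(block: dict, store: dict) -> Optional[dict]:
    """Answer a headroom_retrieve call from the local store; ignore every other block."""
    if block.get("type") != "tool_use" or block.get("name") != "headroom_retrieve":
        return None  # not ours: let the block pass through untouched

    ccr_key = block.get("input", {}).get("hash", "")
    original = store.get(ccr_key)
    if original is None:
        # Assumed behavior on a cache miss: tell the model the key is unknown.
        text = f"No CCR entry found for hash {ccr_key}"
    else:
        text = original

    return {
        "type": "tool_response",
        "tool_use_id": block["id"],
        "content": {"type": "object", "content": text},
    }


# Hypothetical store contents and model output, reusing the IDs from the examples above.
ccr_store = {"ccr_abc123": "try:\n    run()\nexcept ValueError:\n    log_error()"}
tool_call = {
    "type": "tool_use",
    "id": "toolu_1",
    "name": "headroom_retrieve",
    "input": {"hash": "ccr_abc123"},
}
reply = handle_tool_use(tool_call, ccr_store)
```

Returning `None` for unrelated blocks keeps the handler out of the way of the application's own tools, which is one plausible way to keep the proxy transparent.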

### Feedback into Compression Policy

If the LLM accesses a stored payload via `headroom_retrieve`, the system records that the dropped content was important. This signal can inform the compression policy, enabling Headroom to refine future decisions about what should be compressed versus what should be retained.
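One way such a feedback loop could work is sketched below. The `RetrievalFeedback` class, the idea of grouping content by "kind", and the 0.5 threshold are all hypothetical; the article does not describe how Headroom actually weighs retrieval events.

```python
from collections import Counter


class RetrievalFeedback:
    """Hypothetical tracker: how often is compressed content of each kind asked for again?"""

    def __init__(self) -> None:
        self.compressed = Counter()
        self.retrieved = Counter()

    def record_compression(self, content_kind: str) -> None:
        self.compressed[content_kind] += 1

    def record_retrieval(self, content_kind: str) -> None:
        self.retrieved[content_kind] += 1

    def retrieval_rate(self, content_kind: str) -> float:
        total = self.compressed[content_kind]
        return self.retrieved[content_kind] / total if total else 0.0

    def should_compress(self, content_kind: str, max_rate: float = 0.5) -> bool:
        # Keep compressing a kind of content only while the model rarely asks for it back.
        return self.retrieval_rate(content_kind) <= max_rate


feedback = RetrievalFeedback()
for _ in range(4):
    feedback.record_compression("stack_trace")
    feedback.record_compression("debug_log")
for _ in range(3):
    feedback.record_retrieval("stack_trace")  # the model keeps asking for stack traces
```

In this sketch, stack traces end up above the threshold and would be retained in future prompts, while debug logs, which were never retrieved, stay eligible for compression.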

## CCR Round-Trip Code Examples

The following snippets illustrate typical usage patterns from a Python client. These examples rely on the Headroom SDK and mirror the behavior exercised in [`tests/test_transforms/test_smart_crusher_ccr_roundtrip.py`](https://github.com/chopratejas/headroom/blob/main/tests/test_transforms/test_smart_crusher_ccr_roundtrip.py).

### Enabling CCR in a Session

```python
from headroom import HeadroomClient

client = HeadroomClient()

# Enable CCR explicitly (it is on by default)
client.configure(ccr_enabled=True)

# Run a prompt: the proxy compresses aggressively and stores the originals
response = client.chat("Summarize the following log file:", files=["/path/to/big.log"])
```

The `response` contains a compression summary and a CCR marker such as `{{CCR:ccr_5f4a2b}}`.

### Retrieving Original Content with headroom_retrieve

In normal operation, the LLM triggers retrieval automatically; the client does not need to handle it manually. For debugging or external tooling, you can call the tool directly:

```python
# Manual retrieval, useful for debugging or for external tools
original = client.headroom_retrieve(hash="ccr_5f4a2b")
print(original)  # prints the full, uncompressed log content
```

### Querying Stored Content with BM25-Style Search

CCR supports filtered retrieval when you only need matching sections:

```python
hits = client.headroom_retrieve(hash="ccr_5f4a2b", query="authentication failure")
print(hits)  # returns only the matching lines/sections
```
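To make "BM25-style" concrete, here is a self-contained sketch that scores each line of a stored payload against the query with the standard Okapi BM25 formula and returns the best matches in document order. The function names, the line-level granularity, the tokenizer, and the parameter defaults (`k1=1.5`, `b=0.75`) are assumptions; Headroom's own ranking may differ.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_filter(document: str, query: str, top_n: int = 3, k1: float = 1.5, b: float = 0.75) -> list:
    """Return up to top_n lines that match the query, ranked by BM25, in original order."""
    lines = [line for line in document.splitlines() if line.strip()]
    tokenized = [tokenize(line) for line in lines]
    if not lines:
        return []
    avg_len = sum(len(tokens) for tokens in tokenized) / len(lines) or 1.0
    # Number of lines that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scored = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in set(tokenize(query)):
            tf = counts[term]
            if tf == 0:
                continue
            idf = math.log(1 + (len(lines) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
        if score > 0:
            scored.append((score, index))

    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:top_n]
    return [lines[index] for _, index in sorted(best, key=lambda pair: pair[1])]


stored_log = "\n".join([
    "INFO server started",
    "ERROR authentication failure for user bob",
    "INFO request served",
    "WARN authentication retry for user bob",
])
hits = bm25_filter(stored_log, "authentication failure", top_n=2)
best_hit = bm25_filter(stored_log, "authentication failure", top_n=1)
```

Lines that share no term with the query score zero and are dropped, so a focused query returns a small slice of the payload instead of the whole document.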

### End-to-End Roundtrip Test

The test suite checks that a payload retrieved after compression is identical to the original input:

```python
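def roundtrip(client):
    """Compress a payload, retrieve it by CCR key, and check it is unchanged."""
    # Build the large JSON-like payload once so both sides of the assertion share it
    payload = {"data": "..." * 1000}

    # Compress the payload; the result includes a CCR key
    result = client.compress(payload)
    key = result.ccr_key

    # Retrieve the original via the tool and compare it with the input
    recovered = client.headroom_retrieve(hash=key)
    assert recovered == payload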
```

This pattern is the same one verified in [`tests/test_transforms/test_smart_crusher_ccr_roundtrip.py`](https://github.com/chopratejas/headroom/blob/main/tests/test_transforms/test_smart_crusher_ccr_roundtrip.py).

## Summary

- **CCR (Compress‑Cache‑Retrieve)** is Headroom’s reversible compression strategy that preserves original content in an in‑memory cache keyed by a deterministic hash.
- The system inserts a lightweight **CCR marker** into the LLM prompt, as summarized by [`headroom/transforms/compression_summary.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/compression_summary.py).
- **`headroom_retrieve`** is a first‑class tool, declared in [`headroom/tools.json`](https://github.com/chopratejas/headroom/blob/main/headroom/tools.json) and handled by [`headroom/ccr/response_handler.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/response_handler.py), that restores compressed content on demand.
- The proxy intercepts `headroom_retrieve` calls locally, supports optional **BM25‑style queries**, and re‑injects the result transparently into the conversation.
- End‑to‑end tests in [`tests/test_transforms/test_smart_crusher_ccr_roundtrip.py`](https://github.com/chopratejas/headroom/blob/main/tests/test_transforms/test_smart_crusher_ccr_roundtrip.py) verify that the compression and retrieval pipeline is lossless.

## Frequently Asked Questions

### How does CCR differ from standard lossy compression?

Standard lossy compression permanently discards data to reduce token count, whereas **CCR reversible compression** stores the full original payload in a local cache before replacing it with a marker. The `headroom_retrieve` tool can restore the exact original text at any point during the session, so Headroom can save tokens aggressively without losing the original.

### What happens if the LLM never calls headroom_retrieve?

If the model’s reasoning does not require the omitted content, no retrieval occurs and the conversation continues with the compressed prompt. The original payload remains available in the CCR store for the duration of the session without adding network overhead.

### Can headroom_retrieve return a subset of the stored content instead of the full payload?

Yes. The tool accepts an optional `query` parameter. When supplied, the `CCRResponseHandler` performs a BM25‑style lookup against the stored text and returns only the matching lines or sections rather than the entire document.

### Where is the CCR store located?

According to the Headroom source code, the CCR store is maintained as a **local in‑memory cache** managed by the proxy. Lookups happen internally when `headroom_retrieve` is invoked, so no external HTTP request is required to recover compressed content.