
Headroom CCR Reversible Compression: How the headroom_retrieve Tool Restores LLM Context

June 5, 2026 · chopratejas/headroom

CCR (Compress‑Cache‑Retrieve) is Headroom’s reversible compression mechanism. It stores original content in an in‑memory cache keyed by a deterministic hash, replaces that content with a lightweight marker in the LLM prompt, and restores the full payload on demand when the model emits a headroom_retrieve tool call, which the Headroom proxy intercepts.

The chopratejas/headroom repository implements an aggressive context‑window optimization strategy called CCR (reversible compression) that guarantees zero data loss. When internal transformers such as SmartCrusher or IntelligentContext drop content to save tokens, the system does not discard the text permanently; instead, it preserves the full payload in a reversible cache and surfaces a retrieval handle to the model. Understanding this workflow is essential for developers integrating Headroom into LLM pipelines that require lossless compression with transparent retrieval.

What Is CCR (Reversible Compression)?

CCR stands for Compress‑Cache‑Retrieve. It is the architectural mechanism, described in wiki/ccr.md, that makes Headroom’s aggressive text compression fully reversible. Instead of permanently erasing content to stay inside the model’s context window, the system performs two steps: it stores the original text in a local cache, then inserts a compact placeholder into the prompt.

The Compress and Cache Phase

When a transformer decides that a piece of content can be dropped, the full‑size payload is written to a local in‑memory CCR cache and keyed by a deterministic hash known as the CCR key. This happens before the prompt ever reaches the LLM, ensuring the original data remains available for the lifetime of the session.
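The cache step can be illustrated with a minimal sketch. The `CCRStore` class, the SHA-256 digest, and the `ccr_` prefix with a 12-character key are assumptions made for illustration; the article does not say which hash function or key format Headroom actually uses.

```python
import hashlib
from typing import Dict, Optional


class CCRStore:
    """Hypothetical in-memory store; not Headroom's actual class."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def put(self, content: str) -> str:
        # Deterministic key: identical content always maps to the same key.
        # SHA-256 and the 12-character truncation are illustrative choices.
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        key = "ccr_" + digest[:12]
        self._entries[key] = content
        return key

    def get(self, key: str) -> Optional[str]:
        # Returns None when the key is unknown (for example, after a restart).
        return self._entries.get(key)


store = CCRStore()
original = "line 1\nline 2\nline 3"
key = store.put(original)
```

Because the key is derived from the content, storing the same payload twice yields the same key and a single cache entry.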

The Marker Insertion Phase

The dropped content is replaced by a small marker containing the CCR key. As implemented in headroom/transforms/compression_summary.py, the system builds a summary that tells the LLM the original data exists and can be fetched on demand. The marker effectively acts as a deferred‑load pointer inside the prompt, giving the model the option to retrieve the full text only if its reasoning requires it.
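A rough sketch of marker insertion follows. The function name `compress_with_marker`, the keep-the-first-lines policy, and the summary wording are hypothetical; only the `{{CCR:...}}` marker shape is taken from the example later in this article.

```python
import hashlib
from typing import Dict


def compress_with_marker(content: str, cache: Dict[str, str], keep_lines: int = 2) -> str:
    """Keep the first few lines, cache the full text, and append a CCR marker."""
    # Hypothetical key derivation; the real CCR key format is not documented here.
    key = "ccr_" + hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
    cache[key] = content  # the full payload stays retrievable

    lines = content.splitlines()
    kept = lines[:keep_lines]
    dropped = len(lines) - len(kept)

    marker = "{{CCR:" + key + "}}"
    summary = (
        f"[{dropped} lines omitted; call headroom_retrieve "
        f"with hash={key} to fetch them] {marker}"
    )
    return "\n".join(kept + [summary])


cache: Dict[str, str] = {}
text = "\n".join(f"row {i}" for i in range(10))
prompt_text = compress_with_marker(text, cache)
```

The prompt now carries two rows plus a one-line pointer, while the cache holds all ten rows.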

How the headroom_retrieve Tool Works

The headroom_retrieve tool is automatically injected into the model’s toolset by the Headroom proxy, with its schema declared in headroom/tools.json. Runtime interception and fulfillment are handled by the CCRResponseHandler class in headroom/ccr/response_handler.py, which means the client application never sees extra HTTP traffic.

Tool Schema and Parameters

The tool definition follows a strict JSON schema that accepts a required hash and an optional query parameter:

{
  "name": "headroom_retrieve",
  "description": "Retrieve the original uncompressed content that was stored in CCR.",
  "parameters": {
    "type": "object",
    "properties": {
      "hash": { "type": "string", "description": "CCR key of the stored content" },
      "query": { "type": "string", "description": "Optional search term for BM25‑style lookup", "default": "" }
    },
    "required": ["hash"]
  }
}

When the LLM emits a tool‑use block such as:

{
  "type": "tool_use",
  "id": "toolu_1",
  "name": "headroom_retrieve",
  "input": { "hash": "ccr_abc123", "query": "error handling" }
}

the proxy intercepts the call locally.

CCRResponseHandler Execution Flow

The CCRResponseHandler, located at headroom/ccr/response_handler.py, watches the LLM output for a headroom_retrieve tool use block. It looks up the original payload in the CCR store using the supplied hash, and if a query is provided, it performs a BM25‑style lookup to return only the matching lines or sections. The handler then packages the result as a tool_response:

{
  "type": "tool_response",
  "tool_use_id": "toolu_1",
  "content": { "type": "object", "content": "...original text..." }
}

This response is re‑injected directly into the conversation flow, allowing the model to continue generating an answer with the full or filtered original content. Because the proxy resolves the call internally, the operation is completely transparent to the client.
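The interception flow described above might look roughly like the sketch below. It is not the code in headroom/ccr/response_handler.py: the function `handle_tool_use` is hypothetical, the store is a plain dict, and a simple substring match stands in for the BM25‑style lookup. The response shape copies the JSON shown above.

```python
from typing import Dict, Optional


def handle_tool_use(block: dict, store: Dict[str, str]) -> Optional[dict]:
    """Answer headroom_retrieve calls locally; return None for anything else."""
    if block.get("type") != "tool_use" or block.get("name") != "headroom_retrieve":
        return None  # not a CCR call, so it should pass through untouched

    args = block.get("input", {})
    original = store.get(args.get("hash", ""))
    if original is None:
        text = "No CCR entry found for that hash."
    elif args.get("query"):
        # Naive stand-in for the BM25-style lookup: keep lines with any query term.
        terms = args["query"].lower().split()
        matches = [
            line for line in original.splitlines()
            if any(term in line.lower() for term in terms)
        ]
        text = "\n".join(matches)
    else:
        text = original

    return {
        "type": "tool_response",
        "tool_use_id": block.get("id"),
        "content": {"type": "object", "content": text},
    }


store = {"ccr_abc123": "def load():\n    pass\n# error handling lives in retry()"}
call = {
    "type": "tool_use",
    "id": "toolu_1",
    "name": "headroom_retrieve",
    "input": {"hash": "ccr_abc123", "query": "error handling"},
}
reply = handle_tool_use(call, store)
```

Returning `None` for unrelated tool calls keeps the handler out of the way of tools the client application owns.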

Feedback into Compression Policy

If the LLM accesses a stored payload via headroom_retrieve, the system records that the dropped content was important. This signal can inform the compression policy, enabling Headroom to refine future decisions about what should be compressed versus what should be retained.
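One way such a feedback loop could work is sketched below. The `RetrievalFeedback` class, the per-source counter, and the threshold rule are all assumptions; the article does not describe how Headroom actually weighs retrieval events.

```python
from collections import Counter


class RetrievalFeedback:
    """Hypothetical tracker: stop compressing sources the model keeps retrieving."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.hits: Counter = Counter()

    def record(self, source: str) -> None:
        # Called whenever headroom_retrieve restores content from this source.
        self.hits[source] += 1

    def should_compress(self, source: str) -> bool:
        # Sources retrieved at least `threshold` times are left uncompressed.
        return self.hits[source] < self.threshold


feedback = RetrievalFeedback(threshold=2)
feedback.record("build.log")
feedback.record("build.log")
feedback.record("README.md")
```

Here `build.log` has been retrieved twice, so a policy consulting `should_compress` would keep it in the prompt next time, while `README.md` would still be compressed.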

CCR Round-Trip Code Examples

The following snippets illustrate typical usage patterns from a Python client. These examples rely on the Headroom SDK and mirror the behavior exercised in tests/test_transforms/test_smart_crusher_ccr_roundtrip.py.

Enabling CCR in a Session

from headroom import HeadroomClient

client = HeadroomClient()

# Enable CCR (default is True)
client.configure(ccr_enabled=True)

# Run a prompt – the proxy will compress aggressively and store originals.
response = client.chat("Summarize the following log file:", files=["/path/to/big.log"])

The response contains a compression summary and a CCR marker such as {{CCR:ccr_5f4a2b}}.

Retrieving Original Content with headroom_retrieve

In normal operation, the LLM triggers retrieval automatically; the client does not need to handle it manually. For debugging or external tooling, you can call the tool directly:


# Manual retrieval – useful for debugging or for external tools
original = client.headroom_retrieve(hash="ccr_5f4a2b")
print(original)   # prints the full, uncompressed log content

Querying Stored Content with BM25-Style Search

CCR supports filtered retrieval when you only need matching sections:

hits = client.headroom_retrieve(hash="ccr_5f4a2b", query="authentication failure")
print(hits)   # returns only the matching lines/sections
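To make "BM25‑style" concrete, here is a small self-contained Okapi BM25 ranker that treats each line of the stored text as a document. The function `bm25_filter`, the tokenizer, and the parameter values `k1=1.5` and `b=0.75` are common defaults chosen for illustration, not values taken from Headroom.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_filter(text: str, query: str, top_k: int = 3,
                k1: float = 1.5, b: float = 0.75) -> List[str]:
    """Return up to top_k lines of `text` ranked by BM25 score against `query`."""
    lines = [line for line in text.splitlines() if line.strip()]
    docs = [tokenize(line) for line in lines]
    if not docs:
        return []

    n = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n
    # Document frequency: in how many lines each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    query_terms = tokenize(query)

    scored = []
    for line, doc in zip(lines, docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        if score > 0:
            scored.append((score, line))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [line for _, line in scored[:top_k]]


log = (
    "INFO server started\n"
    "ERROR authentication failure for user bob\n"
    "INFO request served\n"
    "WARN authentication token expiring"
)
ranked = bm25_filter(log, "authentication failure")
```

The line containing both query terms ranks first, the line with only "authentication" ranks second, and lines with no matching term are dropped.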

End-to-End Roundtrip Test

The test suite validates that compression and retrieval produce byte‑for‑byte identical results:

def roundtrip(client):
    # Compress a large JSON blob
    payload = {"data": "..." * 1000}
    result = client.compress(payload)

    # The result includes a CCR key
    key = result.ccr_key

    # Retrieve the original via the tool
    recovered = client.headroom_retrieve(hash=key)

    # The recovered payload must equal the original exactly
    assert recovered == payload

This pattern is the same one verified in tests/test_transforms/test_smart_crusher_ccr_roundtrip.py.

Summary

CCR (Compress‑Cache‑Retrieve) is Headroom’s reversible compression strategy that preserves original content in an in‑memory cache keyed by a deterministic hash.
The system inserts a lightweight CCR marker into the LLM prompt, with the accompanying summary built in headroom/transforms/compression_summary.py.
headroom_retrieve is a first‑class tool, declared in headroom/tools.json and handled by headroom/ccr/response_handler.py, that restores compressed content on demand.
The proxy intercepts headroom_retrieve calls locally, supports optional BM25‑style queries, and re‑injects the result transparently into the conversation.
End‑to‑end tests in tests/test_transforms/test_smart_crusher_ccr_roundtrip.py verify that the compression and retrieval pipeline is lossless.

Frequently Asked Questions

How does CCR differ from standard lossy compression?

Standard lossy compression permanently discards data to reduce token count, whereas CCR reversible compression stores the full original payload in a local cache before replacing it with a marker. The headroom_retrieve tool can restore the exact original text at any time, giving Headroom aggressive token savings while guaranteeing zero loss.

What happens if the LLM never calls headroom_retrieve?

If the model’s reasoning does not require the omitted content, no retrieval occurs and the conversation continues with the compressed prompt. The original payload remains available in the CCR store for the duration of the session without adding network overhead.

Can headroom_retrieve return a subset of the stored content instead of the full payload?

Yes. The tool accepts an optional query parameter. When supplied, the CCRResponseHandler performs a BM25‑style lookup against the stored text and returns only the matching lines or sections rather than the entire document.

Where is the CCR store located?

According to the Headroom source code, the CCR store is maintained as a local in‑memory cache managed by the proxy. Lookups happen internally when headroom_retrieve is invoked, so no external HTTP request is required to recover compressed content.
