Headroom CCR Reversible Compression: How the headroom_retrieve Tool Restores LLM Context
CCR (Compress‑Cache‑Retrieve) is Headroom’s reversible compression mechanism. It stores original content in an in‑memory cache keyed by a deterministic hash, replaces that content with a lightweight marker in the LLM prompt, and restores the full payload on demand when the model emits a headroom_retrieve tool call, which the Headroom proxy intercepts.
The chopratejas/headroom repository implements an aggressive context‑window optimization strategy called CCR (reversible compression) that guarantees zero data loss. When internal transformers such as SmartCrusher or IntelligentContext drop content to save tokens, the system does not discard the text permanently; instead, it preserves the full payload in a reversible cache and surfaces a retrieval handle to the model. Understanding this workflow is essential for developers integrating Headroom into LLM pipelines that require lossless compression with transparent retrieval.
What Is CCR (Reversible Compression)?
CCR stands for Compress‑Cache‑Retrieve. It is the architectural mechanism, described in wiki/ccr.md, that makes Headroom’s aggressive text compression fully reversible. Instead of permanently erasing content to stay inside the model’s context window, the system performs two steps: it stores the original text in a local cache, then inserts a compact placeholder into the prompt.
The Compress and Cache Phase
When a transformer decides that a piece of content can be dropped, the full‑size payload is written to a local in‑memory CCR cache and keyed by a deterministic hash known as the CCR key. This happens before the prompt ever reaches the LLM, ensuring the original data remains available for the lifetime of the session.
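The compress-and-cache step can be sketched as follows. This is a minimal illustration under stated assumptions, not Headroom's implementation: the CCRCache class, the use of SHA-256, and the 12-character key suffix are all hypothetical choices made for the example.

```python
import hashlib
from typing import Optional


class CCRCache:
    """Hypothetical in-memory store keyed by a deterministic content hash."""

    def __init__(self) -> None:
        self._store = {}

    def put(self, content: str) -> str:
        # Hashing the content means identical content always maps to the same key.
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        key = "ccr_" + digest[:12]  # key format is an assumption
        self._store[key] = content
        return key

    def get(self, key: str) -> Optional[str]:
        # Returns None when no entry exists for the key.
        return self._store.get(key)


cache = CCRCache()
key = cache.put("line 1\nline 2\nline 3")
```

Because the key is derived from the content itself, caching the same payload twice yields the same key and does not grow the store.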
The Marker Insertion Phase
The dropped content is replaced by a small marker containing the CCR key. As implemented in headroom/transforms/compression_summary.py, the system builds a summary that tells the LLM the original data exists and can be fetched on demand. The marker effectively acts as a deferred‑load pointer inside the prompt, giving the model the option to retrieve the full text only if its reasoning requires it.
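As a rough sketch of marker insertion, the function below keeps the first few lines of a payload, caches the full text, and appends a marker line. The function name, the summary wording, and the hashing scheme are assumptions for illustration; only the {{CCR:...}} marker shape follows the example shown later in this article.

```python
import hashlib


def compress_with_marker(content: str, store: dict, keep_lines: int = 2) -> str:
    """Keep the first few lines, cache the full text, and append a marker."""
    # Hypothetical key derivation: a truncated SHA-256 of the content.
    key = "ccr_" + hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
    store[key] = content

    lines = content.splitlines()
    kept = lines[:keep_lines]
    dropped = len(lines) - len(kept)

    marker = "{{CCR:" + key + "}}"
    # The summary tells the model that the omitted text can be fetched on demand.
    summary = f"[{dropped} lines omitted; call headroom_retrieve with hash {key}] {marker}"
    return "\n".join(kept + [summary])
```

The model sees only the kept lines plus one summary line, while the store still holds the complete original under the key named in the marker.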
How the headroom_retrieve Tool Works
The headroom_retrieve tool is automatically injected into the model’s toolset by the Headroom proxy, with its schema declared in headroom/tools.json. Runtime interception and fulfillment are handled by the CCRResponseHandler class in headroom/ccr/response_handler.py, which means the client application never sees extra HTTP traffic.
Tool Schema and Parameters
The tool definition follows a strict JSON schema that accepts a required hash and an optional query parameter:
{
"name": "headroom_retrieve",
"description": "Retrieve the original uncompressed content that was stored in CCR.",
"parameters": {
"type": "object",
"properties": {
"hash": { "type": "string", "description": "CCR key of the stored content" },
"query": { "type": "string", "description": "Optional search term for BM25‑style lookup", "default": "" }
},
"required": ["hash"]
}
}
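To illustrate what injecting the tool into the model's toolset could look like, here is a hedged sketch that appends the definition above to an outgoing request. The request shape (a dict with a "tools" list) and the inject_tool helper are assumptions, not Headroom's actual proxy code.

```python
import copy

# Tool definition copied from the schema shown above.
HEADROOM_RETRIEVE_TOOL = {
    "name": "headroom_retrieve",
    "description": "Retrieve the original uncompressed content that was stored in CCR.",
    "parameters": {
        "type": "object",
        "properties": {
            "hash": {"type": "string", "description": "CCR key of the stored content"},
            "query": {"type": "string", "description": "Optional search term", "default": ""},
        },
        "required": ["hash"],
    },
}


def inject_tool(request: dict) -> dict:
    """Return a copy of the request with headroom_retrieve in its tool list."""
    patched = copy.deepcopy(request)  # leave the caller's request untouched
    tools = patched.setdefault("tools", [])
    # Avoid adding the tool twice if it is already present.
    if not any(t.get("name") == "headroom_retrieve" for t in tools):
        tools.append(copy.deepcopy(HEADROOM_RETRIEVE_TOOL))
    return patched
```

Working on a copy keeps the injection invisible to the client, which matches the transparency described above.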
When the LLM emits a tool‑use block such as:
{
"type": "tool_use",
"id": "toolu_1",
"name": "headroom_retrieve",
"input": { "hash": "ccr_abc123", "query": "error handling" }
}
the proxy intercepts the call locally.
CCRResponseHandler Execution Flow
The CCRResponseHandler watches the LLM output for a headroom_retrieve tool_use block. It looks up the original payload in the CCR store using the supplied hash. If a query is provided, it performs a BM25‑style lookup and returns only the matching lines or sections. The handler then packages the result as a tool_response:
{
"type": "tool_response",
"tool_use_id": "toolu_1",
"content": { "type": "object", "content": "...original text..." }
}
This response is re‑injected directly into the conversation flow, allowing the model to continue generating an answer with the full or filtered original content. Because the proxy resolves the call internally, the operation is completely transparent to the client.
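The interception flow described here can be approximated in a few lines. This is a sketch under stated assumptions: handle_tool_use is a hypothetical function, the store is a plain dict, and the message returned for an unknown hash is invented for the example. The response shape mirrors the tool_response shown above.

```python
from typing import Optional


def handle_tool_use(block: dict, store: dict) -> Optional[dict]:
    """Return a tool_response for headroom_retrieve blocks, else None."""
    if block.get("type") != "tool_use" or block.get("name") != "headroom_retrieve":
        return None  # not a retrieval call; let it pass through untouched

    key = block.get("input", {}).get("hash", "")
    original = store.get(key)
    if original is None:
        # Assumed behaviour for a missing entry: report it to the model.
        content = f"No CCR entry found for hash {key!r}"
    else:
        content = original

    return {
        "type": "tool_response",
        "tool_use_id": block["id"],
        "content": {"type": "object", "content": content},
    }
```

A proxy built this way would append the returned block to the conversation and call the model again, so the client only ever sees the final answer.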
Feedback into Compression Policy
If the LLM accesses a stored payload via headroom_retrieve, the system records that the dropped content was important. This feedback can be fed back into the compression policy, enabling Headroom to refine future decisions about what should be compressed versus what should be retained.
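One simple way to picture this feedback loop is a per-source retrieval counter that stops compressing content the model keeps asking for. The RetrievalFeedback class, the notion of a "source" label, and the threshold are hypothetical; the source does not describe how Headroom actually applies the signal.

```python
from collections import Counter


class RetrievalFeedback:
    """Hypothetical tracker: how often dropped content from a source is retrieved."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.hits = Counter()

    def record(self, source: str) -> None:
        # Called whenever headroom_retrieve fetches content from this source.
        self.hits[source] += 1

    def should_compress(self, source: str) -> bool:
        # Stop compressing a source once it has been retrieved often enough.
        return self.hits[source] < self.threshold
```

For example, after two retrievals of content labelled "build_logs" with the default threshold, should_compress("build_logs") returns False while other sources remain eligible for compression.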
CCR Round-Trip Code Examples
The following snippets illustrate typical usage patterns from a Python client. These examples rely on the Headroom SDK and mirror the behavior exercised in tests/test_transforms/test_smart_crusher_ccr_roundtrip.py.
Enabling CCR in a Session
from headroom import HeadroomClient
client = HeadroomClient()
# Enable CCR (default is True)
client.configure(ccr_enabled=True)
# Run a prompt – the proxy will compress aggressively and store originals.
response = client.chat("Summarize the following log file:", files=["/path/to/big.log"])
The response contains a compression summary and a CCR marker such as {{CCR:ccr_5f4a2b}}.
Retrieving Original Content with headroom_retrieve
In normal operation, the LLM triggers retrieval automatically; the client does not need to handle it manually. For debugging or external tooling, you can call the tool directly:
# Manual retrieval – useful for debugging or for external tools
original = client.headroom_retrieve(hash="ccr_5f4a2b")
print(original) # prints the full, uncompressed log content
Querying Stored Content with BM25-Style Search
CCR supports filtered retrieval when you only need matching sections:
hits = client.headroom_retrieve(hash="ccr_5f4a2b", query="authentication failure")
print(hits) # returns only the matching lines/sections
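For readers unfamiliar with BM25, the sketch below scores each line of a stored payload against the query using the standard Okapi BM25 formula and returns the best matches. Treating each line as a document, the tokenizer, and the k1 and b defaults are assumptions; the source only says the lookup is "BM25-style".

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_filter(text: str, query: str, top_k: int = 3,
                k1: float = 1.5, b: float = 0.75) -> List[str]:
    """Return up to top_k lines of text ranked by BM25 score for the query."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    docs = [tokenize(ln) for ln in lines]
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of lines containing each term.
    df = Counter(term for d in docs for term in set(d))

    scored = []
    for line, doc in zip(lines, docs):
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scored.append((score, line))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [line for _, line in scored[:top_k]]
```

Lines that share no term with the query score zero and are dropped, so a query with no matches returns an empty list rather than the whole payload.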
End-to-End Roundtrip Test
The test suite validates that compression and retrieval produce byte‑for‑byte identical results:
def roundtrip(client):
    # Compress a large JSON blob
    payload = {"data": "..." * 1000}
    result = client.compress(payload)
    # The result includes a CCR key
    key = result.ccr_key
    # Retrieve the original via the tool
    recovered = client.headroom_retrieve(hash=key)
    # The recovered payload must equal the original exactly
    assert recovered == payload
This pattern is the same one verified in tests/test_transforms/test_smart_crusher_ccr_roundtrip.py.
Summary
- CCR (Compress‑Cache‑Retrieve) is Headroom’s reversible compression strategy that preserves original content in an in‑memory cache keyed by a deterministic hash.
- The system inserts a lightweight CCR marker into the LLM prompt, as summarized by headroom/transforms/compression_summary.py.
- headroom_retrieve is a first‑class tool, declared in headroom/tools.json and handled by headroom/ccr/response_handler.py, that restores compressed content on demand.
- The proxy intercepts headroom_retrieve calls locally, supports optional BM25‑style queries, and re‑injects the result transparently into the conversation.
- End‑to‑end tests in tests/test_transforms/test_smart_crusher_ccr_roundtrip.py guarantee that the compression and retrieval pipeline is lossless.
Frequently Asked Questions
How does CCR differ from standard lossy compression?
Standard lossy compression permanently discards data to reduce token count, whereas CCR reversible compression stores the full original payload in a local cache before replacing it with a marker. The headroom_retrieve tool can restore the exact original text at any time, giving Headroom aggressive token savings while guaranteeing zero loss.
What happens if the LLM never calls headroom_retrieve?
If the model’s reasoning does not require the omitted content, no retrieval occurs and the conversation continues with the compressed prompt. The original payload remains available in the CCR store for the duration of the session without adding network overhead.
Can headroom_retrieve return a subset of the stored content instead of the full payload?
Yes. The tool accepts an optional query parameter. When supplied, the CCRResponseHandler performs a BM25‑style lookup against the stored text and returns only the matching lines or sections rather than the entire document.
Where is the CCR store located?
According to the Headroom source code, the CCR store is maintained as a local in‑memory cache managed by the proxy. Lookups happen internally when headroom_retrieve is invoked, so no external HTTP request is required to recover compressed content.