What Is CCR and How Does It Enable Reversible Compression in Headroom?
CCR (Compress‑Cache‑Retrieve) is Headroom’s architectural layer that stores original payloads in an LRU cache while replacing them with short hash markers, allowing aggressive token compression without permanent information loss.
The CCR system in the open-source chopratejas/headroom repository solves the fundamental trade-off between reducing token costs and preserving data fidelity. Unlike traditional compression that discards information to save space, CCR makes compression reversible by maintaining a complete backup of every transformed payload. This approach enables the LLM to access full original data on demand by referencing lightweight placeholders.
How CCR Works: The Four-Phase Architecture
CCR operates through four coordinated components that work transparently across the request lifecycle.
Compression Store with LRU Caching
When a transform like SmartCrusher processes large tool outputs—such as JSON arrays or log files—the Compression Store captures the full original payload. According to the source code in crates/headroom-core/src/ccr/mod.rs, the system generates a 24-character BLAKE3 hash via compute_key and produces a marker using marker_for. This marker, formatted as <<ccr:HASH>>, replaces the bulky content in the prompt sent to the LLM, while the original data lives in an LRU cache.
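The store-and-replace step can be sketched in a few lines of Python. This is a minimal illustration, not Headroom's implementation: `LruStore` is a hypothetical class, the Python `compute_key` and `marker_for` only borrow the names of the Rust functions, and BLAKE2b from the standard library stands in for BLAKE3, which is not available in `hashlib`.

```python
import hashlib
from collections import OrderedDict
from typing import Optional

KEY_LEN = 24  # key length stated in the article


def compute_key(payload: bytes) -> str:
    # Stand-in hash: BLAKE2b (stdlib), NOT BLAKE3. 12 bytes -> 24 hex characters.
    return hashlib.blake2b(payload, digest_size=KEY_LEN // 2).hexdigest()


def marker_for(key: str) -> str:
    # Marker syntax as described in the article.
    return f"<<ccr:{key}>>"


class LruStore:
    """Hypothetical in-memory store that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items = OrderedDict()  # key -> original payload

    def put(self, payload: bytes) -> str:
        """Cache the original payload and return the marker that replaces it."""
        key = compute_key(payload)
        self._items[key] = payload
        self._items.move_to_end(key)  # mark as most recently used
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used
        return marker_for(key)

    def get(self, key: str) -> Optional[bytes]:
        """Return the original payload, or None if it was never stored or was evicted."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]
```

One consequence of an LRU policy is visible in `get`: once an entry is evicted, its marker can no longer be resolved, so cache capacity bounds how long a payload stays retrievable.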
Tool Injection
The proxy automatically injects a headroom_retrieve tool definition into the LLM’s available tool list. As documented in wiki/ccr.md, this tool schema allows the model to request the original data when the compressed summary proves insufficient. The CCR marker in the compressed output signals to the LLM that retrieval is possible, creating a seamless bridge between compressed context and full detail.
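As a rough sketch of what injection might look like, the function below appends a tool definition to a request dictionary in the OpenAI function-calling shape. The schema shown here is an assumption made for illustration (including the optional `query` parameter); the actual definition is the one in wiki/ccr.md and may differ.

```python
import copy

# Hypothetical tool schema; only the tool name comes from the article.
HEADROOM_RETRIEVE_TOOL = {
    "type": "function",
    "function": {
        "name": "headroom_retrieve",
        "description": "Fetch the original content behind a <<ccr:HASH>> marker.",
        "parameters": {
            "type": "object",
            "properties": {
                "hash": {"type": "string", "description": "Hash from the marker."},
                "query": {"type": "string", "description": "Optional search terms."},
            },
            "required": ["hash"],
        },
    },
}


def inject_retrieve_tool(request: dict) -> dict:
    """Return a copy of the request with the retrieval tool appended at most once."""
    patched = copy.deepcopy(request)  # leave the caller's request untouched
    tools = patched.setdefault("tools", [])
    names = {tool.get("function", {}).get("name") for tool in tools}
    if "headroom_retrieve" not in names:
        tools.append(copy.deepcopy(HEADROOM_RETRIEVE_TOOL))
    return patched
```

Checking for an existing entry keeps the injection idempotent, which matters if a request passes through the same step more than once.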
Response Handler
After the LLM invokes headroom_retrieve with a specific hash, the CCR response handler intercepts the call. As described in wiki/ccr.md, the handler looks up the hash in the cache and returns either the exact original payload or a BM25 search result drawn from the stored content. The lookup happens inside the proxy, so the end client never sees the cache mechanics and the conversation flow stays clean.
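The two return paths can be sketched as follows. This is an illustration under assumptions, not the handler in the repository: the cache is a plain dictionary, `handle_retrieve` and the `query` argument are hypothetical names, and the ranking is a compact Okapi BM25 over the lines of the stored payload.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(lines, query, k1=1.5, b=0.75):
    """Rank lines against the query with Okapi BM25; lines scoring zero are dropped."""
    docs = [tokenize(line) for line in lines]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n if n else 0.0
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scored = []
    for line, doc in zip(lines, docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scored.append((score, line))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [line for _, line in scored]


def handle_retrieve(cache, arguments, top_k=5):
    """Resolve a headroom_retrieve call against a dict-like cache (hypothetical)."""
    payload = cache.get(arguments["hash"])
    if payload is None:
        return {"error": "hash not found (never stored or already evicted)"}
    query = arguments.get("query")
    if not query:
        return {"content": payload}  # exact original payload
    hits = bm25_rank(payload.splitlines(), query)
    return {"content": "\n".join(hits[:top_k])}  # best-matching lines only
```

Returning only the top-ranked lines keeps a targeted retrieval small, while a call without a query still yields the full original.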
Context Tracker
Across multiple conversation turns, the Context Tracker maintains awareness of which hashes were created and their semantic relevance. As noted in wiki/ccr.md, this component can proactively expand compressed contexts when later user queries reference previous summaries, ensuring the LLM receives necessary details without explicit user requests for retrieval.
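One simple way to picture proactive expansion is keyword overlap between a new user message and the summaries that earlier markers replaced. The sketch below makes that assumption explicit; the `ContextTracker` class, its methods, and the overlap threshold are hypothetical, and the real component may judge relevance quite differently.

```python
import re

# Small illustrative stopword list; an assumption, not taken from Headroom.
STOPWORDS = {"the", "a", "an", "me", "for", "of", "to", "in", "and", "show"}


def keywords(text):
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


class ContextTracker:
    """Hypothetical tracker: remembers which hash stands for which summary."""

    def __init__(self, min_overlap=2):
        self.min_overlap = min_overlap
        self._seen = {}  # hash -> (turn number, summary keywords)

    def record(self, turn, ccr_hash, summary):
        """Note that a marker with this hash was emitted on the given turn."""
        self._seen[ccr_hash] = (turn, keywords(summary))

    def hashes_to_expand(self, user_message):
        """Return hashes whose summaries share enough keywords with the message."""
        wanted = keywords(user_message)
        return [
            ccr_hash
            for ccr_hash, (_, summary_kw) in self._seen.items()
            if len(summary_kw & wanted) >= self.min_overlap
        ]


tracker = ContextTracker()
tracker.record(1, "abc123def456", "Top 5 error spikes in nginx logs")
tracker.record(1, "0f0f0f0f0f0f", "CPU metrics per host")
to_expand = tracker.hashes_to_expand("Show me the full logs for the biggest error spike")
```

Here only the first hash qualifies, because the follow-up shares "error" and "logs" with its summary and nothing with the CPU summary.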
Implementation Examples
Generating CCR Markers in Rust
The core hashing logic resides in crates/headroom-core/src/ccr/mod.rs. The compute_key function creates the BLAKE3 hash, while marker_for formats it for insertion:
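use headroom_core::ccr::{compute_key, marker_for};

fn main() {
    // A large tool output; real payloads contain many more items.
    let payload = r#"[
  {"ts":1,"cpu":45},
  {"ts":2,"cpu":45}
]"#;

    // Hash the original bytes, then wrap the hash in the marker syntax.
    let hash = compute_key(payload.as_bytes());
    let marker = marker_for(&hash);
    println!("Compressed block will contain: {}", marker);
    // Prints a marker of the form <<ccr:HASH>>, where HASH is the 24-character key.
}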
This marker replaces thousands of tokens in the LLM context window while preserving a lookup key for the original data.
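On the consuming side, anything that needs the lookup key has to pull it back out of the marker. A minimal sketch with a regular expression, assuming the key is rendered as 24 lowercase hexadecimal characters (the article states the length but not the alphabet); `find_ccr_hashes` is a hypothetical helper.

```python
import re

# Assumption: keys are 24 lowercase hex characters inside <<ccr:...>>.
MARKER_RE = re.compile(r"<<ccr:([0-9a-f]{24})>>")


def find_ccr_hashes(text):
    """Return the distinct hashes referenced in the text, in order of first appearance."""
    found = []
    for ccr_hash in MARKER_RE.findall(text):
        if ccr_hash not in found:
            found.append(ccr_hash)
    return found
```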
Using headroom_retrieve in Python
When working with the Headroom Python client, the headroom_retrieve tool handles automatic expansion. The tool definition specified in wiki/ccr.md enables the following flow:
from headroom import HeadroomClient, OpenAIProvider
from openai import OpenAI
# Wrap the original OpenAI client
base = OpenAI(api_key="...")
client = HeadroomClient(original_client=base, provider=OpenAIProvider())
# Request with compression enabled
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Give me all logs from the last hour"}],
    headroom_mode="optimize",
)
# The LLM may emit a tool call like the following
# (illustrative; the hash is shortened for readability):
# {
#   "tool_calls": [{
#     "id": "call_1",
#     "function": {
#       "name": "headroom_retrieve",
#       "arguments": "{\"hash\":\"abc123def456\"}"
#     }
#   }]
# }
# The proxy handles the retrieval automatically and returns the final answer
print(resp.choices[0].message.content)
Proactive Context Expansion
The Context Tracker can trigger retrievals automatically, without an explicit tool call from the LLM. The following follow-up request illustrates the behavior described in wiki/ccr.md:
# Follow-up query referencing previous compressed data
follow_up = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "assistant", "content": "Here are the top 5 error spikes."},
        {"role": "user", "content": "Show me the full logs for the biggest spike."},
    ],
    headroom_mode="optimize",
)
# The Context Tracker matches the query to a hash from the prior turn
# (for example, abc123def456), triggers the retrieval, and the response
# includes the full logs
print(follow_up.choices[0].message.content)
Key Source Files and Components
Understanding CCR requires familiarity with these specific modules:
- crates/headroom-core/src/ccr/mod.rs – Core utilities including compute_key, marker_for, and the CcrStore trait for hash generation and marker formatting.
- wiki/ccr.md – Human-readable specification defining CCR phases, the headroom_retrieve tool schema, and proactive expansion protocols.
- ccr/tool_injection.py – Implements the logic that injects the headroom_retrieve tool into the LLM’s tool list at request time.
- ccr/response_handler.py – Detects CCR tool calls, performs cache lookups, and returns original payloads or BM25 search results.
- ccr/context_tracker.py – Tracks hash lifecycles across conversation turns and drives proactive expansion of relevant compressed contexts.
- wiki/ARCHITECTURE.md – High-level diagram linking the compression store, tool injection, and response handler components.
Summary
- CCR (Compress‑Cache‑Retrieve) eliminates the information-loss trade-off in LLM context compression by storing originals in an LRU cache while sending short hash markers to the model.
- Four components—Compression Store, Tool Injection, Response Handler, and Context Tracker—work together to make compression reversible across conversation turns.
- BLAKE3 hashing in crates/headroom-core/src/ccr/mod.rs generates secure 24-character keys that replace verbose payloads in the context window.
- Automatic retrieval via headroom_retrieve allows the LLM to access full original data on demand without client-side complexity.
- Proactive expansion ensures relevant compressed contexts expand automatically when follow-up queries reference previous summaries.
Frequently Asked Questions
What does CCR stand for in Headroom?
CCR stands for Compress‑Cache‑Retrieve. It describes the three-stage workflow where tool outputs are compressed and replaced with hash markers, cached in an LRU store, and then retrieved on demand when the LLM needs the full original data.
How does CCR differ from standard compression methods?
Standard compression permanently reduces data size by removing or encoding information, which risks losing details the LLM might need later. CCR is reversible because it keeps the complete original payload in the LRU cache, keyed by the hash that compute_key in crates/headroom-core/src/ccr/mod.rs produces, and transmits only a lightweight <<ccr:HASH>> marker. The source material can be reconstructed exactly for as long as the entry remains in the cache.
What hash algorithm does Headroom CCR use?
Headroom CCR uses BLAKE3 to generate 24-character hash keys. According to the implementation in crates/headroom-core/src/ccr/mod.rs, the compute_key function processes the original payload bytes to create these unique identifiers that serve as cache lookup keys.
Can CCR work with any LLM provider?
Yes. CCR operates as a proxy layer that injects the headroom_retrieve tool definition into the standard tool-calling interface. As shown in wiki/ccr.md, this works with any provider supporting tool use, including OpenAI, Anthropic, and open-source models via compatible APIs.