Understanding the CCR (Compress-Cache-Retrieve) Mechanism in Headroom
The CCR mechanism in Headroom enables reversible compression of large tool outputs by replacing payloads with hash-based markers, caching the original data in a Rust-backed store, and allowing LLMs to retrieve full content on demand via the headroom_retrieve tool.
Headroom is an open-source framework designed to optimize token usage by compressing large tool-output payloads before they reach the LLM. Because the model occasionally needs access to the original uncompressed data, the repository chopratejas/headroom implements a three-stage CCR (Compress-Cache-Retrieve) mechanism that keeps the original data recoverable while minimizing context window consumption. The pipeline pairs Python compression logic with a Rust caching layer, so a given hash resolves to the same original payload for the rest of the session.
How the CCR Pipeline Works
The CCR mechanism consists of three tightly coupled stages that operate across the Python and Rust codebase:
| Stage | Description | Location |
|---|---|---|
| Compress | Compressors replace long item lists with a short marker containing a hash of the original data. | headroom/transforms/*.py (e.g., smart_crusher.py, kompress_compressor.py) |
| Cache | The original payload is stored in an in-memory CCR store keyed by a 24-character hex hash. | Rust src/ccr module (exposed via FFI) |
| Retrieve | The LLM calls headroom_retrieve to fetch the original content using the hash. | headroom/ccr/tool_injection.py and headroom/ccr/response_handler.rs |
Compression and Hash Generation
During the Compress phase, specialized compressors like SmartCrusher or LogCompressor generate markers that include a truncated SHA-256 hash (24 hex characters). That is a 96-bit key, which gives roughly 48 bits of collision resistance under the birthday bound. An example marker looks like:
[100 items compressed to 10. Retrieve more: hash=1a2b3c4d5e6f7a8b9c0d1e2f]
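As a minimal sketch of how such a marker could be produced, the snippet below hashes a serialized payload with SHA-256 and keeps the first 24 hex characters. The function name `make_marker` and the JSON serialization are assumptions for illustration; Headroom's compressors may serialize and hash the payload differently.

```python
import hashlib
import json


def make_marker(items, kept):
    """Hypothetical helper: build a marker for `items`, of which `kept` stay inline."""
    # Serialize deterministically so equal payloads hash to the same key.
    payload = json.dumps(items, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    hash_key = digest[:24]  # 24 hex characters = 96 bits
    marker = (
        f"[{len(items)} items compressed to {kept}. "
        f"Retrieve more: hash={hash_key}]"
    )
    return hash_key, marker


items = [{"id": i, "status": "ok"} for i in range(100)]
hash_key, marker = make_marker(items, kept=10)
print(marker)
```

Because the key is derived from the content, compressing the same payload twice yields the same marker.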
The Rust-Backed CCR Store
The Cache stage occurs in the Rust src/ccr module, which maintains an in-memory store exposed to Python through FFI. The hash serves as the immutable key, ensuring that once data enters the cache, it can be retrieved deterministically throughout the session.
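To make the contract of the Cache stage concrete, here is a pure-Python stand-in for a hash-keyed in-memory store. The class `InMemoryCCRStore` and its `put`/`get` methods are hypothetical; the real store lives in Rust and its FFI surface is not shown in this article.

```python
class InMemoryCCRStore:
    """Hypothetical pure-Python stand-in for the Rust-backed CCR store."""

    def __init__(self):
        self._entries = {}

    def put(self, hash_key, payload):
        # Keep the first payload stored under a key so the mapping never changes.
        self._entries.setdefault(hash_key, payload)

    def get(self, hash_key):
        # Returns None for keys that were never stored.
        return self._entries.get(hash_key)


store = InMemoryCCRStore()
store.put("1a2b3c4d5e6f7a8b9c0d1e2f", '[{"id": 1}, {"id": 2}]')
original = store.get("1a2b3c4d5e6f7a8b9c0d1e2f")
```

Treating the first write as final is one simple way to model an immutable key: a later `put` under the same hash does not replace what the marker already points to.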
Detecting Compression Markers
Before the LLM can retrieve data, the system must identify which messages contain compressed content. The CCRToolInjector class in headroom/ccr/tool_injection.py implements scan_for_markers() to walk through message content—including plain strings, Anthropic content blocks, and Google parts—extracting hashes via regular expressions:
    _marker_patterns = [
        re.compile(r"\[(\d+) \w+ compressed to (\d+)\. Retrieve more: hash=([a-f0-9]{24})\]"),
        re.compile(r"\[(\d+) \w+ compressed\. hash=([a-f0-9]{24})\]"),
        re.compile(r"\[.*?compressed.*?hash=([a-f0-9]{24})\]", re.IGNORECASE),
    ]
Discovered hashes are stored in _detected_hashes, enabling the system to set has_compressed_content for the current turn.
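The scanning step can be sketched in a self-contained form. The patterns below are the ones quoted above; the helpers `iter_text` and `scan_for_hashes`, and the message shapes they accept, are assumptions for illustration rather than the actual body of scan_for_markers().

```python
import re

MARKER_PATTERNS = [
    re.compile(r"\[(\d+) \w+ compressed to (\d+)\. Retrieve more: hash=([a-f0-9]{24})\]"),
    re.compile(r"\[(\d+) \w+ compressed\. hash=([a-f0-9]{24})\]"),
    re.compile(r"\[.*?compressed.*?hash=([a-f0-9]{24})\]", re.IGNORECASE),
]


def iter_text(content):
    """Yield text from a plain string or a list of content blocks / parts."""
    if isinstance(content, str):
        yield content
    elif isinstance(content, list):
        for block in content:
            if isinstance(block, str):
                yield block
            elif isinstance(block, dict):
                text = block.get("text")
                if isinstance(text, str):
                    yield text
                # Some blocks (e.g. tool results) nest further content.
                if "content" in block:
                    yield from iter_text(block["content"])


def scan_for_hashes(messages):
    """Return the distinct hashes found in `messages`, in order of appearance."""
    found = []
    for message in messages:
        # Assumed shapes: "content" (OpenAI/Anthropic style) or "parts" (Google style).
        content = message.get("content", message.get("parts"))
        for text in iter_text(content):
            for pattern in MARKER_PATTERNS:
                for match in pattern.finditer(text):
                    # In every pattern the hash is the last capture group.
                    hash_key = match.group(pattern.groups)
                    if hash_key not in found:
                        found.append(hash_key)
    return found
```

A marker that matches more than one pattern (the third pattern is a catch-all) is reported once, since the result is de-duplicated.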
Injecting the Retrieval Tool
When compression markers are detected, or when a session has previously used CCR (sticky mode), CCRToolInjector.inject_tool_definition() automatically adds the headroom_retrieve tool to the request. The injection supports provider-specific schemas for OpenAI, Anthropic, and Google:
    def create_ccr_tool_definition(provider="anthropic") -> dict:
        # Simplified: only the OpenAI-style function schema is shown here.
        return {
            "type": "function",
            "function": {
                "name": "headroom_retrieve",
                "description": "Retrieve original uncompressed content that was compressed to save tokens.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "hash": {"type": "string", "description": "Hash key from the compression marker"},
                        "query": {"type": "string", "description": "Optional search query"},
                    },
                    "required": ["hash"],
                },
            },
        }
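The provider-specific shapes differ mainly in the envelope around one shared JSON Schema. The sketch below illustrates that idea using the publicly documented tool formats of each provider; `build_retrieve_tool` is a hypothetical helper, not Headroom's implementation, and the Google entry is the function declaration that would sit inside a tool's declaration list.

```python
RETRIEVE_PARAMETERS = {
    "type": "object",
    "properties": {
        "hash": {"type": "string", "description": "Hash key from the compression marker"},
        "query": {"type": "string", "description": "Optional search query"},
    },
    "required": ["hash"],
}
DESCRIPTION = "Retrieve original uncompressed content that was compressed to save tokens."


def build_retrieve_tool(provider):
    """Hypothetical helper: wrap one JSON Schema in a provider's tool envelope."""
    if provider == "openai":
        return {
            "type": "function",
            "function": {
                "name": "headroom_retrieve",
                "description": DESCRIPTION,
                "parameters": RETRIEVE_PARAMETERS,
            },
        }
    if provider == "anthropic":
        # Anthropic tools carry the schema under "input_schema".
        return {
            "name": "headroom_retrieve",
            "description": DESCRIPTION,
            "input_schema": RETRIEVE_PARAMETERS,
        }
    if provider == "google":
        # A function declaration, to be listed inside a tool entry.
        return {
            "name": "headroom_retrieve",
            "description": DESCRIPTION,
            "parameters": RETRIEVE_PARAMETERS,
        }
    raise ValueError(f"unsupported provider: {provider}")
```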
Sticky Session Management
To prevent prompt-cache thrashing from fluctuating tool lists, Headroom implements sticky CCR behavior. Once session_has_done_ccr is set via apply_session_sticky_ccr_tool(), the retrieval tool remains injected for the entire session, even if subsequent turns lack fresh markers. This optimization is documented in REALIGNMENT/04-phase-B-live-zone.md.
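The sticky rule reduces to a small decision: inject when the current turn has markers or when the session has already used CCR. The sketch below simulates four turns; the function `should_inject_retrieve_tool` is hypothetical and only mirrors the behavior described above.

```python
def should_inject_retrieve_tool(has_compressed_content, session_has_done_ccr):
    """Inject when this turn has markers or the session has used CCR before."""
    return has_compressed_content or session_has_done_ccr


session_has_done_ccr = False
tool_lists = []
for turn_has_markers in [False, True, False, False]:
    inject = should_inject_retrieve_tool(turn_has_markers, session_has_done_ccr)
    tool_lists.append(["headroom_retrieve"] if inject else [])
    # Once CCR has been used, the flag stays set for the rest of the session.
    session_has_done_ccr = session_has_done_ccr or turn_has_markers

print(tool_lists)
```

The tool list changes only once, at the first turn with markers, so every later request shares the same tool prefix.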
Handling Retrieval Requests
When the LLM invokes headroom_retrieve, the server processes the call through parse_tool_call() in headroom/ccr/tool_injection.py. The function validates the hash format strictly, so malformed or fabricated keys are rejected before any cache lookup:
    if hash_key is not None:
        if not isinstance(hash_key, str) or len(hash_key) != 24:
            return None, None
        if not all(c in "0123456789abcdef" for c in hash_key.lower()):
            return None, None
After validation, the request routes to the Rust response handler (headroom/ccr/response_handler.rs), which retrieves the original payload from the CCR store, applies optional substring filtering based on the query parameter, and returns the full or filtered data as a tool result.
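The validate, look up, and filter sequence can be sketched in Python as follows. The function `handle_retrieve`, the dict-based store, and the shape of the returned result are assumptions for illustration; they do not reproduce the actual response handler.

```python
import json

HEX_DIGITS = set("0123456789abcdef")


def is_valid_hash(hash_key):
    """Accept only 24-character hexadecimal strings."""
    return (
        isinstance(hash_key, str)
        and len(hash_key) == 24
        and all(c in HEX_DIGITS for c in hash_key.lower())
    )


def handle_retrieve(store, hash_key, query=None):
    """Hypothetical handler: return a tool-result dict for a retrieve call."""
    if not is_valid_hash(hash_key):
        return {"error": "invalid hash"}
    items = store.get(hash_key)
    if items is None:
        return {"error": "hash not found"}
    if query:
        # Case-insensitive substring match against each serialized item.
        needle = query.lower()
        items = [item for item in items if needle in json.dumps(item).lower()]
    return {"hash": hash_key, "count": len(items), "items": items}


store = {
    "1a2b3c4d5e6f7a8b9c0d1e2f": [
        {"level": "error", "msg": "disk full"},
        {"level": "info", "msg": "started"},
    ]
}
result = handle_retrieve(store, "1a2b3c4d5e6f7a8b9c0d1e2f", query="error")
```

Format validation only filters out keys that could never exist; a well-formed key that was never cached still resolves to nothing, which is why the lookup has its own miss path.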
Security and Robustness Guarantees
The CCR mechanism incorporates multiple defensive layers:
- Hash format validation: Only 24-character hexadecimal strings are accepted, so malformed keys never reach the cache lookup.
- Graceful fallback: If the hash is missing or malformed, parse_tool_call() returns (None, None), allowing the request to proceed as a standard tool call without CCR side effects.
- Corruption recovery: Tests in tests/test_corrupt_golden_bytes_recovery.py verify that damaged CCR definitions trigger regeneration rather than raising RuntimeError, maintaining system stability.
End-to-End Implementation Example
The complete roundtrip—from compression to retrieval—is validated in tests/test_transforms/test_smart_crusher_ccr_roundtrip.py. Below is a simplified implementation pattern:
    import json

    from headroom.ccr.tool_injection import CCRToolInjector, parse_tool_call

    # Sample inputs for illustration: one tool result carrying a compression marker
    messages = [
        {
            "role": "tool",
            "content": "[100 items compressed to 10. Retrieve more: hash=1a2b3c4d5e6f7a8b9c0d1e2f]",
        }
    ]
    existing_tools = []

    # 1. Scan for markers and inject the retrieval tool
    injector = CCRToolInjector(provider="anthropic", inject_tool=True)
    injector.scan_for_markers(messages)
    updated_msgs, updated_tools, was_injected = injector.process_request(
        messages,
        tools=existing_tools,
        session_has_done_ccr=False,
    )

    # 2. Simulate LLM calling headroom_retrieve
    tool_call = {
        "function": {
            "name": "headroom_retrieve",
            "arguments": json.dumps({"hash": "1a2b3c4d5e6f7a8b9c0d1e2f", "query": "error"}),
        }
    }

    # 3. Parse and validate the hash
    hash_key, query = parse_tool_call(tool_call, provider="anthropic")
    # hash_key is then used to fetch from the Rust CCR store
Summary
- The CCR mechanism enables reversible compression by combining Python compressors with a Rust in-memory cache.
- Compression markers contain 24-character truncated SHA-256 hashes that act as immutable keys in the CCR store.
- The CCRToolInjector class manages automatic tool injection and sticky session handling to optimize prompt caching.
- Strict hash validation (length 24, hex-only) prevents spoofing attacks and ensures data integrity.
- The retrieval endpoint in headroom/ccr/response_handler.rs supports optional query filtering to return only relevant subsets of the original payload.
Frequently Asked Questions
What makes the CCR mechanism "reversible"?
The mechanism is reversible because the original payload is never discarded. Instead, it is cached in the Rust-side CCR store using a cryptographic hash as the key. When the LLM encounters a compression marker, it can call headroom_retrieve with the hash to access the exact original data, ensuring no information is permanently lost during compression.
Why does Headroom use a 24-character hex hash instead of the full SHA-256?
Headroom uses a truncated SHA-256 hash (24 hex characters, representing 96 bits) to balance collision resistance with marker brevity. This provides sufficient entropy to avoid accidental collisions in practical workloads while keeping the compression markers short enough to minimize token consumption in the LLM context window.
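A rough sense of "sufficient entropy" comes from the birthday approximation, where the chance of any collision among n random 96-bit keys is about n(n-1)/2 divided by 2^96. The helper below is an illustrative calculation only; it assumes keys behave like uniformly random values, and the approximation overestimates once the result approaches 1.

```python
def collision_probability(num_payloads, bits=96):
    """Birthday-bound approximation: p ≈ n(n-1) / 2^(bits+1)."""
    pairs = num_payloads * (num_payloads - 1) / 2
    return pairs / 2**bits


for n in (10**3, 10**6, 10**9):
    print(f"{n:>13,} payloads -> p ≈ {collision_probability(n):.2e}")
```

The estimate only reaches about one half near 2^48 cached payloads, far beyond what a single in-memory session would hold.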
How does the sticky CCR mode improve performance?
Sticky CCR mode, controlled by the session_has_done_ccr flag and apply_session_sticky_ccr_tool(), keeps the headroom_retrieve tool definition persistently injected after the first use. This prevents cache thrashing that would occur if the tool list toggled on and off between turns, stabilizing the prompt context for the LLM provider's caching mechanisms.
What happens if a corrupted CCR tool definition is detected?
According to tests/test_corrupt_golden_bytes_recovery.py, the system detects corrupted golden bytes and regenerates a fresh tool definition rather than crashing with a RuntimeError. This ensures that temporary data corruption or version mismatches do not interrupt the compression-retrieval workflow.