How Headroom's CCR (Compress-Cache-Retrieve) Architecture Enables Reversible Compression
Headroom's CCR architecture compresses large tool outputs into hash-based markers, caches the original data in a Rust-based store, and exposes a retrieval tool that allows LLMs to fetch uncompressed content on demand using a 24-character hex hash.
The CCR (Compress-Cache-Retrieve) mechanism in the chopratejas/headroom repository provides a reversible compression pipeline that reduces token usage while preserving access to original data. This three-stage architecture allows the system to replace verbose tool outputs with compact markers, store the uncompressed payloads securely, and retrieve them via a specialized tool call when the LLM requires full context.
The Three Stages of CCR
The CCR pipeline operates through tightly coupled stages that span both Python and Rust components.
Compress Stage
During compression, specialized modules under headroom/transforms/ (such as smart_crusher.py, kompress_compressor.py, log_compressor.py, and search_compressor.py) replace long item lists with concise markers. Each marker contains a 24-character hex hash derived from a truncated SHA-256 of the original payload, providing 96-bit collision resistance while keeping the marker short.
Example marker format:
[100 items compressed to 10. Retrieve more: hash=1a2b3c4d5e6f7a8b9c0d1e2f]
Cache Stage
The original uncompressed payload is stored in an in-memory CCR store implemented in the Rust src/ccr module. The Rust-based cache maintains the mapping between the 24-character hash key and the full payload, exposed to Python via FFI. This separation ensures high-performance storage while the Python side handles marker generation.
Retrieve Stage
When the LLM needs the full data, it calls the headroom_retrieve tool. The request flows through headroom/ccr/response_handler.rs, which looks up the hash in the CCR store, optionally applies query filtering, and returns the original content. This round-trip is validated in tests/test_transforms/test_smart_crusher_ccr_roundtrip.py.
Detecting Compression Markers
The CCRToolInjector class in headroom/ccr/tool_injection.py scans messages for compression markers using regular expressions that match supported formats:
_marker_patterns = [
re.compile(r"\[(\d+) \w+ compressed to (\d+)\. Retrieve more: hash=([a-f0-9]{24})\]"),
re.compile(r"\[(\d+) \w+ compressed\. hash=([a-f0-9]{24})\]"),
re.compile(r"\[.*?compressed.*?hash=([a-f0-9]{24})\]", re.IGNORECASE),
]
The scan_for_markers() method walks through message content (including Anthropic content blocks and Google parts) and populates _detected_hashes. This enables the injector to determine has_compressed_content for the current turn.
Injecting the Retrieval Tool
When compression is detected, CCRToolInjector.inject_tool_definition() adds the headroom_retrieve tool to the request's tools array. The tool definition is provider-specific (OpenAI, Anthropic, or Google):
def create_ccr_tool_definition(provider="anthropic") -> dict:
return {
"type": "function",
"function": {
"name": "headroom_retrieve",
"description": "Retrieve original uncompressed content that was compressed to save tokens.",
"parameters": {
"type": "object",
"properties": {
"hash": {"type": "string", "description": "Hash key from the compression marker"},
"query": {"type": "string", "description": "Optional search query"},
},
"required": ["hash"],
},
},
}
The injector can also embed instructions via create_system_instructions() to ensure the LLM understands when and how to invoke the tool.
Handling Retrieval Requests
When the LLM calls headroom_retrieve, the system validates the request through parse_tool_call() in headroom/ccr/tool_injection.py:
if hash_key is not None:
if not isinstance(hash_key, str) or len(hash_key) != 24:
return None, None
if not all(c in "0123456789abcdef" for c in hash_key.lower()):
return None, None
Valid requests route to the Rust response handler, which retrieves the payload from the CCR store and applies optional substring filtering based on the query parameter.
Session-Level Sticky CCR Behavior
Headroom implements sticky CCR to prevent prompt-cache thrashing. Once a session performs a CCR round-trip (tracked via session_has_done_ccr), the apply_session_sticky_ccr_tool() method ensures the retrieval tool remains injected for the remainder of the session, even when subsequent turns contain no new compression markers. This design decision, documented in REALIGNMENT/04-phase-B-live-zone.md, maintains cache stability by avoiding tool list fluctuations.
Security and Robustness Features
The CCR architecture includes multiple safeguards:
- Strict hash validation: Only 24-character hexadecimal strings are accepted, preventing hash spoofing attacks
- Graceful degradation: Malformed hashes return
(None, None)and fail silently without disrupting the conversation - Corruption recovery:
tests/test_corrupt_golden_bytes_recovery.pyensures that damaged CCR definitions regenerate automatically rather than raisingRuntimeError
Summary
- Headroom's CCR architecture consists of three stages: Compress (Python transforms), Cache (Rust in-memory store), and Retrieve (tool-based recovery).
- The system uses 24-character hex hashes generated from truncated SHA-256 to create collision-resistant markers that replace large payloads.
CCRToolInjectorautomatically detects compression markers and injects theheadroom_retrievetool definition into provider-specific requests.- Session sticky mode maintains the retrieval tool across conversation turns to prevent cache thrashing.
- Hash validation ensures only genuine markers created by the compressor can access cached data, with robust fallback handling for malformed inputs.
Frequently Asked Questions
How does Headroom prevent hash collisions in the CCR store?
The CCR architecture uses a 24-character hex string derived from a truncated SHA-256 hash of the original payload, providing 96-bit collision resistance. This length balances marker brevity against collision probability, making accidental collisions statistically negligible while keeping token overhead minimal.
What happens if the LLM calls headroom_retrieve with an invalid hash?
The parse_tool_call() function in headroom/ccr/tool_injection.py validates that hashes are exactly 24 characters and contain only hexadecimal characters. If validation fails, the function returns (None, None), and the request is treated as a normal tool call without triggering CCR retrieval side effects, preventing crashes or data leakage.
Can the CCR retrieval tool filter results before returning data?
Yes. The headroom_retrieve tool accepts an optional query parameter. When provided, the Rust response handler in headroom/ccr/response_handler.rs applies substring filtering to the cached payload before returning results, allowing the LLM to retrieve only relevant portions of large compressed datasets.
Why does Headroom use Rust for the CCR store instead of Python?
The Rust implementation in src/ccr provides high-performance, in-memory caching with safe concurrency guarantees. By exposing this functionality via FFI, Headroom maintains Python's flexibility for transform logic while leveraging Rust's zero-cost abstractions and memory safety for the critical path of storing and retrieving potentially large payloads.
Have a question about this repo?
These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:
curl -s "https://instagit.com/install.md" Maintain an open-source project? Get it listed too →