How Headroom CCR Compression Works: The Compress-Cache-Retrieve Pipeline
Headroom’s CCR compression replaces large tool outputs with short hash-based markers, stores the originals in a thread-safe in-memory cache, and injects a headroom_retrieve tool so the LLM can fetch full data on demand.
Headroom CCR compression is a reversible compression layer in the chopratejas/headroom repository that aggressively shrinks large LLM tool outputs without losing information. Unlike one-way summarization, the Compress-Cache-Retrieve (CCR) workflow lets transformers such as SmartCrusher and Kompress reduce payloads to a fraction of their original token count while the full content stays recoverable through on-demand retrieval.
The CCR Pipeline: From Compression to Retrieval
Headroom CCR compression operates through a six-stage pipeline implemented across headroom/ccr/ and headroom/cache/.
Compression and Hash Generation
When a transformer such as SmartCrusher or Kompress processes a payload, it first generates a 24-character hexadecimal hash that uniquely identifies the original content. This hash serves as the primary key for every downstream CCR operation.
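The article does not say which hash function Headroom uses. As a minimal sketch, assuming a SHA-256 digest truncated to 24 hexadecimal characters over a canonical JSON serialization, a content key of this shape could be derived as follows. The helper name content_hash is hypothetical and is not part of Headroom.

```python
import hashlib
import json


def content_hash(payload: object, length: int = 24) -> str:
    """Return a deterministic hex key for a JSON-serializable payload.

    Assumption: SHA-256 truncated to `length` hex characters. The hash
    function Headroom actually uses is not stated in the article.
    """
    # sort_keys makes the serialization independent of dict key order
    serialized = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()[:length]


key = content_hash([{"id": 1, "level": "error"}, {"id": 2, "level": "info"}])
print(key, len(key))
```

Deriving the key from the content itself means the same payload always maps to the same key, so a marker can be resolved without any extra bookkeeping.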
Thread-Safe Storage in CompressionStore
The original JSON, its token count, and metadata are stored in the CompressionStore (headroom/cache/compression_store.py). This store is thread-safe and uses a default TTL of five minutes, evicting old entries with an LRU heap. Each entry is modeled as a CompressionEntry containing fields such as hash, original_content, compressed_content, original_item_count, and tool_name.
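To make the storage behavior concrete, here is a simplified sketch of a thread-safe cache with TTL expiry and LRU eviction. It is not the real CompressionStore: the class names Entry and TinyStore, the max_entries parameter, and the use of an OrderedDict in place of a heap are all assumptions made for illustration.

```python
import threading
import time
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical stand-in for CompressionEntry, with a subset of its fields."""
    hash: str
    original_content: str
    compressed_content: str
    tool_name: str = ""
    created_at: float = field(default_factory=time.monotonic)


class TinyStore:
    """Illustrative thread-safe store with TTL expiry and LRU eviction."""

    def __init__(self, ttl_seconds: float = 300.0, max_entries: int = 1000):
        self._ttl = ttl_seconds
        self._max = max_entries  # assumed capacity, not taken from Headroom
        self._lock = threading.Lock()
        self._entries: "OrderedDict[str, Entry]" = OrderedDict()

    def put(self, entry: Entry) -> None:
        with self._lock:
            self._entries[entry.hash] = entry
            self._entries.move_to_end(entry.hash)
            while len(self._entries) > self._max:
                self._entries.popitem(last=False)  # drop least recently used

    def get(self, key: str):
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None
            if time.monotonic() - entry.created_at > self._ttl:
                del self._entries[key]  # expired
                return None
            self._entries.move_to_end(key)  # mark as recently used
            return entry


demo_store = TinyStore(ttl_seconds=300, max_entries=2)
demo_store.put(Entry("a" * 24, "[1, 2, 3]", "[1]"))
demo_store.put(Entry("b" * 24, "[4, 5, 6]", "[4]"))
demo_store.get("a" * 24)  # touch "a" so "b" becomes the least recently used
demo_store.put(Entry("c" * 24, "[7, 8, 9]", "[7]"))
print(demo_store.get("b" * 24))  # None: "b" was evicted to make room for "c"
```

A single lock around every read and write keeps the sketch simple; a production store would likely need finer-grained handling under heavy concurrency.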
Marker Injection and Detection
The compressed output is replaced with a short marker string that includes the hash, for example:
[120 items compressed to 12. Retrieve more: hash=9f2c4e7a1b3d5e8f9a0b1c2d]
The CCRToolInjector in headroom/ccr/tool_injection.py scans every message for this pattern, extracts the hash, and records it for the current request.
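The scanning step can be sketched with a regular expression. The pattern below is inferred only from the example marker shown above, so the exact format that CCRToolInjector matches may differ; find_hashes is a hypothetical helper.

```python
import re

# Assumed pattern, based only on the example marker in this article.
MARKER_RE = re.compile(
    r"\[\d+ \w+ compressed to \d+\. Retrieve more: hash=([0-9a-f]{24})\]"
)


def find_hashes(messages: list[dict]) -> list[str]:
    """Collect unique marker hashes from string message contents, in order."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            for match in MARKER_RE.finditer(content):
                seen.setdefault(match.group(1), None)
    return list(seen)


messages = [
    {"role": "tool", "content": "[120 items compressed to 12. Retrieve more: hash=9f2c4e7a1b3d5e8f9a0b1c2d]"},
    {"role": "user", "content": "What failed?"},
]
print(find_hashes(messages))  # ['9f2c4e7a1b3d5e8f9a0b1c2d']
```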
Provider-Specific Tool Injection
If any hash is detected, Headroom injects a retrieval tool named headroom_retrieve into the request’s tool list or appends system-message instructions. The tool definition—constructed by create_ccr_tool_definition in the same file—varies per provider (OpenAI, Anthropic, Google) but always requires a hash parameter and supports an optional query for search.
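As an illustration of how the same tool can be shaped for different providers, the sketch below builds a headroom_retrieve definition in the OpenAI function-calling layout and in the Anthropic tool layout. It is not the output of create_ccr_tool_definition: the helper name, the description strings, and the omission of Google are assumptions. Only the tool name, the required hash parameter, and the optional query parameter come from the article.

```python
def sketch_tool_definition(provider: str) -> dict:
    """Build an illustrative headroom_retrieve definition for one provider.

    Hypothetical helper, not Headroom's create_ccr_tool_definition.
    """
    description = "Retrieve the original content behind a compression marker."
    schema = {
        "type": "object",
        "properties": {
            "hash": {"type": "string", "description": "Hash from the marker."},
            "query": {"type": "string", "description": "Optional search query."},
        },
        "required": ["hash"],  # query stays optional
    }
    if provider == "openai":
        # OpenAI Chat Completions function-calling layout
        return {
            "type": "function",
            "function": {
                "name": "headroom_retrieve",
                "description": description,
                "parameters": schema,
            },
        }
    if provider == "anthropic":
        # Anthropic Messages API tool layout
        return {
            "name": "headroom_retrieve",
            "description": description,
            "input_schema": schema,
        }
    raise ValueError(f"unsupported provider in this sketch: {provider}")


print(sketch_tool_definition("anthropic")["input_schema"]["required"])  # ['hash']
```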
LLM Retrieval and Response Handling
When the LLM needs more detail, it emits a tool-use block calling headroom_retrieve(hash="9f2c4e7a1b3d5e8f9a0b1c2d"). The CCRResponseHandler (headroom/ccr/response_handler.py) intercepts this response, extracts the call, and executes it against the CompressionStore:
- If a query is supplied, the store performs a BM25 search via store.search.
- Otherwise, it returns the full original payload via store.retrieve.
The handler then builds a tool-result message, appends it to the conversation, and automatically issues a continuation API call. This loop repeats until no CCR calls remain, up to a maximum of three rounds.
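The bounded continuation loop can be sketched as follows. The message and response shapes here are simplified stand-ins rather than any provider's real format, and run_with_retrieval, fake_llm, and the lookup callable are hypothetical names. Only the three-round cap comes from the article.

```python
from typing import Callable

MAX_ROUNDS = 3  # the article states a maximum of three rounds


def run_with_retrieval(
    call_llm: Callable[[list[dict]], dict],
    messages: list[dict],
    lookup: Callable[[str], str],
) -> dict:
    """Call the model, resolving headroom_retrieve calls for up to MAX_ROUNDS."""
    response = call_llm(messages)
    for _ in range(MAX_ROUNDS):
        calls = [
            c for c in response.get("tool_calls", [])
            if c["name"] == "headroom_retrieve"
        ]
        if not calls:
            break  # nothing left to retrieve
        messages.append({"role": "assistant", "tool_calls": calls})
        for call in calls:
            messages.append({"role": "tool", "content": lookup(call["hash"])})
        response = call_llm(messages)  # continuation call
    return response


originals = {"ab12cd34ef56ab78cd90ef12": "line 1\nline 2\nERROR disk full"}


def fake_llm(msgs: list[dict]) -> dict:
    # Ask for the original once, then answer from the tool result.
    if not any(m["role"] == "tool" for m in msgs):
        return {"tool_calls": [{"name": "headroom_retrieve", "hash": "ab12cd34ef56ab78cd90ef12"}]}
    return {"content": "The log ends with: " + msgs[-1]["content"].splitlines()[-1]}


final = run_with_retrieval(
    fake_llm, [{"role": "user", "content": "What went wrong?"}], originals.__getitem__
)
print(final["content"])  # The log ends with: ERROR disk full
```

Capping the rounds prevents a model that keeps requesting data from driving an unbounded chain of API calls.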
Optional Context Tracking
A ContextTracker (headroom/ccr/context_tracker.py) maintains a per-turn view of which hashes are available in the conversation. This enables proactive expansion suggestions for the LLM, letting it know which compressed payloads can be retrieved without explicit user prompting.
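One way to picture this kind of tracking is a small structure that records the turn in which each hash first appeared and renders a hint for the model. This is a hypothetical sketch: TurnTracker, its methods, and the hint wording are assumptions, and the real ContextTracker interface is not shown in the article.

```python
from dataclasses import dataclass, field


@dataclass
class TurnTracker:
    """Hypothetical tracker: remembers which hashes appeared in which turn."""
    turn: int = 0
    seen: dict[str, int] = field(default_factory=dict)  # hash -> first turn seen

    def record_turn(self, hashes: list[str]) -> None:
        """Advance one turn and register any hashes not seen before."""
        self.turn += 1
        for h in hashes:
            self.seen.setdefault(h, self.turn)

    def available(self) -> list[str]:
        """Return all known hashes in first-seen order."""
        return list(self.seen)

    def hint(self) -> str:
        """Render a short note listing retrievable hashes, or an empty string."""
        if not self.seen:
            return ""
        return "Compressed payloads available via headroom_retrieve: " + ", ".join(self.seen)


tracker = TurnTracker()
tracker.record_turn(["9f2c4e7a1b3d5e8f9a0b1c2d"])
tracker.record_turn(["ab12cd34ef56ab78cd90ef12", "9f2c4e7a1b3d5e8f9a0b1c2d"])
print(tracker.hint())
```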
Practical Code Examples
Enabling CCR in a Headroom Client
from headroom import Headroom, CompressionConfig
# Enable CCR (enabled by default)
hh = Headroom(compression=CompressionConfig(ccr_enabled=True))
# Example request that triggers compression
response = hh.run(
    messages=[
        {"role": "user", "content": "Show me the last 200 lines of the log file."}
    ]
)
# The LLM sees a short marker:
# "[200 lines compressed to 20. Retrieve more: hash=ab12cd34ef56ab78cd90ef12]"
# It decides it needs more detail and calls:
# headroom_retrieve(hash="ab12cd34ef56ab78cd90ef12")
# Headroom automatically fetches the original and continues the dialogue.
Manual Retrieval by Hash
from headroom.cache.compression_store import get_compression_store
store = get_compression_store()
entry = store.retrieve("ab12cd34ef56ab78cd90ef12")
print(entry.original_content) # Full uncompressed payload
Search Within a Stored Compression
from headroom.cache.compression_store import get_compression_store
store = get_compression_store()
results = store.search("ab12cd34ef56ab78cd90ef12", query="error")
print(results) # JSON list of matching items
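The article says store.search ranks items with BM25 but does not show how. For readers unfamiliar with the technique, the following is a generic Okapi BM25 sketch over a list of strings. The tokenizer, the k1 and b defaults, and the function name bm25_rank are assumptions and may not match Headroom's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-word characters (an assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def bm25_rank(items: list[str], query: str, k1: float = 1.5, b: float = 0.75) -> list[tuple[float, str]]:
    """Score each item against the query with Okapi BM25; return matches, best first."""
    docs = [tokenize(item) for item in items]
    avg_len = sum(len(d) for d in docs) / max(len(docs), 1)
    doc_freq = Counter(term for d in docs for term in set(d))
    scored = []
    for item, doc in zip(items, docs):
        counts = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            tf = counts[term]
            if tf == 0:
                continue
            # Smoothed IDF: rarer terms contribute more, never negative
            idf = math.log(1 + (len(docs) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates via k1; b normalizes for item length
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        if score > 0:
            scored.append((score, item))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


lines = [
    "INFO server started",
    "ERROR disk full on /dev/sda1",
    "INFO request served",
    "ERROR timeout contacting db",
]
hits = bm25_rank(lines, "error disk")
for score, line in hits:
    print(f"{score:.3f}  {line}")
```

With this input, only the two ERROR lines are returned, and the line mentioning the disk ranks first because it matches both query terms.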
Core Files Powering Headroom CCR Compression
These modules implement the full Compress-Cache-Retrieve workflow in chopratejas/headroom:
- headroom/ccr/__init__.py: Exposes CCR components including tool injection, response handling, context tracking, and batch processing.
- headroom/ccr/tool_injection.py: Detects compression markers and injects the headroom_retrieve tool definition with provider-specific formatting.
- headroom/ccr/response_handler.py: Intercepts LLM tool-use responses, executes CCR retrievals, and manages automatic continuation calls.
- headroom/ccr/context_tracker.py: Tracks available hashes across conversation turns and suggests proactive expansions.
- headroom/cache/compression_store.py: Thread-safe in-memory cache that stores original content, enforces TTL, handles LRU eviction, and supports BM25 search.
- headroom/cache/backends/in_memory.py: Default backend implementation used by the CompressionStore.
- headroom/transformers/smart_crusher.py: Performs the actual payload compression and registers entries in the CompressionStore.
Summary
- Headroom CCR compression uses a 24-hex hash to uniquely identify and replace large tool outputs with compact markers.
- The CompressionStore (headroom/cache/compression_store.py) retains originals in a thread-safe, TTL-backed, LRU-evicted cache.
- CCRToolInjector (headroom/ccr/tool_injection.py) automatically exposes a headroom_retrieve tool to the LLM when markers are present.
- CCRResponseHandler (headroom/ccr/response_handler.py) executes retrievals, supports BM25 search, and chains continuation calls up to three rounds.
- The optional ContextTracker (headroom/ccr/context_tracker.py) gives the LLM visibility into available hashes for proactive data expansion.
Frequently Asked Questions
What does CCR stand for in Headroom?
CCR stands for Compress-Cache-Retrieve. It describes the three-phase workflow where large payloads are compressed into hash-based markers, cached in the CompressionStore, and retrieved on demand through an injected tool.
How long does Headroom keep compressed data in the cache?
According to the source code in headroom/cache/compression_store.py, entries use a default TTL of five minutes and are evicted via an LRU heap when the store reaches capacity. This balances memory usage with the likelihood that the LLM will request the data within the same session.
Can the LLM search inside a compressed payload without retrieving everything?
Yes. The headroom_retrieve tool accepts an optional query parameter. When supplied, the CCRResponseHandler calls store.search to perform a BM25 search against the original content and returns only matching items, saving additional tokens.
Is Headroom CCR compression lossless?
The compression layer itself is lossless because the original content is preserved in full inside the CompressionStore. The LLM only sees a summarized or compressed marker, but it can recover the exact original payload by calling headroom_retrieve with the correct hash, provided the entry has not yet expired under the TTL or been evicted from the cache.