Headroom CCR Retrieval Tool Architecture and Provider Injection Guide
The CCR (Compress-Cache-Retrieve) retrieval tool in Headroom enables reversible compression by caching original payloads and injecting a headroom_retrieve function that allows LLMs to request full data when needed.
The CCR retrieval tool is the core mechanism that makes Headroom's aggressive compression strategy reversible. When content is compressed, the original payload gets stored in a short-lived cache while a 24-character hex hash marker takes its place in the conversation. This architecture allows large language models to retrieve the full original data on demand without risking permanent information loss.
How the CCR Retrieval Tool Works
The CCR retrieval flow operates through three distinct phases managed by the CCRToolInjector and CCRResponseHandler classes.
Phase 1: Compression Marker Detection
The CCRToolInjector.scan_for_markers method in headroom/ccr/tool_injection.py parses every incoming message to detect compression markers. It handles various message formats including strings, list-blocks, and Google "parts" structures. The method extracts 24-character hex hashes from these markers using provider-specific regex patterns.
Phase 2: Tool and System Instruction Injection
When markers are detected or when a session is already CCR-enabled, CCRToolInjector.inject_tool_definition adds the headroom_retrieve function definition to the request's tools array. The inject_into_system_message method optionally appends human-readable instructions to the system prompt. Both methods adapt their output to the specific provider format—OpenAI, Anthropic, or Google.
Phase 3: Response Handling and Retrieval
The CCRResponseHandler.handle_response method detects headroom_retrieve calls in the LLM's response. It fetches the original data from the in-memory CCR store via get_compression_store(), formats a tool-result message, and automatically issues follow-up requests until the LLM produces a final response without CCR calls. For streaming responses, StreamingCCRHandler manages the same workflow asynchronously.
How to Inject the CCR Tool Into Provider Requests
Injecting the CCR retrieval tool into provider requests requires scanning for markers, conditionally injecting the tool definition, and preparing the response handler.
First, initialize the injector and scan the conversation:
from headroom.ccr import CCRToolInjector
# Initialize injector for specific provider
injector = CCRToolInjector(provider="anthropic") # or "openai", "google"
# Scan request messages for compression markers
injector.scan_for_markers(messages)
Next, inject the tool definition and optional system instructions:
# Inject tool definition if markers found or session already CCR-enabled
tools, was_injected = injector.inject_tool_definition(
tools,
session_has_done_ccr=False
)
# Optionally add system-message instructions
messages = injector.inject_into_system_message(messages)
Finally, handle the LLM response with the retrieval handler:
from headroom.ccr import CCRResponseHandler, ResponseHandlerConfig
handler = CCRResponseHandler(ResponseHandlerConfig())
final_response = await handler.handle_response(
response, # Initial LLM response JSON
messages, # Conversation history
tools, # Must contain injected CCR tool
api_call_fn, # Async function for next LLM request
provider="anthropic", # Provider name
)
The handler loops automatically up to max_retrieval_rounds until no headroom_retrieve calls remain, then returns the cleaned final response.
Key Components and Source Files
The CCR retrieval architecture spans several modules in the Headroom codebase:
headroom/ccr/tool_injection.py— ContainsCCRToolInjectorfor marker detection and tool injectionheadroom/ccr/response_handler.py— ImplementsCCRResponseHandlerandStreamingCCRHandlerfor managing retrieval loopsheadroom/cache/compression_store.py— Stores cached original data withretrieveandsearchmethodsheadroom/proxy/helpers.py— Providesapply_session_sticky_ccr_toolfor proxy pipeline integrationheadroom/proxy/handlers/anthropic.py— Shows provider-specific usage (similar files exist for OpenAI and Google)
Summary
- The CCR retrieval tool makes compression reversible by caching original payloads and inserting retrievable markers containing 24-character hex hashes.
CCRToolInjectorhandles marker detection inscan_for_markersand tool injection viainject_tool_definitionandinject_into_system_message.CCRResponseHandlermanages the retrieval loop automatically, fetching data fromget_compression_store()and reissuing requests until complete.- The architecture supports streaming through
StreamingCCRHandlerand adapts to OpenAI, Anthropic, and Google provider formats. - Implementation requires scanning messages, conditionally injecting the
headroom_retrievetool, and routing responses through the handler.
Frequently Asked Questions
What is the CCR retrieval tool in Headroom?
The CCR (Compress-Cache-Retrieve) retrieval tool is a subsystem that enables reversible compression by storing original payloads in a short-lived cache and replacing them with 24-character hex hash markers. When an LLM needs the full content, it can call the headroom_retrieve tool to fetch the original data from the cache.
How does the CCR tool detect compression markers?
The CCRToolInjector.scan_for_markers method in headroom/ccr/tool_injection.py parses incoming messages using regex patterns to extract hashes from compression markers. It handles multiple message formats including strings, list-blocks, and Google "parts" structures.
Which LLM providers support the CCR retrieval tool?
The CCR retrieval tool supports OpenAI, Anthropic, and Google providers. The CCRToolInjector class adapts its tool definitions and injection logic to each provider's specific format, as implemented in the respective handler files like headroom/proxy/handlers/anthropic.py.
How does the response handler manage multiple retrieval rounds?
The CCRResponseHandler.handle_response method automatically loops up to max_retrieval_rounds times, detecting headroom_retrieve calls in each response, fetching data from the compression store, and reissuing requests until the LLM returns a final response without CCR calls. This process is handled transparently without manual intervention.
Have a question about this repo?
These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:
curl -s "https://instagit.com/install.md" Maintain an open-source project? Get it listed too →