architecture

Headroom CCR Retrieval Tool Architecture and Provider Injection Guide

June 9, 2026 chopratejas/headroom ↗

The CCR (Compress-Cache-Retrieve) retrieval tool in Headroom enables reversible compression by caching original payloads and injecting a headroom_retrieve function that allows LLMs to request full data when needed.

The CCR retrieval tool is the core mechanism that makes Headroom's aggressive compression strategy reversible. When content is compressed, the original payload gets stored in a short-lived cache while a 24-character hex hash marker takes its place in the conversation. This architecture allows large language models to retrieve the full original data on demand without risking permanent information loss.

How the CCR Retrieval Tool Works

The CCR retrieval flow operates through three distinct phases managed by the CCRToolInjector and CCRResponseHandler classes.

Phase 1: Compression Marker Detection

The CCRToolInjector.scan_for_markers method in headroom/ccr/tool_injection.py parses every incoming message to detect compression markers. It handles various message formats including strings, list-blocks, and Google "parts" structures. The method extracts 24-character hex hashes from these markers using provider-specific regex patterns.

Phase 2: Tool and System Instruction Injection

When markers are detected or when a session is already CCR-enabled, CCRToolInjector.inject_tool_definition adds the headroom_retrieve function definition to the request's tools array. The inject_into_system_message method optionally appends human-readable instructions to the system prompt. Both methods adapt their output to the specific provider format—OpenAI, Anthropic, or Google.

Phase 3: Response Handling and Retrieval

The CCRResponseHandler.handle_response method detects headroom_retrieve calls in the LLM's response. It fetches the original data from the in-memory CCR store via get_compression_store(), formats a tool-result message, and automatically issues follow-up requests until the LLM produces a final response without CCR calls. For streaming responses, StreamingCCRHandler manages the same workflow asynchronously.

How to Inject the CCR Tool Into Provider Requests

Injecting the CCR retrieval tool into provider requests requires scanning for markers, conditionally injecting the tool definition, and preparing the response handler.

First, initialize the injector and scan the conversation:

from headroom.ccr import CCRToolInjector

# Initialize injector for specific provider

injector = CCRToolInjector(provider="anthropic")  # or "openai", "google"

# Scan request messages for compression markers

injector.scan_for_markers(messages)

Next, inject the tool definition and optional system instructions:


# Inject tool definition if markers found or session already CCR-enabled

tools, was_injected = injector.inject_tool_definition(
    tools, 
    session_has_done_ccr=False
)

# Optionally add system-message instructions

messages = injector.inject_into_system_message(messages)

Finally, handle the LLM response with the retrieval handler:

from headroom.ccr import CCRResponseHandler, ResponseHandlerConfig

handler = CCRResponseHandler(ResponseHandlerConfig())
final_response = await handler.handle_response(
    response,               # Initial LLM response JSON

    messages,               # Conversation history

    tools,                  # Must contain injected CCR tool

    api_call_fn,            # Async function for next LLM request

    provider="anthropic",   # Provider name

)

The handler loops automatically up to max_retrieval_rounds until no headroom_retrieve calls remain, then returns the cleaned final response.

Key Components and Source Files

The CCR retrieval architecture spans several modules in the Headroom codebase:

headroom/ccr/tool_injection.py — Contains CCRToolInjector for marker detection and tool injection
headroom/ccr/response_handler.py — Implements CCRResponseHandler and StreamingCCRHandler for managing retrieval loops
headroom/cache/compression_store.py — Stores cached original data with retrieve and search methods
headroom/proxy/helpers.py — Provides apply_session_sticky_ccr_tool for proxy pipeline integration
headroom/proxy/handlers/anthropic.py — Shows provider-specific usage (similar files exist for OpenAI and Google)

Summary

The CCR retrieval tool makes compression reversible by caching original payloads and inserting retrievable markers containing 24-character hex hashes.
CCRToolInjector handles marker detection in scan_for_markers and tool injection via inject_tool_definition and inject_into_system_message.
CCRResponseHandler manages the retrieval loop automatically, fetching data from get_compression_store() and reissuing requests until complete.
The architecture supports streaming through StreamingCCRHandler and adapts to OpenAI, Anthropic, and Google provider formats.
Implementation requires scanning messages, conditionally injecting the headroom_retrieve tool, and routing responses through the handler.

Frequently Asked Questions

What is the CCR retrieval tool in Headroom?

The CCR (Compress-Cache-Retrieve) retrieval tool is a subsystem that enables reversible compression by storing original payloads in a short-lived cache and replacing them with 24-character hex hash markers. When an LLM needs the full content, it can call the headroom_retrieve tool to fetch the original data from the cache.

How does the CCR tool detect compression markers?

The CCRToolInjector.scan_for_markers method in headroom/ccr/tool_injection.py parses incoming messages using regex patterns to extract hashes from compression markers. It handles multiple message formats including strings, list-blocks, and Google "parts" structures.

Which LLM providers support the CCR retrieval tool?

The CCR retrieval tool supports OpenAI, Anthropic, and Google providers. The CCRToolInjector class adapts its tool definitions and injection logic to each provider's specific format, as implemented in the respective handler files like headroom/proxy/handlers/anthropic.py.

How does the response handler manage multiple retrieval rounds?

The CCRResponseHandler.handle_response method automatically loops up to max_retrieval_rounds times, detecting headroom_retrieve calls in each response, fetching data from the compression store, and reissuing requests until the LLM returns a final response without CCR calls. This process is handled transparently without manual intervention.

Have a question about this repo?

These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:

Share the following with your agent to get started:

curl -s "https://instagit.com/install.md"

Add to your MCP client configuration:

{
  "mcpServers": {
    "instagit": {
      "command": "npx",
      "args": ["-y", "instagit@latest"]
    }
  }
}

Ask your agent:

"Use Instagit MCP to understand how chopratejas/headroom works."

Works with

Claude Codex Cursor VS Code OpenClaw Any MCP Client

Maintain an open-source project? Get it listed too →