# Headroom CCR Retrieval Tool Architecture and Provider Injection Guide

> Discover the Headroom CCR retrieval tool architecture. Learn how to inject the headroom_retrieve function for reversible compression and LLM data requests. Explore the chopratejas/headroom repository.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: architecture
- Published: 2026-06-09

---

**The CCR (Compress-Cache-Retrieve) retrieval tool in Headroom enables reversible compression by caching original payloads and injecting a `headroom_retrieve` function that allows LLMs to request full data when needed.**

The CCR retrieval tool is the core mechanism that makes Headroom's aggressive compression strategy reversible. When content is compressed, the original payload gets stored in a short-lived cache while a 24-character hex hash marker takes its place in the conversation. This architecture allows large language models to retrieve the full original data on demand without risking permanent information loss.

## How the CCR Retrieval Tool Works

The CCR retrieval flow operates through three distinct phases managed by the `CCRToolInjector` and `CCRResponseHandler` classes.

### Phase 1: Compression Marker Detection

The `CCRToolInjector.scan_for_markers` method in [`headroom/ccr/tool_injection.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/tool_injection.py) parses every incoming message to detect compression markers. It handles various message formats including strings, list-blocks, and Google "parts" structures. The method extracts 24-character hex hashes from these markers using provider-specific regex patterns.

### Phase 2: Tool and System Instruction Injection

When markers are detected or when a session is already CCR-enabled, `CCRToolInjector.inject_tool_definition` adds the `headroom_retrieve` function definition to the request's `tools` array. The `inject_into_system_message` method optionally appends human-readable instructions to the system prompt. Both methods adapt their output to the specific provider format—OpenAI, Anthropic, or Google.

### Phase 3: Response Handling and Retrieval

The `CCRResponseHandler.handle_response` method detects `headroom_retrieve` calls in the LLM's response. It fetches the original data from the in-memory CCR store via `get_compression_store()`, formats a tool-result message, and automatically issues follow-up requests until the LLM produces a final response without CCR calls. For streaming responses, `StreamingCCRHandler` manages the same workflow asynchronously.

## How to Inject the CCR Tool Into Provider Requests

Injecting the CCR retrieval tool into provider requests requires scanning for markers, conditionally injecting the tool definition, and preparing the response handler.

First, initialize the injector and scan the conversation:

```python
from headroom.ccr import CCRToolInjector

# Initialize injector for specific provider

injector = CCRToolInjector(provider="anthropic")  # or "openai", "google"

# Scan request messages for compression markers

injector.scan_for_markers(messages)

```

Next, inject the tool definition and optional system instructions:

```python

# Inject tool definition if markers found or session already CCR-enabled

tools, was_injected = injector.inject_tool_definition(
    tools, 
    session_has_done_ccr=False
)

# Optionally add system-message instructions

messages = injector.inject_into_system_message(messages)

```

Finally, handle the LLM response with the retrieval handler:

```python
from headroom.ccr import CCRResponseHandler, ResponseHandlerConfig

handler = CCRResponseHandler(ResponseHandlerConfig())
final_response = await handler.handle_response(
    response,               # Initial LLM response JSON

    messages,               # Conversation history

    tools,                  # Must contain injected CCR tool

    api_call_fn,            # Async function for next LLM request

    provider="anthropic",   # Provider name

)

```

The handler loops automatically up to `max_retrieval_rounds` until no `headroom_retrieve` calls remain, then returns the cleaned final response.

## Key Components and Source Files

The CCR retrieval architecture spans several modules in the Headroom codebase:

- [`headroom/ccr/tool_injection.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/tool_injection.py) — Contains `CCRToolInjector` for marker detection and tool injection
- [`headroom/ccr/response_handler.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/response_handler.py) — Implements `CCRResponseHandler` and `StreamingCCRHandler` for managing retrieval loops
- [`headroom/cache/compression_store.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cache/compression_store.py) — Stores cached original data with `retrieve` and `search` methods
- [`headroom/proxy/helpers.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/helpers.py) — Provides `apply_session_sticky_ccr_tool` for proxy pipeline integration
- [`headroom/proxy/handlers/anthropic.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/handlers/anthropic.py) — Shows provider-specific usage (similar files exist for OpenAI and Google)

## Summary

- The CCR retrieval tool makes compression reversible by caching original payloads and inserting retrievable markers containing 24-character hex hashes.
- `CCRToolInjector` handles marker detection in `scan_for_markers` and tool injection via `inject_tool_definition` and `inject_into_system_message`.
- `CCRResponseHandler` manages the retrieval loop automatically, fetching data from `get_compression_store()` and reissuing requests until complete.
- The architecture supports streaming through `StreamingCCRHandler` and adapts to OpenAI, Anthropic, and Google provider formats.
- Implementation requires scanning messages, conditionally injecting the `headroom_retrieve` tool, and routing responses through the handler.

## Frequently Asked Questions

### What is the CCR retrieval tool in Headroom?

The CCR (Compress-Cache-Retrieve) retrieval tool is a subsystem that enables reversible compression by storing original payloads in a short-lived cache and replacing them with 24-character hex hash markers. When an LLM needs the full content, it can call the `headroom_retrieve` tool to fetch the original data from the cache.

### How does the CCR tool detect compression markers?

The `CCRToolInjector.scan_for_markers` method in [`headroom/ccr/tool_injection.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/tool_injection.py) parses incoming messages using regex patterns to extract hashes from compression markers. It handles multiple message formats including strings, list-blocks, and Google "parts" structures.

### Which LLM providers support the CCR retrieval tool?

The CCR retrieval tool supports OpenAI, Anthropic, and Google providers. The `CCRToolInjector` class adapts its tool definitions and injection logic to each provider's specific format, as implemented in the respective handler files like [`headroom/proxy/handlers/anthropic.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/handlers/anthropic.py).

### How does the response handler manage multiple retrieval rounds?

The `CCRResponseHandler.handle_response` method automatically loops up to `max_retrieval_rounds` times, detecting `headroom_retrieve` calls in each response, fetching data from the compression store, and reissuing requests until the LLM returns a final response without CCR calls. This process is handled transparently without manual intervention.