# Understanding the CCR (Compress-Cache-Retrieve) Mechanism in Headroom

> Explore Headroom's CCR mechanism for reversible compression. Compress tool outputs with hash markers, cache data, and retrieve full content on demand. Learn more!

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: deep-dive
- Published: 2026-06-07

---

**The CCR mechanism in Headroom enables reversible compression of large tool outputs by replacing payloads with hash-based markers, caching the original data in a Rust-backed store, and allowing LLMs to retrieve full content on demand via the `headroom_retrieve` tool.**

Headroom is an open-source framework designed to optimize token usage by compressing large tool-output payloads before they reach the LLM. Because the model occasionally needs access to the original uncompressed data, the repository `chopratejas/headroom` implements a three-stage **CCR (Compress-Cache-Retrieve) mechanism** that preserves data integrity while minimizing context window consumption. This reversible pipeline seamlessly integrates Python compression logic with a high-performance Rust caching layer to ensure deterministic data recovery.

## How the CCR Pipeline Works

The CCR mechanism consists of three tightly coupled stages that operate across the Python and Rust codebase:

| Stage | Description | Location |
|-------|-------------|----------|
| **Compress** | Compressors replace long item lists with a short marker containing a **hash** of the original data. | `headroom/transforms/*.py` (e.g., [`smart_crusher.py`](https://github.com/chopratejas/headroom/blob/main/smart_crusher.py), [`kompress_compressor.py`](https://github.com/chopratejas/headroom/blob/main/kompress_compressor.py)) |
| **Cache** | The original payload is stored in an in-memory CCR store keyed by a 24-character hex hash. | Rust `src/ccr` module (exposed via FFI) |
| **Retrieve** | The LLM calls `headroom_retrieve` to fetch the original content using the hash. | [`headroom/ccr/tool_injection.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/tool_injection.py) and [`headroom/ccr/response_handler.rs`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/response_handler.rs) |

### Compression and Hash Generation

During the **Compress** phase, specialized compressors like `SmartCrusher` or `LogCompressor` generate markers that include a truncated SHA-256 hash (24 hex characters) providing 96-bit collision resistance. An example marker looks like:

```

[100 items compressed to 10. Retrieve more: hash=1a2b3c4d5e6f7a8b9c0d1e2f]

```

### The Rust-Backed CCR Store

The **Cache** stage occurs in the Rust `src/ccr` module, which maintains an in-memory store exposed to Python through FFI. The hash serves as the immutable key, ensuring that once data enters the cache, it can be retrieved deterministically throughout the session.

## Detecting Compression Markers

Before the LLM can retrieve data, the system must identify which messages contain compressed content. The `CCRToolInjector` class in [`headroom/ccr/tool_injection.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/tool_injection.py) implements `scan_for_markers()` to walk through message content—including plain strings, Anthropic content blocks, and Google parts—extracting hashes via regular expressions:

```python
_marker_patterns = [
    re.compile(r"\[(\d+) \w+ compressed to (\d+)\. Retrieve more: hash=([a-f0-9]{24})\]"),
    re.compile(r"\[(\d+) \w+ compressed\. hash=([a-f0-9]{24})\]"),
    re.compile(r"\[.*?compressed.*?hash=([a-f0-9]{24})\]", re.IGNORECASE),
]

```

Discovered hashes are stored in `_detected_hashes`, enabling the system to set `has_compressed_content` for the current turn.

## Injecting the Retrieval Tool

When compression markers are detected, or when a session has previously used CCR (sticky mode), `CCRToolInjector.inject_tool_definition()` automatically adds the `headroom_retrieve` tool to the request. The injection supports provider-specific schemas for OpenAI, Anthropic, and Google:

```python
def create_ccr_tool_definition(provider="anthropic") -> dict:
    return {
        "type": "function",
        "function": {
            "name": "headroom_retrieve",
            "description": "Retrieve original uncompressed content that was compressed to save tokens.",
            "parameters": {
                "type": "object",
                "properties": {
                    "hash": {"type": "string", "description": "Hash key from the compression marker"},
                    "query": {"type": "string", "description": "Optional search query"},
                },
                "required": ["hash"],
            },
        },
    }

```

### Sticky Session Management

To prevent prompt-cache thrashing from fluctuating tool lists, Headroom implements **sticky CCR** behavior. Once `session_has_done_ccr` is set via `apply_session_sticky_ccr_tool()`, the retrieval tool remains injected for the entire session, even if subsequent turns lack fresh markers. This optimization is documented in [`REALIGNMENT/04-phase-B-live-zone.md`](https://github.com/chopratejas/headroom/blob/main/REALIGNMENT/04-phase-B-live-zone.md).

## Handling Retrieval Requests

When the LLM invokes `headroom_retrieve`, the server processes the call through `parse_tool_call()` in [`headroom/ccr/tool_injection.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/tool_injection.py). The function enforces strict validation to prevent hash-spoofing attacks:

```python
if hash_key is not None:
    if not isinstance(hash_key, str) or len(hash_key) != 24:
        return None, None
    if not all(c in "0123456789abcdef" for c in hash_key.lower()):
        return None, None

```

After validation, the request routes to the Rust response handler ([`headroom/ccr/response_handler.rs`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/response_handler.rs)), which retrieves the original payload from the CCR store, applies optional substring filtering based on the `query` parameter, and returns the full data as a tool result.

## Security and Robustness Guarantees

The CCR mechanism incorporates multiple defensive layers:

- **Hash format validation**: Only 24-character hexadecimal strings are accepted, ensuring that only hashes generated by the compressor can resolve to cached data.
- **Graceful fallback**: If the hash is missing or malformed, `parse_tool_call()` returns `(None, None)`, allowing the request to proceed as a standard tool call without CCR side effects.
- **Corruption recovery**: Tests in [`tests/test_corrupt_golden_bytes_recovery.py`](https://github.com/chopratejas/headroom/blob/main/tests/test_corrupt_golden_bytes_recovery.py) verify that damaged CCR definitions trigger regeneration rather than raising `RuntimeError`, maintaining system stability.

## End-to-End Implementation Example

The complete roundtrip—from compression to retrieval—is validated in [`tests/test_transforms/test_smart_crusher_ccr_roundtrip.py`](https://github.com/chopratejas/headroom/blob/main/tests/test_transforms/test_smart_crusher_ccr_roundtrip.py). Below is a simplified implementation pattern:

```python
from headroom.ccr.tool_injection import CCRToolInjector, parse_tool_call
import json

# 1. Scan for markers and inject the retrieval tool

injector = CCRToolInjector(provider="anthropic", inject_tool=True)
injector.scan_for_markers(messages)
updated_msgs, updated_tools, was_injected = injector.process_request(
    messages,
    tools=existing_tools,
    session_has_done_ccr=False,
)

# 2. Simulate LLM calling headroom_retrieve

tool_call = {
    "function": {
        "name": "headroom_retrieve",
        "arguments": json.dumps({"hash": "1a2b3c4d5e6f7a8b9c0d1e2f", "query": "error"})
    }
}

# 3. Parse and validate the hash

hash_key, query = parse_tool_call(tool_call, provider="anthropic")

# hash_key is then used to fetch from the Rust CCR store

```

## Summary

- The **CCR mechanism** enables reversible compression by combining Python compressors with a Rust in-memory cache.
- Compression markers contain **24-character truncated SHA-256 hashes** that act as immutable keys in the CCR store.
- The `CCRToolInjector` class manages **automatic tool injection** and **sticky session handling** to optimize prompt caching.
- Strict **hash validation** (length 24, hex-only) prevents spoofing attacks and ensures data integrity.
- The retrieval endpoint in [`headroom/ccr/response_handler.rs`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/response_handler.rs) supports optional **query filtering** to return only relevant subsets of the original payload.

## Frequently Asked Questions

### What makes the CCR mechanism "reversible"?

The mechanism is reversible because the original payload is never discarded. Instead, it is cached in the Rust-side CCR store using a cryptographic hash as the key. When the LLM encounters a compression marker, it can call `headroom_retrieve` with the hash to access the exact original data, ensuring no information is permanently lost during compression.

### Why does Headroom use a 24-character hex hash instead of the full SHA-256?

Headroom uses a truncated SHA-256 hash (24 hex characters, representing 96 bits) to balance collision resistance with marker brevity. This provides sufficient entropy to avoid accidental collisions in practical workloads while keeping the compression markers short enough to minimize token consumption in the LLM context window.

### How does the sticky CCR mode improve performance?

Sticky CCR mode, controlled by the `session_has_done_ccr` flag and `apply_session_sticky_ccr_tool()`, keeps the `headroom_retrieve` tool definition persistently injected after the first use. This prevents cache thrashing that would occur if the tool list toggled on and off between turns, stabilizing the prompt context for the LLM provider's caching mechanisms.

### What happens if a corrupted CCR tool definition is detected?

According to [`tests/test_corrupt_golden_bytes_recovery.py`](https://github.com/chopratejas/headroom/blob/main/tests/test_corrupt_golden_bytes_recovery.py), the system detects corrupted golden bytes and regenerates a fresh tool definition rather than crashing with a `RuntimeError`. This ensures that temporary data corruption or version mismatches do not interrupt the compression-retrieval workflow.