# How Headroom's CCR (Compress-Cache-Retrieve) Architecture Enables Reversible Compression

> Discover how Headroom's CCR architecture achieves reversible compression by compressing tool outputs, caching data in Rust, and enabling LLMs to retrieve uncompressed content via hash.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: architecture
- Published: 2026-06-10

---

**Headroom's CCR architecture compresses large tool outputs into hash-based markers, caches the original data in a Rust-based store, and exposes a retrieval tool that allows LLMs to fetch uncompressed content on demand using a 24-character hex hash.**

The CCR (Compress-Cache-Retrieve) mechanism in the `chopratejas/headroom` repository provides a reversible compression pipeline that reduces token usage while preserving access to original data. This three-stage architecture allows the system to replace verbose tool outputs with compact markers, store the uncompressed payloads securely, and retrieve them via a specialized tool call when the LLM requires full context.

## The Three Stages of CCR

The CCR pipeline operates through tightly coupled stages that span both Python and Rust components.

### Compress Stage

During compression, specialized modules under `headroom/transforms/` (such as [`smart_crusher.py`](https://github.com/chopratejas/headroom/blob/main/smart_crusher.py), [`kompress_compressor.py`](https://github.com/chopratejas/headroom/blob/main/kompress_compressor.py), [`log_compressor.py`](https://github.com/chopratejas/headroom/blob/main/log_compressor.py), and [`search_compressor.py`](https://github.com/chopratejas/headroom/blob/main/search_compressor.py)) replace long item lists with concise markers. Each marker contains a **24-character hex hash** derived from a truncated SHA-256 of the original payload, providing 96-bit collision resistance while keeping the marker short.

Example marker format:

```

[100 items compressed to 10. Retrieve more: hash=1a2b3c4d5e6f7a8b9c0d1e2f]

```

### Cache Stage

The original uncompressed payload is stored in an in-memory **CCR store** implemented in the Rust `src/ccr` module. The Rust-based cache maintains the mapping between the 24-character hash key and the full payload, exposed to Python via FFI. This separation ensures high-performance storage while the Python side handles marker generation.

### Retrieve Stage

When the LLM needs the full data, it calls the `headroom_retrieve` tool. The request flows through [`headroom/ccr/response_handler.rs`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/response_handler.rs), which looks up the hash in the CCR store, optionally applies query filtering, and returns the original content. This round-trip is validated in [`tests/test_transforms/test_smart_crusher_ccr_roundtrip.py`](https://github.com/chopratejas/headroom/blob/main/tests/test_transforms/test_smart_crusher_ccr_roundtrip.py).

## Detecting Compression Markers

The `CCRToolInjector` class in [`headroom/ccr/tool_injection.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/tool_injection.py) scans messages for compression markers using regular expressions that match supported formats:

```python
_marker_patterns = [
    re.compile(r"\[(\d+) \w+ compressed to (\d+)\. Retrieve more: hash=([a-f0-9]{24})\]"),
    re.compile(r"\[(\d+) \w+ compressed\. hash=([a-f0-9]{24})\]"),
    re.compile(r"\[.*?compressed.*?hash=([a-f0-9]{24})\]", re.IGNORECASE),
]

```

The `scan_for_markers()` method walks through message content (including Anthropic content blocks and Google parts) and populates `_detected_hashes`. This enables the injector to determine `has_compressed_content` for the current turn.

## Injecting the Retrieval Tool

When compression is detected, `CCRToolInjector.inject_tool_definition()` adds the `headroom_retrieve` tool to the request's `tools` array. The tool definition is provider-specific (OpenAI, Anthropic, or Google):

```python
def create_ccr_tool_definition(provider="anthropic") -> dict:
    return {
        "type": "function",
        "function": {
            "name": "headroom_retrieve",
            "description": "Retrieve original uncompressed content that was compressed to save tokens.",
            "parameters": {
                "type": "object",
                "properties": {
                    "hash": {"type": "string", "description": "Hash key from the compression marker"},
                    "query": {"type": "string", "description": "Optional search query"},
                },
                "required": ["hash"],
            },
        },
    }

```

The injector can also embed instructions via `create_system_instructions()` to ensure the LLM understands when and how to invoke the tool.

## Handling Retrieval Requests

When the LLM calls `headroom_retrieve`, the system validates the request through `parse_tool_call()` in [`headroom/ccr/tool_injection.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/tool_injection.py):

```python
if hash_key is not None:
    if not isinstance(hash_key, str) or len(hash_key) != 24:
        return None, None
    if not all(c in "0123456789abcdef" for c in hash_key.lower()):
        return None, None

```

Valid requests route to the Rust response handler, which retrieves the payload from the CCR store and applies optional substring filtering based on the query parameter.

## Session-Level Sticky CCR Behavior

Headroom implements **sticky CCR** to prevent prompt-cache thrashing. Once a session performs a CCR round-trip (tracked via `session_has_done_ccr`), the `apply_session_sticky_ccr_tool()` method ensures the retrieval tool remains injected for the remainder of the session, even when subsequent turns contain no new compression markers. This design decision, documented in [`REALIGNMENT/04-phase-B-live-zone.md`](https://github.com/chopratejas/headroom/blob/main/REALIGNMENT/04-phase-B-live-zone.md), maintains cache stability by avoiding tool list fluctuations.

## Security and Robustness Features

The CCR architecture includes multiple safeguards:

- **Strict hash validation**: Only 24-character hexadecimal strings are accepted, preventing hash spoofing attacks
- **Graceful degradation**: Malformed hashes return `(None, None)` and fail silently without disrupting the conversation
- **Corruption recovery**: [`tests/test_corrupt_golden_bytes_recovery.py`](https://github.com/chopratejas/headroom/blob/main/tests/test_corrupt_golden_bytes_recovery.py) ensures that damaged CCR definitions regenerate automatically rather than raising `RuntimeError`

## Summary

- **Headroom's CCR architecture** consists of three stages: Compress (Python transforms), Cache (Rust in-memory store), and Retrieve (tool-based recovery).
- The system uses **24-character hex hashes** generated from truncated SHA-256 to create collision-resistant markers that replace large payloads.
- **`CCRToolInjector`** automatically detects compression markers and injects the `headroom_retrieve` tool definition into provider-specific requests.
- **Session sticky mode** maintains the retrieval tool across conversation turns to prevent cache thrashing.
- **Hash validation** ensures only genuine markers created by the compressor can access cached data, with robust fallback handling for malformed inputs.

## Frequently Asked Questions

### How does Headroom prevent hash collisions in the CCR store?

The CCR architecture uses a **24-character hex string** derived from a truncated SHA-256 hash of the original payload, providing **96-bit collision resistance**. This length balances marker brevity against collision probability, making accidental collisions statistically negligible while keeping token overhead minimal.

### What happens if the LLM calls headroom_retrieve with an invalid hash?

The `parse_tool_call()` function in [`headroom/ccr/tool_injection.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/tool_injection.py) validates that hashes are exactly 24 characters and contain only hexadecimal characters. If validation fails, the function returns `(None, None)`, and the request is treated as a normal tool call without triggering CCR retrieval side effects, preventing crashes or data leakage.

### Can the CCR retrieval tool filter results before returning data?

Yes. The `headroom_retrieve` tool accepts an optional `query` parameter. When provided, the Rust response handler in [`headroom/ccr/response_handler.rs`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/response_handler.rs) applies substring filtering to the cached payload before returning results, allowing the LLM to retrieve only relevant portions of large compressed datasets.

### Why does Headroom use Rust for the CCR store instead of Python?

The Rust implementation in `src/ccr` provides **high-performance, in-memory caching** with safe concurrency guarantees. By exposing this functionality via FFI, Headroom maintains Python's flexibility for transform logic while leveraging Rust's zero-cost abstractions and memory safety for the critical path of storing and retrieving potentially large payloads.