# How Headroom CCR Compression Works: The Compress-Cache-Retrieve Pipeline

> Discover how Headroom CCR compression works. Learn about the Compress-Cache-Retrieve pipeline that replaces large outputs with short markers, storing originals in an in-memory cache for on-demand LLM retrieval.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: deep-dive
- Published: 2026-06-09

---

**Headroom’s CCR compression replaces large tool outputs with short hash-based markers, stores the originals in a thread-safe in-memory cache, and injects a `headroom_retrieve` tool so the LLM can fetch full data on demand.**

Headroom CCR compression is a reversible compression layer in the `chopratejas/headroom` repository that aggressively shrinks large LLM tool outputs without discarding the original data. Unlike one-way summarization, this Compress-Cache-Retrieve (CCR) workflow lets transformers like *SmartCrusher* and *Kompress* reduce payloads to a fraction of their original token count, while the cached originals remain available to the LLM through on-demand retrieval.

## The CCR Pipeline: From Compression to Retrieval

Headroom CCR compression operates through a six-stage pipeline implemented across `headroom/ccr/` and `headroom/cache/`.

### Compression and Hash Generation

When a transformer such as *SmartCrusher* or *Kompress* processes a payload, it first generates a **24-character hexadecimal hash** that uniquely identifies the original content. This hash serves as the primary key for every downstream CCR operation.
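The article does not say which hash function Headroom uses, so the following is only a minimal sketch of how a 24-character hexadecimal key could be derived from a payload. The helper name `content_hash`, the choice of SHA-256, and the canonical JSON serialization are all assumptions made for illustration.

```python
import hashlib
import json


def content_hash(payload, length: int = 24) -> str:
    """Return a hex digest prefix for `payload` (hypothetical helper, not Headroom's API)."""
    # Serialize deterministically so that equal payloads map to the same key.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    # SHA-256 is an assumption; the real transformer may use a different digest.
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digest[:length]


key = content_hash([{"id": 1, "status": "ok"}, {"id": 2, "status": "error"}])
print(len(key))  # 24
```

Truncating a digest keeps the marker short while leaving 96 bits of key space, which makes accidental collisions within a short-lived cache very unlikely.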

### Thread-Safe Storage in CompressionStore

The original JSON, its token count, and metadata are stored in the **CompressionStore** ([`headroom/cache/compression_store.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cache/compression_store.py)). This store is thread-safe and uses a default **TTL of five minutes**, evicting old entries with an LRU heap. Each entry is modeled as a `CompressionEntry` containing fields such as `hash`, `original_content`, `compressed_content`, `original_item_count`, and `tool_name`.
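To make the storage behavior concrete, here is a simplified stand-in for such a store. It is not the real `CompressionStore`: the class name `SketchStore`, its constructor parameters, and the `created_at` field are hypothetical, and it uses an `OrderedDict` for LRU ordering where the article describes an LRU heap.

```python
import threading
import time
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CompressionEntry:
    # Field names taken from the article; `created_at` is added for this sketch.
    hash: str
    original_content: str
    compressed_content: str
    original_item_count: int
    tool_name: Optional[str] = None
    created_at: float = field(default_factory=time.monotonic)


class SketchStore:
    """Illustrative TTL + LRU cache guarded by a lock (not Headroom's implementation)."""

    def __init__(self, ttl_seconds: float = 300.0, max_entries: int = 1000):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._lock = threading.Lock()
        self._entries: "OrderedDict[str, CompressionEntry]" = OrderedDict()

    def store(self, entry: CompressionEntry) -> None:
        with self._lock:
            self._entries[entry.hash] = entry
            self._entries.move_to_end(entry.hash)
            # Evict least recently used entries once capacity is exceeded.
            while len(self._entries) > self._max:
                self._entries.popitem(last=False)

    def retrieve(self, key: str) -> Optional[CompressionEntry]:
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None
            # Drop entries that have outlived the TTL.
            if time.monotonic() - entry.created_at > self._ttl:
                del self._entries[key]
                return None
            # A successful read marks the entry as most recently used.
            self._entries.move_to_end(key)
            return entry
```

A single lock around every read and write is the simplest way to get thread safety; it trades some concurrency for correctness, which is usually acceptable for a small in-process cache.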

### Marker Injection and Detection

The compressed output is replaced with a short marker string that includes the hash, for example:

```
[120 items compressed to 12. Retrieve more: hash=9f2c4e7a1b3d5e8f9a0b1c2d]
```

The **CCRToolInjector** in [`headroom/ccr/tool_injection.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/tool_injection.py) scans every message for this pattern, extracts the hash, and records it for the current request.
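Marker detection of this kind can be approximated with a regular expression. The pattern below is inferred from the example marker alone, so the real `CCRToolInjector` may match a different or broader format; `MARKER_RE` and `find_hashes` are hypothetical names.

```python
import re

# Assumed pattern: a bracketed marker ending in "Retrieve more: hash=<24 hex chars>".
MARKER_RE = re.compile(r"\[[^\[\]]*Retrieve more: hash=([0-9a-f]{24})\]")


def find_hashes(messages):
    """Collect unique hashes from string message content, in order of appearance."""
    seen = []
    for message in messages:
        content = message.get("content")
        if not isinstance(content, str):
            continue  # this sketch ignores structured (non-string) content
        for found in MARKER_RE.findall(content):
            if found not in seen:
                seen.append(found)
    return seen


messages = [
    {"role": "user", "content": "Summarize the results."},
    {
        "role": "tool",
        "content": "[120 items compressed to 12. Retrieve more: hash=9f2c4e7a1b3d5e8f9a0b1c2d]",
    },
]
print(find_hashes(messages))  # ['9f2c4e7a1b3d5e8f9a0b1c2d']
```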

### Provider-Specific Tool Injection

If any hash is detected, Headroom injects a retrieval tool named **`headroom_retrieve`** into the request’s tool list or appends system-message instructions. The tool definition (constructed by `create_ccr_tool_definition` in the same file) varies per provider (OpenAI, Anthropic, Google) but always requires a `hash` parameter and supports an optional `query` for search.
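As a rough illustration of what "varies per provider" means, the sketch below builds the same tool in the OpenAI function-calling shape and the Anthropic tool shape. It is not the output of `create_ccr_tool_definition`: the function name `sketch_tool_definition` and the description strings are assumptions, and the Google format is omitted.

```python
TOOL_NAME = "headroom_retrieve"

# JSON Schema shared by both providers: `hash` is required, `query` is optional.
PARAMETERS = {
    "type": "object",
    "properties": {
        "hash": {"type": "string", "description": "Hash from the compression marker."},
        "query": {"type": "string", "description": "Optional search query."},
    },
    "required": ["hash"],
}


def sketch_tool_definition(provider: str) -> dict:
    """Return an illustrative tool definition for the given provider."""
    description = "Retrieve original content that was replaced by a compression marker."
    if provider == "openai":
        # OpenAI wraps the schema under "function" / "parameters".
        return {
            "type": "function",
            "function": {
                "name": TOOL_NAME,
                "description": description,
                "parameters": PARAMETERS,
            },
        }
    if provider == "anthropic":
        # Anthropic uses a flat object with "input_schema".
        return {"name": TOOL_NAME, "description": description, "input_schema": PARAMETERS}
    raise ValueError(f"unsupported provider in this sketch: {provider}")
```

Keeping one shared schema and wrapping it per provider means the required `hash` and optional `query` parameters stay consistent across all formats.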

### LLM Retrieval and Response Handling

When the LLM needs more detail, it emits a tool-use block calling `headroom_retrieve(hash="9f2c4e7a1b3d5e8f9a0b1c2d")`. The **CCRResponseHandler** ([`headroom/ccr/response_handler.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/response_handler.py)) intercepts this response, extracts the call, and executes it against the CompressionStore:

- If a `query` is supplied, the store performs a **BM25 search** via `store.search`.
- Otherwise, it returns the full original payload via `store.retrieve`.

The handler then builds a tool-result message, appends it to the conversation, and automatically issues a continuation API call. This loop repeats until no CCR calls remain, up to a maximum of **three rounds**.
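The continuation loop can be sketched as follows. The message and response shapes are deliberately simplified and hypothetical: `call_llm` stands in for a provider API call, and `originals` is a plain dict standing in for the CompressionStore. Only the three-round cap comes from the article.

```python
MAX_ROUNDS = 3  # cap described in the article


def run_with_retrieval(call_llm, messages, originals):
    """Resolve retrieval calls until none remain or the round cap is reached.

    `call_llm(messages)` must return {"text": str, "tool_calls": [{"id", "hash"}, ...]}.
    """
    response = call_llm(messages)
    rounds = 0
    while response.get("tool_calls") and rounds < MAX_ROUNDS:
        # Record the assistant turn that requested the retrievals.
        messages.append({"role": "assistant", "tool_calls": response["tool_calls"]})
        for call in response["tool_calls"]:
            content = originals.get(call["hash"], "Entry expired or not found.")
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": content})
        # Continuation call with the tool results appended.
        response = call_llm(messages)
        rounds += 1
    return response
```

Capping the number of rounds bounds both latency and cost if the model keeps requesting retrievals instead of answering.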

### Optional Context Tracking

A **ContextTracker** ([`headroom/ccr/context_tracker.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/context_tracker.py)) maintains a per-turn view of which hashes are available in the conversation. This enables proactive expansion suggestions for the LLM, letting it know which compressed payloads can be retrieved without explicit user prompting.
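The bookkeeping behind this can be pictured with a small sketch. The class below is not the real `ContextTracker`; its name, methods, and data layout are hypothetical and only illustrate tracking which hashes have appeared, and in which turn.

```python
class SketchTracker:
    """Remember the turn in which each hash first appeared (illustrative only)."""

    def __init__(self):
        self._turn = 0
        self._first_seen = {}  # hash -> turn number

    def record_turn(self, hashes):
        """Register the hashes found in one conversation turn; return the turn number."""
        self._turn += 1
        for h in hashes:
            self._first_seen.setdefault(h, self._turn)
        return self._turn

    def forget(self, h):
        """Drop a hash, for example after its cache entry expired."""
        self._first_seen.pop(h, None)

    def available(self):
        """Hashes that can still be offered for expansion, oldest first."""
        return sorted(self._first_seen, key=self._first_seen.get)


tracker = SketchTracker()
tracker.record_turn(["9f2c4e7a1b3d5e8f9a0b1c2d"])
tracker.record_turn(["ab12cd34ef56ab78cd90ef12", "9f2c4e7a1b3d5e8f9a0b1c2d"])
print(tracker.available())
```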

## Practical Code Examples

### Enabling CCR in a Headroom Client

```python
from headroom import Headroom, CompressionConfig

# Enable CCR (enabled by default)
hh = Headroom(compression=CompressionConfig(ccr_enabled=True))

# Example request that triggers compression
response = hh.run(
    messages=[
        {"role": "user", "content": "Show me the last 200 lines of the log file."}
    ]
)

# The LLM sees a short marker:
#   "[200 lines compressed to 20. Retrieve more: hash=ab12cd34ef56ab78cd90ef12]"
# It decides it needs more detail and calls:
#   headroom_retrieve(hash="ab12cd34ef56ab78cd90ef12")
# Headroom automatically fetches the original and continues the dialogue.
```

### Manual Retrieval by Hash

```python
from headroom.cache.compression_store import get_compression_store

store = get_compression_store()
entry = store.retrieve("ab12cd34ef56ab78cd90ef12")
print(entry.original_content)  # Full uncompressed payload
```

### Search Within a Stored Compression

```python
from headroom.cache.compression_store import get_compression_store

store = get_compression_store()
results = store.search("ab12cd34ef56ab78cd90ef12", query="error")
print(results)  # JSON list of matching items
```
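For readers unfamiliar with BM25, the ranking behind a query like `"error"` can be sketched in a few lines. This is a generic Okapi BM25 scorer written for illustration, not the code inside `store.search`; the tokenizer and the `k1` and `b` defaults are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    docs = [tokenize(d) for d in documents]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n if n else 0.0
    # Number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: longer documents need more hits to score equally.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


items = [
    '{"level": "info", "msg": "service started"}',
    '{"level": "error", "msg": "disk full"}',
    '{"level": "info", "msg": "heartbeat"}',
]
scores = bm25_scores("error", items)
matches = [item for item, s in zip(items, scores) if s > 0]
```

Returning only the items with a positive score is what lets a query-based retrieval stay much smaller than the full original payload.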

## Core Files Powering Headroom CCR Compression

These modules implement the full Compress-Cache-Retrieve workflow in `chopratejas/headroom`:

- [`headroom/ccr/__init__.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/__init__.py) — Exposes CCR components including tool injection, response handling, context tracking, and batch processing.
- [`headroom/ccr/tool_injection.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/tool_injection.py) — Detects compression markers and injects the `headroom_retrieve` tool definition with provider-specific formatting.
- [`headroom/ccr/response_handler.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/response_handler.py) — Intercepts LLM tool-use responses, executes CCR retrievals, and manages automatic continuation calls.
- [`headroom/ccr/context_tracker.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/context_tracker.py) — Tracks available hashes across conversation turns and suggests proactive expansions.
- [`headroom/cache/compression_store.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cache/compression_store.py) — Thread-safe in-memory cache that stores original content, enforces TTL, handles LRU eviction, and supports BM25 search.
- [`headroom/cache/backends/in_memory.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cache/backends/in_memory.py) — Default backend implementation used by the CompressionStore.
- [`headroom/transformers/smart_crusher.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transformers/smart_crusher.py) — Performs the actual payload compression and registers entries in the CompressionStore.

## Summary

- Headroom CCR compression uses a **24-character hexadecimal hash** to uniquely identify large tool outputs, which it replaces with compact markers.
- The **CompressionStore** ([`headroom/cache/compression_store.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cache/compression_store.py)) retains originals in a thread-safe, TTL-backed, LRU-evicted cache.
- **CCRToolInjector** ([`headroom/ccr/tool_injection.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/tool_injection.py)) automatically exposes a **`headroom_retrieve`** tool to the LLM when markers are present.
- **CCRResponseHandler** ([`headroom/ccr/response_handler.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/response_handler.py)) executes retrievals, supports **BM25 search**, and chains continuation calls for up to three rounds.
- The optional **ContextTracker** ([`headroom/ccr/context_tracker.py`](https://github.com/chopratejas/headroom/blob/main/headroom/ccr/context_tracker.py)) gives the LLM visibility into available hashes for proactive data expansion.

## Frequently Asked Questions

### What does CCR stand for in Headroom?

CCR stands for **Compress-Cache-Retrieve**. It describes the three-phase workflow where large payloads are compressed into hash-based markers, cached in the CompressionStore, and retrieved on demand through an injected tool.

### How long does Headroom keep compressed data in the cache?

According to the source code in [`headroom/cache/compression_store.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cache/compression_store.py), entries use a default **TTL of five minutes** and are evicted via an LRU heap when the store reaches capacity. This balances memory usage with the likelihood that the LLM will request the data within the same session.

### Can the LLM search inside a compressed payload without retrieving everything?

Yes. The `headroom_retrieve` tool accepts an optional `query` parameter. When supplied, the **CCRResponseHandler** calls `store.search` to perform a **BM25 search** against the original content and returns only matching items, saving additional tokens.

### Is Headroom CCR compression lossless?

The compression layer itself is **lossless** because the original content is preserved in full inside the CompressionStore. The LLM only sees a summarized or compressed marker, but it can recover the exact original payload by calling `headroom_retrieve` with the correct hash, provided the entry has not yet expired under the TTL or been evicted from the cache.