What Is CCR and How Does It Enable Reversible Compression in Headroom?

CCR (Compress‑Cache‑Retrieve) is Headroom’s architectural layer that stores original payloads in an LRU cache while replacing them with short hash markers, allowing aggressive token compression without permanent information loss.

The CCR system in the open-source chopratejas/headroom repository addresses the basic trade-off between reducing token costs and preserving data fidelity. Unlike lossy context-compression techniques that summarize, truncate, or sample data and discard the rest, CCR makes compression reversible by keeping a copy of every transformed payload. The LLM can then request the full original data on demand by referencing a lightweight placeholder.

How CCR Works: The Four-Phase Architecture

CCR operates through four coordinated components that work transparently across the request lifecycle.

Compression Store with LRU Caching

When a transform like SmartCrusher processes large tool outputs—such as JSON arrays or log files—the Compression Store captures the full original payload. According to the source code in crates/headroom-core/src/ccr/mod.rs, the system generates a 24-character BLAKE3 hash via compute_key and produces a marker using marker_for. This marker, formatted as <<ccr:HASH>>, replaces the bulky content in the prompt sent to the LLM, while the original data lives in an LRU cache.
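The Compression Store behavior described above can be sketched in Python as a small LRU cache. This is an illustration under stated assumptions, not Headroom's code: compute_key here uses the standard library's BLAKE2b with a 12-byte digest (24 hex characters) as a stand-in for BLAKE3, which Python does not ship, and CompressionStore, put, and get are hypothetical names.

```python
import hashlib
from collections import OrderedDict
from typing import Optional


def compute_key(payload: bytes) -> str:
    # Stand-in for Headroom's BLAKE3-based compute_key: BLAKE2b from the
    # standard library with a 12-byte digest yields 24 hex characters.
    return hashlib.blake2b(payload, digest_size=12).hexdigest()


def marker_for(key: str) -> str:
    # Marker format described in the article: <<ccr:HASH>>
    return f"<<ccr:{key}>>"


class CompressionStore:
    """Hypothetical LRU store mapping keys to original payloads."""

    def __init__(self, capacity: int = 1000) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def put(self, payload: str) -> str:
        """Cache the original payload and return the marker that replaces it."""
        key = compute_key(payload.encode("utf-8"))
        self._items[key] = payload
        self._items.move_to_end(key)  # mark as most recently used
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return marker_for(key)

    def get(self, key: str) -> Optional[str]:
        """Return the original payload, or None if never stored or already evicted."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a lookup counts as a use
        return self._items[key]
```

Eviction is the key design consequence: once an entry falls out of the LRU cache, its marker can no longer be resolved, so the cache capacity bounds how far back reversibility reaches.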

Tool Injection

The proxy automatically injects a headroom_retrieve tool definition into the list of tools available to the LLM. As documented in wiki/ccr.md, this tool schema lets the model request the original data when the compressed summary is insufficient. The CCR marker in the compressed output tells the LLM that the full content can be retrieved and supplies the key it needs to pass to headroom_retrieve.
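A minimal sketch of tool injection for an OpenAI-style request body follows. The schema fields (hash, query), the description text, and the function name inject_retrieve_tool are assumptions for illustration; the authoritative headroom_retrieve schema is the one in wiki/ccr.md.

```python
import copy

# Hypothetical schema; the real headroom_retrieve definition is in wiki/ccr.md
HEADROOM_RETRIEVE_TOOL = {
    "type": "function",
    "function": {
        "name": "headroom_retrieve",
        "description": "Retrieve the original content behind a <<ccr:HASH>> marker.",
        "parameters": {
            "type": "object",
            "properties": {
                "hash": {"type": "string", "description": "Key from a <<ccr:HASH>> marker."},
                "query": {"type": "string", "description": "Optional search over the stored content."},
            },
            "required": ["hash"],
        },
    },
}


def inject_retrieve_tool(request: dict) -> dict:
    """Return a copy of an OpenAI-style request with headroom_retrieve added once."""
    new_request = copy.deepcopy(request)  # never mutate the caller's request
    tools = new_request.setdefault("tools", [])
    names = {tool.get("function", {}).get("name") for tool in tools}
    if "headroom_retrieve" not in names:
        tools.append(copy.deepcopy(HEADROOM_RETRIEVE_TOOL))
    return new_request
```

Copying the request and checking for an existing entry keeps the injection idempotent, so running it twice on the same request does not register the tool twice.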

Response Handler

When the LLM invokes headroom_retrieve with a specific hash, the CCR response handler intercepts the call. As described in wiki/ccr.md, the handler looks up the hash in the cache and returns either the exact original payload or the results of a BM25 search over the stored content. This happens automatically, without exposing cache mechanics to the end client, so the conversation flow stays clean.
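The handler's two return paths can be illustrated with a self-contained sketch. It assumes the tool arguments carry a hash and an optional query, that the store behaves like a dict, and that BM25 ranks individual lines of the stored payload; handle_retrieve_call and bm25_top_lines are hypothetical names, and Headroom's actual handler may index content differently. The scoring is standard Okapi BM25 with typical parameter values k1 = 1.5 and b = 0.75.

```python
import json
import math
import re
from collections import Counter


def _tokenize(text: str) -> list:
    return re.findall(r"\w+", text.lower())


def bm25_top_lines(content: str, query: str, k: int = 3,
                   k1: float = 1.5, b: float = 0.75) -> list:
    """Rank non-empty lines of `content` against `query` with Okapi BM25."""
    docs = [line for line in content.splitlines() if line.strip()]
    if not docs:
        return []
    tokenized = [_tokenize(d) for d in docs]
    n_docs = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    df = Counter(term for terms in tokenized for term in set(terms))  # document frequency
    scored = []
    for doc, terms in zip(docs, tokenized):
        tf = Counter(terms)
        score = 0.0
        for q in _tokenize(query):
            if q not in tf:
                continue
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1)
            norm = tf[q] + k1 * (1 - b + b * len(terms) / avgdl)
            score += idf * tf[q] * (k1 + 1) / norm
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def handle_retrieve_call(arguments_json: str, store) -> str:
    """Resolve headroom_retrieve arguments against a dict-like store of originals."""
    args = json.loads(arguments_json)
    key = args["hash"].removeprefix("<<ccr:").removesuffix(">>")
    original = store.get(key)
    if original is None:
        return "CCR entry not found (it may have been evicted)."
    if args.get("query"):
        # Search path: return only the most relevant lines
        return "\n".join(bm25_top_lines(original, args["query"]))
    # Exact path: return the original payload unchanged
    return original
```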

Context Tracker

Across multiple conversation turns, the Context Tracker keeps track of which hashes were created and their semantic relevance. As noted in wiki/ccr.md, this component can proactively expand compressed contexts when later user queries reference previous summaries, so the LLM receives the necessary details without having to call headroom_retrieve explicitly.
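The Context Tracker's bookkeeping can be sketched as follows. The article describes matching on semantic relevance; this sketch swaps in a much simpler keyword-overlap heuristic (shared words of four or more letters) purely to show the data flow, and ContextTracker, record, and keys_to_expand are hypothetical names.

```python
import re
from dataclasses import dataclass, field


def _keywords(text: str) -> set:
    # Crude relevance signal: lowercase words of four or more letters
    return set(re.findall(r"[a-z]{4,}", text.lower()))


@dataclass
class TrackedBlock:
    key: str        # cache key from the <<ccr:HASH>> marker
    turn: int       # conversation turn in which the block was compressed
    keywords: set   # words taken from the block's compressed summary


@dataclass
class ContextTracker:
    blocks: list = field(default_factory=list)

    def record(self, key: str, turn: int, summary: str) -> None:
        """Remember a compressed block and the summary the LLM saw."""
        self.blocks.append(TrackedBlock(key, turn, _keywords(summary)))

    def keys_to_expand(self, user_message: str, min_overlap: int = 1) -> list:
        """Keys whose summaries share enough words with the new message, newest first."""
        words = _keywords(user_message)
        hits = [b for b in self.blocks if len(b.keywords & words) >= min_overlap]
        hits.sort(key=lambda b: b.turn, reverse=True)
        return [b.key for b in hits]
```

Returning the most recent matches first reflects an assumption that a follow-up usually refers to the latest relevant turn.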

Implementation Examples

Generating CCR Markers in Rust

The core hashing logic resides in crates/headroom-core/src/ccr/mod.rs. The compute_key function creates the BLAKE3 hash, while marker_for formats it for insertion:

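use headroom_core::ccr::{compute_key, marker_for};

fn main() {
    // Abbreviated sample payload; real tool outputs may hold thousands of items
    let payload = r#"[
        {"ts":1,"cpu":45},
        {"ts":2,"cpu":45}
    ]"#;

    // 24-character BLAKE3-based key for the original bytes
    let hash = compute_key(payload.as_bytes());
    // Marker of the form <<ccr:HASH>> that replaces the payload in the prompt
    let marker = marker_for(&hash);
    println!("Compressed block will contain: {}", marker);
}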

This short marker can stand in for thousands of tokens in the LLM context window while preserving a lookup key for the original data.
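Because every marker embeds its lookup key, a proxy can find the keys referenced in any piece of text with a regular expression. The Python sketch below assumes the key is 24 lowercase hexadecimal characters (a hex-encoded digest); find_ccr_keys is a hypothetical helper, and the pattern would need adjusting if Headroom encodes keys differently.

```python
import re

# Assumption: keys are 24 lowercase hex characters (a hex-encoded digest)
CCR_MARKER = re.compile(r"<<ccr:([0-9a-f]{24})>>")


def find_ccr_keys(text: str) -> list:
    """Return keys of all <<ccr:HASH>> markers in text, first-seen order, no duplicates."""
    return list(dict.fromkeys(CCR_MARKER.findall(text)))
```

Deduplication matters because keys are derived from content: the same payload always produces the same key, so it can appear in several turns.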

Using headroom_retrieve in Python

When working with the Headroom Python client, the headroom_retrieve tool handles automatic expansion. The tool definition specified in wiki/ccr.md enables the following flow:

from headroom import HeadroomClient, OpenAIProvider
from openai import OpenAI

# Wrap the original OpenAI client (reads OPENAI_API_KEY from the environment)
base = OpenAI()
client = HeadroomClient(original_client=base, provider=OpenAIProvider())

# Request with compression enabled
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Give me all logs from the last hour"}],
    headroom_mode="optimize",
)

# Behind the scenes, the LLM may emit a tool call like:
# {
#   "tool_calls": [{
#     "id": "call_1",
#     "function": {
#       "name": "headroom_retrieve",
#       "arguments": "{\"hash\":\"<<ccr:HASH>>\"}"
#     }
#   }]
# }
# Headroom resolves the retrieval, so the client only sees the final answer
print(resp.choices[0].message.content)
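What "handles retrieval automatically" involves at the message level can be sketched with OpenAI-style chat messages: each headroom_retrieve call is answered with a message of role "tool" carrying the matching tool_call_id, and the extended conversation is sent back to the model. resolve_retrieve_calls and next_messages are hypothetical helpers, and the store is assumed to be a plain dict; this is not Headroom's implementation.

```python
import json

NOT_FOUND = "CCR entry not found (it may have been evicted)."


def resolve_retrieve_calls(assistant_message: dict, store: dict) -> list:
    """Answer headroom_retrieve tool calls with OpenAI-style tool messages."""
    tool_messages = []
    for call in assistant_message.get("tool_calls") or []:
        fn = call["function"]
        if fn["name"] != "headroom_retrieve":
            continue  # not a CCR call; left for whoever owns that tool
        # Accept either a bare key or a full <<ccr:HASH>> marker
        key = json.loads(fn["arguments"])["hash"]
        key = key.removeprefix("<<ccr:").removesuffix(">>")
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": store.get(key, NOT_FOUND),
        })
    return tool_messages


def next_messages(history: list, assistant_message: dict, store: dict) -> list:
    """Conversation to re-send: history, the tool-calling turn, then the tool results."""
    return history + [assistant_message] + resolve_retrieve_calls(assistant_message, store)
```

How a proxy should proceed when one assistant message mixes headroom_retrieve with the client's own tools is outside the scope of this sketch.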

Proactive Context Expansion

The Context Tracker can also trigger retrievals without an explicit tool call from the LLM. A follow-up request that refers back to earlier compressed data might look like this, reusing the client from the previous example:

# Follow-up query referencing previously compressed data
follow_up = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "assistant", "content": "Here are the top 5 error spikes."},
        {"role": "user", "content": "Show me the full logs for the biggest spike."},
    ],
    headroom_mode="optimize",
)

# The Context Tracker can match this request to the compressed block from the
# prior turn and inject the retrieved original, so the answer draws on full logs
print(follow_up.choices[0].message.content)

Key Source Files and Components

Understanding CCR requires familiarity with these specific modules:

  • crates/headroom-core/src/ccr/mod.rs – Core utilities: compute_key and marker_for for hash generation and marker formatting, plus the CcrStore trait.
  • wiki/ccr.md – Human-readable specification defining CCR phases, the headroom_retrieve tool schema, and proactive expansion protocols.
  • ccr/tool_injection.py – Implements the logic that injects the headroom_retrieve tool into the LLM’s tool list at request time.
  • ccr/response_handler.py – Detects CCR tool calls, performs cache lookups, and returns original payloads or BM25 search results.
  • ccr/context_tracker.py – Tracks hash lifecycles across conversation turns and drives proactive expansion of relevant compressed contexts.
  • wiki/ARCHITECTURE.md – High-level diagram linking the compression store, tool injection, and response handler components.

Summary

  • CCR (Compress‑Cache‑Retrieve) eliminates the information-loss trade-off in LLM context compression by storing originals in an LRU cache while sending short hash markers to the model.
  • Four components—Compression Store, Tool Injection, Response Handler, and Context Tracker—work together to make compression reversible across conversation turns.
  • BLAKE3 hashing in crates/headroom-core/src/ccr/mod.rs generates compact 24-character keys that replace verbose payloads in the context window.
  • Automatic retrieval via headroom_retrieve allows the LLM to access full original data on demand without client-side complexity.
  • Proactive expansion ensures relevant compressed contexts expand automatically when follow-up queries reference previous summaries.

Frequently Asked Questions

What does CCR stand for in Headroom?

CCR stands for Compress‑Cache‑Retrieve. It describes the three-stage workflow in which tool outputs are compressed and replaced with hash markers, the originals are cached in an LRU store, and the full data is retrieved on demand when the LLM needs it.

How does CCR differ from standard compression methods?

Lossy context compression (summarizing, truncating, or sampling tool output) permanently removes information, which risks losing details the LLM might need later. CCR is reversible because it keeps the complete original payload in an LRU cache (see crates/headroom-core/src/ccr/mod.rs) and transmits only a lightweight <<ccr:HASH>> marker, so the source material can be reconstructed exactly for as long as its entry remains in the cache.

What hash algorithm does Headroom CCR use?

Headroom CCR uses BLAKE3 to generate 24-character hash keys. According to the implementation in crates/headroom-core/src/ccr/mod.rs, the compute_key function hashes the original payload bytes to produce these identifiers, which serve as cache lookup keys. Because the key is only 24 characters long, it is effectively rather than strictly unique.
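How close to unique the keys are can be estimated. Assuming the 24-character key is hex-encoded, it carries 96 bits, and the birthday approximation puts the probability of any collision among n cached entries at roughly 1 - exp(-n(n-1)/2^97). The sketch below computes that estimate; collision_probability is a hypothetical helper, not part of Headroom.

```python
import math


def collision_probability(n_entries: int, bits: int = 96) -> float:
    """Birthday-bound estimate of P(at least one collision) among n random keys."""
    # p ≈ 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps precision when p is tiny
    exponent = n_entries * (n_entries - 1) / 2 ** (bits + 1)
    return -math.expm1(-exponent)


# 24 hex characters * 4 bits per character = 96 bits
print(f"{collision_probability(10**6):.1e}")  # prints 6.3e-18
```

For a million cached entries the estimate is about 6 × 10^-18; a collision only becomes likely near 2^48 (about 2.8 × 10^14) entries, far beyond a realistic LRU capacity.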

Can CCR work with any LLM provider?

Yes, as long as the provider supports tool calling. CCR operates as a proxy layer that injects the headroom_retrieve tool definition through the standard tool-calling interface. As shown in wiki/ccr.md, this works with providers such as OpenAI and Anthropic, as well as open-source models served through compatible APIs.
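Provider neutrality largely comes down to expressing the same tool in each provider's schema. The sketch below converts an OpenAI Chat Completions function tool (type, function.name, function.parameters) into the Anthropic Messages API shape (name, description, input_schema). The retrieve schema is a hypothetical stand-in for the one in wiki/ccr.md, and to_anthropic_tool is an illustrative helper, not part of Headroom.

```python
# Hypothetical stand-in for the headroom_retrieve schema in wiki/ccr.md
RETRIEVE_TOOL_OPENAI = {
    "type": "function",
    "function": {
        "name": "headroom_retrieve",
        "description": "Retrieve the original content behind a <<ccr:HASH>> marker.",
        "parameters": {
            "type": "object",
            "properties": {"hash": {"type": "string"}},
            "required": ["hash"],
        },
    },
}


def to_anthropic_tool(openai_tool: dict) -> dict:
    """Convert an OpenAI-style function tool into Anthropic's tool format."""
    fn = openai_tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # Anthropic names the JSON Schema for tool arguments "input_schema"
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }


anthropic_tool = to_anthropic_tool(RETRIEVE_TOOL_OPENAI)
```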
