# How to Enable and Configure the Two-Tier Compression Cache in ContentRouter

> Enable and configure the two-tier compression cache in ContentRouter for improved performance. Learn to set `compress_cache.enabled`, `max_entries`, `ttl_seconds`, and `expansion_threshold`.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: how-to-guide
- Published: 2026-06-06

---

**Enable the two-tier compression cache (CCR) in Headroom by setting `compress_cache.enabled: true` in your [`headroom.yaml`](https://github.com/chopratejas/headroom/blob/main/headroom.yaml) configuration file or using the `--ccr` CLI flag, then tune `max_entries`, `ttl_seconds`, and `expansion_threshold` to balance token savings against retrieval latency.**

The **two-tier compression cache** (CCR) in Headroom's ContentRouter minimizes LLM token usage by replacing large payloads with compressed text and a hash-based marker, while storing the original bytes for lossless retrieval. This architecture is implemented in [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py) and [`headroom/perf/analyzer.py`](https://github.com/chopratejas/headroom/blob/main/headroom/perf/analyzer.py). It is active by default, and you can configure cache size, TTL, and proactive expansion behavior through YAML files or command-line flags.

## Understanding the Two-Tier Compression Cache Architecture

The CCR system splits caching logic across two distinct tiers to optimize both compression efficiency and data accessibility.

### Tier 1 – Compression Store

When the ContentRouter compresses a payload (code files, logs, or search results), the `compress_and_cache` routine in [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py) stores the original uncompressed bytes in an in-memory LRU cache. It emits a hash key within a marker that replaces the payload in the LLM prompt, typically formatted as:

```
[1000 lines compressed to 20. Retrieve more: hash=ab12cd34]
```
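The Tier 1 flow can be sketched in a few lines of Python. This is a minimal illustration, not Headroom's actual code: the function name `compress_and_cache_sketch`, its parameters, the plain-dict store, the "keep the first N lines" compression, and the 8-character truncated SHA-256 key are all assumptions chosen to reproduce the marker format shown above.

```python
import hashlib
from typing import Dict


def compress_and_cache_sketch(
    payload: str, store: Dict[str, bytes], keep_lines: int = 20
) -> str:
    """Swap a large payload for a short excerpt plus a retrieval marker.

    Hypothetical helper: the real routine may compress very differently.
    """
    lines = payload.splitlines()
    if len(lines) <= keep_lines:
        return payload  # too small to be worth compressing

    raw = payload.encode("utf-8")
    # Assumption: the retrieval key is a truncated SHA-256 hex digest
    # of the original bytes.
    key = hashlib.sha256(raw).hexdigest()[:8]
    store[key] = raw  # keep the original bytes for lossless retrieval

    excerpt = "\n".join(lines[:keep_lines])
    marker = (
        f"[{len(lines)} lines compressed to {keep_lines}. "
        f"Retrieve more: hash={key}]"
    )
    return f"{excerpt}\n{marker}"
```

The key point is that the prompt only carries the excerpt and the marker, while the full payload stays addressable by its hash.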

### Tier 2 – Retrieval and Proactive Expansion

The second tier handles data reconstruction through two mechanisms. First, the response-handler in [`headroom/perf/analyzer.py`](https://github.com/chopratejas/headroom/blob/main/headroom/perf/analyzer.py) intercepts calls to the auto-injected `headroom_retrieve` tool, fetching original payloads from the cache on demand. Second, the Context Tracker, also located in [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py), proactively expands cached content when the similarity between a subsequent query and that content exceeds the configured `expansion_threshold`.
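Both mechanisms can be illustrated with a small sketch. Everything here is hypothetical: the function names, the dict-based store, and especially the similarity measure. Token-set Jaccard overlap stands in for whatever scoring the Context Tracker actually uses, which this guide does not specify.

```python
from typing import Dict, List, Optional


def retrieve(store: Dict[str, bytes], key: str) -> Optional[str]:
    """On-demand path: return the original text for a marker hash, or None."""
    raw = store.get(key)
    return None if raw is None else raw.decode("utf-8")


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity: shared tokens divided by total distinct tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def keys_to_expand(
    query: str, descriptions: Dict[str, str], expansion_threshold: float = 0.75
) -> List[str]:
    """Proactive path: pick cached entries whose similarity to the query
    strictly exceeds the threshold."""
    return [
        key
        for key, text in descriptions.items()
        if jaccard(query, text) > expansion_threshold
    ]
```

With this stand-in measure, a query that shares three of four distinct tokens with an entry scores 0.75, so it would be expanded at a threshold of 0.5 but not at 0.75. That is the sense in which higher thresholds make expansion more conservative.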

## Enabling CCR via Configuration File

The most persistent method to manage the two-tier compression cache is through the `compress_cache` section in your [`headroom.yaml`](https://github.com/chopratejas/headroom/blob/main/headroom.yaml) (or JSON) configuration file. As defined in [`headroom/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/config.py), add the following structure:

```yaml
# headroom.yaml
compress_cache:
  enabled: true       # Activates the two-tier cache (default: true)
  max_entries: 5000   # Maximum items in the LRU cache
  ttl_seconds: 3600   # Entry lifetime before eviction (1 hour)
```
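To make the shape of this section concrete, here is a hedged sketch of how such settings could be modeled and validated in Python. The class `CompressCacheSettings` and the helper `settings_from_mapping` are hypothetical and are not taken from `headroom/config.py`. The numeric defaults simply mirror the example values above, and the two optional fields correspond to the `expansion_threshold` and `cache_path` keys described later in this guide.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class CompressCacheSettings:
    """Hypothetical model of the compress_cache section."""

    enabled: bool = True
    max_entries: int = 5000  # example value from this guide, not a confirmed default
    ttl_seconds: int = 3600  # example value from this guide, not a confirmed default
    expansion_threshold: Optional[float] = None  # default not stated in this guide
    cache_path: Optional[str] = None

    def __post_init__(self) -> None:
        if self.max_entries <= 0:
            raise ValueError("max_entries must be positive")
        if self.ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be positive")
        threshold = self.expansion_threshold
        if threshold is not None and not 0.0 <= threshold <= 1.0:
            raise ValueError("expansion_threshold must be between 0 and 1")


def settings_from_mapping(raw: Mapping[str, Any]) -> CompressCacheSettings:
    """Build settings from an already-parsed YAML or JSON document."""
    section = dict(raw.get("compress_cache") or {})
    # Unknown keys raise TypeError, which surfaces typos early.
    return CompressCacheSettings(**section)
```

Validating at load time means a mistyped key or an out-of-range threshold fails fast instead of silently falling back to a default.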

## Enabling CCR via CLI Flags

For temporary overrides or scripting, use the CLI flags exposed in [`headroom/cli.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cli.py):

```bash
# Explicitly enable CCR (useful when overriding global --no-ccr)
headroom proxy --ccr --port 8787

# Disable automatic retrieval via headroom_retrieve
headroom proxy --no-ccr-responses

# Disable proactive expansion of cached content
headroom proxy --no-ccr-expansion
```

## Configuring Cache Parameters

Fine-tune cache behavior using these parameters parsed by [`headroom/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/config.py):

- **max_entries**: Upper bound of cached objects before LRU eviction kicks in. Typical values range from **1,000 to 10,000**.
- **ttl_seconds**: Time-to-live for each entry in seconds. Common settings range from **300** (5 minutes) to **86400** (24 hours).
- **hash_algorithm**: Currently fixed to **SHA-256** for generating retrieval keys.
- **cache_path**: Optional disk-backed location (e.g., `~/.headroom/cache`) for persistence across proxy restarts.
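The interaction between `max_entries` and `ttl_seconds` is easiest to see in code. The class below is a generic LRU-with-TTL sketch built on `collections.OrderedDict`. It is not Headroom's cache implementation, and the class name and injectable `clock` parameter are assumptions made for illustration and testability.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class LruTtlCache:
    """Illustrative LRU cache whose entries also expire after a fixed TTL."""

    def __init__(
        self,
        max_entries: int = 5000,
        ttl_seconds: float = 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # Maps key -> (expiry time, payload); order tracks recency of use.
        self._items: "OrderedDict[str, Tuple[float, bytes]]" = OrderedDict()

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = (self._clock() + self.ttl_seconds, value)
        self._items.move_to_end(key)  # mark as most recently used
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # evict least recently used

    def get(self, key: str) -> Optional[bytes]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop lazily on access
            return None
        self._items.move_to_end(key)  # a hit refreshes recency, not the TTL
        return value
```

In this sketch a larger `max_entries` trades memory for fewer evicted markers, while a shorter `ttl_seconds` bounds how long a stale payload can be retrieved.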

### Controlling Proactive Expansion

The Context Tracker's aggressiveness is governed by the `expansion_threshold` parameter (0-1 scale), where higher values result in fewer automatic expansions:

```yaml
compress_cache:
  enabled: true
  expansion_threshold: 0.75   # Reduces auto-expansion frequency
```

## Complete Configuration Example

Combine all options to optimize for an 8,000-entry cache with 2-hour TTL and conservative expansion:

```yaml
# headroom.yaml
compress_cache:
  enabled: true
  max_entries: 8000
  ttl_seconds: 7200
  expansion_threshold: 0.75
  cache_path: ~/.headroom/cache
```

Start the proxy with your configuration:

```bash
headroom proxy --config headroom.yaml --port 8787
```

With this configuration active, the `compress_and_cache` routine in [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py) compresses payloads and caches their original bytes, while [`headroom/perf/analyzer.py`](https://github.com/chopratejas/headroom/blob/main/headroom/perf/analyzer.py) serves `headroom_retrieve` requests from the LRU store. The Context Tracker monitors conversation turns and automatically expands cached content when a similarity score exceeds the 0.75 threshold.

## Summary

- **Enable CCR** by setting `compress_cache.enabled: true` in [`headroom.yaml`](https://github.com/chopratejas/headroom/blob/main/headroom.yaml) or using the `--ccr` CLI flag.
- **Tier 1** compresses payloads and stores their original bytes in an in-memory LRU cache, implemented in [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py).
- **Tier 2** retrieves data via `headroom_retrieve` in [`headroom/perf/analyzer.py`](https://github.com/chopratejas/headroom/blob/main/headroom/perf/analyzer.py) and proactively expands content via the Context Tracker.
- **Configure limits** with `max_entries` (cache size) and `ttl_seconds` (entry lifetime).
- **Control expansion sensitivity** using `expansion_threshold` (0-1) to determine when the Context Tracker automatically reveals cached content.

## Frequently Asked Questions

### What happens when the cache reaches max_entries?

When the cache reaches the `max_entries` limit defined in your configuration, the LRU (Least Recently Used) eviction policy removes the entries that have gone longest without being accessed to make room for new data. This prevents unbounded memory growth while keeping recently used payloads available for retrieval.

### How does the LLM retrieve compressed content when needed?

The LLM requests original data through the auto-injected `headroom_retrieve` tool, which the response-handler in [`headroom/perf/analyzer.py`](https://github.com/chopratejas/headroom/blob/main/headroom/perf/analyzer.py) intercepts. The handler looks up the hash key in the LRU cache managed by [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py) and injects the original payload back into the conversation context automatically.

### What is the difference between --no-ccr-responses and --no-ccr-expansion?

The `--no-ccr-responses` flag disables the automatic handling of `headroom_retrieve` calls, meaning compressed markers remain unresolved unless manually processed. The `--no-ccr-expansion` flag specifically disables the Context Tracker's proactive behavior, preventing automatic expansion of cached content based on query similarity while still allowing explicit retrieval via the tool.

### Can I persist the compression cache across proxy restarts?

Yes, specify a `cache_path` in your [`headroom.yaml`](https://github.com/chopratejas/headroom/blob/main/headroom.yaml) configuration (e.g., `~/.headroom/cache`) to enable disk-backed storage. According to the configuration schema in [`headroom/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/config.py), this optional parameter ensures cached entries survive proxy restarts, though entries still respect the `ttl_seconds` expiration policy upon reload.
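As a rough picture of what "respecting `ttl_seconds` upon reload" can mean, the sketch below saves entries with an absolute expiry timestamp and drops anything already expired when loading. The JSON file layout, the base64 encoding, and the function names `save_entries` and `load_entries` are assumptions for illustration only. Headroom's actual on-disk format is not described in this guide.

```python
import base64
import json
import time
from pathlib import Path
from typing import Dict, Optional, Tuple

# key -> (expiry as epoch seconds, original payload bytes)
Entries = Dict[str, Tuple[float, bytes]]


def save_entries(path: Path, entries: Entries) -> None:
    """Write entries to disk; bytes are base64-encoded so they fit in JSON."""
    data = {
        key: {
            "expires_at": expires_at,
            "payload": base64.b64encode(payload).decode("ascii"),
        }
        for key, (expires_at, payload) in entries.items()
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")


def load_entries(path: Path, now: Optional[float] = None) -> Entries:
    """Read entries back, skipping any whose TTL elapsed while offline."""
    if not path.exists():
        return {}
    current = time.time() if now is None else now
    data = json.loads(path.read_text(encoding="utf-8"))
    return {
        key: (item["expires_at"], base64.b64decode(item["payload"]))
        for key, item in data.items()
        if item["expires_at"] > current
    }
```

Storing wall-clock expiry times rather than remaining durations means time spent while the proxy is stopped still counts against each entry's lifetime.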