How to Enable and Configure the Two-Tier Compression Cache in ContentRouter

Enable the two-tier compression cache (CCR) in Headroom by setting compress_cache.enabled: true in your headroom.yaml configuration file or using the --ccr CLI flag, then tune max_entries, ttl_seconds, and expansion_threshold to balance token savings against retrieval latency.

The two-tier compression cache (CCR) in Headroom's ContentRouter minimizes LLM token usage by replacing large payloads with short hash-based markers while keeping the original bytes available for lossless retrieval. The architecture, implemented in headroom/transforms/content_router.py and headroom/perf/analyzer.py, is active by default. YAML settings and command-line flags control cache size, TTL, and proactive expansion behavior.

Understanding the Two-Tier Compression Cache Architecture

The CCR system splits caching logic across two distinct tiers to optimize both compression efficiency and data accessibility.

Tier 1 – Compression Store

When the ContentRouter compresses a payload (code files, logs, or search results), the compress_and_cache routine in headroom/transforms/content_router.py stores the original uncompressed bytes in an in-memory LRU cache. It emits a hash key within a marker that replaces the payload in the LLM prompt, typically formatted as:


[1000 lines compressed to 20. Retrieve more: hash=ab12cd34]

Tier 2 – Retrieval and Proactive Expansion

The second tier handles data reconstruction through two mechanisms. First, the response handler in headroom/perf/analyzer.py intercepts calls to the auto-injected headroom_retrieve tool and fetches original payloads from the cache on demand. Second, the Context Tracker, also located in headroom/transforms/content_router.py, proactively expands cached content when a subsequent query's similarity score exceeds the configured expansion_threshold.
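To make the threshold comparison concrete, here is a rough sketch of a proactive-expansion check. The source does not say how the Context Tracker scores similarity, so the Jaccard word-overlap measure, the function names, and the idea of comparing the query against a short summary per cached hash are all assumptions made for illustration.

```python
from typing import Dict, List


def jaccard_similarity(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a stand-in for the real scoring."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def hashes_to_expand(
    query: str,
    cached_summaries: Dict[str, str],
    expansion_threshold: float = 0.75,
) -> List[str]:
    """Return the cache keys whose summary is similar enough to the query."""
    return [
        hash_key
        for hash_key, summary in cached_summaries.items()
        if jaccard_similarity(query, summary) > expansion_threshold
    ]
```

With this shape, raising expansion_threshold shrinks the returned list, which matches the documented behavior that higher values produce fewer automatic expansions.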

Enabling CCR via Configuration File

The most persistent method to manage the two-tier compression cache is through the compress_cache section in your headroom.yaml (or JSON) configuration file. As defined in headroom/config.py, add the following structure:


# headroom.yaml
compress_cache:
  enabled: true          # Activates the two-tier cache (default: true)
  max_entries: 5000      # Maximum items in the LRU cache
  ttl_seconds: 3600      # Entry lifetime before eviction (1 hour)

Enabling CCR via CLI Flags

For temporary overrides or scripting, use the CLI flags exposed in headroom/cli.py:


# Explicitly enable CCR (useful when overriding global --no-ccr)
headroom proxy --ccr --port 8787

# Disable automatic retrieval via headroom_retrieve
headroom proxy --no-ccr-responses

# Disable proactive expansion of cached content
headroom proxy --no-ccr-expansion

Configuring Cache Parameters

Fine-tune cache behavior using these parameters parsed by headroom/config.py:

  • max_entries: Upper bound on the number of cached objects; once it is reached, LRU eviction begins. Typical values range from 1,000 to 10,000.
  • ttl_seconds: Time-to-live for each entry in seconds. Common settings range from 300 (5 minutes) to 86400 (24 hours).
  • hash_algorithm: Currently fixed to SHA-256 for generating retrieval keys.
  • cache_path: Optional disk-backed location (e.g., ~/.headroom/cache) for persistence across proxy restarts.
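How max_entries and ttl_seconds interact is easiest to see in a small in-memory cache. The class below is a sketch under stated assumptions, not Headroom's implementation: the name LruTtlCache, the lazy "expire on read" policy, and the injectable clock are all hypothetical choices made to keep the example testable.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class LruTtlCache:
    """Bounded cache: evicts least recently used entries and expires old ones."""

    def __init__(
        self,
        max_entries: int = 5000,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: "OrderedDict[str, Tuple[float, bytes]]" = OrderedDict()

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = (self._clock() + self._ttl, value)
        self._items.move_to_end(key)  # newest write is most recently used
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # evict least recently used

    def get(self, key: str) -> Optional[bytes]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # lazily drop expired entries on read
            return None
        self._items.move_to_end(key)  # a hit refreshes recency, not the TTL
        return value
```

Note the design choice that a read refreshes an entry's position in the LRU order but not its expiry time, so a frequently retrieved payload still ages out after ttl_seconds.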

Controlling Proactive Expansion

The Context Tracker's aggressiveness is governed by the expansion_threshold parameter (0-1 scale), where higher values result in fewer automatic expansions:

compress_cache:
  enabled: true
  expansion_threshold: 0.75   # Reduces auto-expansion frequency

Complete Configuration Example

Combine all options to optimize for an 8,000-entry cache with 2-hour TTL and conservative expansion:


# headroom.yaml
compress_cache:
  enabled: true
  max_entries: 8000
  ttl_seconds: 7200
  expansion_threshold: 0.75
  cache_path: ~/.headroom/cache

Start the proxy with your configuration:

headroom proxy --config headroom.yaml --port 8787

With this configuration active, headroom/transforms/content_router.py caches compressed payloads using the compress_and_cache routine, while headroom/perf/analyzer.py stands ready to fulfill headroom_retrieve requests from the LRU store. The Context Tracker monitors conversation turns and automatically expands cached content when similarity scores exceed the 0.75 threshold.
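Before starting the proxy, it can help to sanity-check the values in the compress_cache section. The snippet below is a hypothetical pre-flight check, not the schema from headroom/config.py: the CompressCacheSettings dataclass, its default values, and the validation rules are assumptions based only on the parameters described in this article. It takes an already-parsed mapping, so any YAML or JSON loader can feed it.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class CompressCacheSettings:
    # Defaults are placeholders taken from the examples above (assumptions).
    enabled: bool = True
    max_entries: int = 5000
    ttl_seconds: int = 3600
    expansion_threshold: float = 0.75
    cache_path: Optional[str] = None


def parse_compress_cache(section: Mapping[str, Any]) -> CompressCacheSettings:
    """Reject unknown keys and out-of-range values in a compress_cache mapping."""
    known = {f.name for f in fields(CompressCacheSettings)}
    unknown = set(section) - known
    if unknown:
        raise ValueError(f"unknown compress_cache keys: {sorted(unknown)}")

    settings = CompressCacheSettings(**section)
    if settings.max_entries <= 0:
        raise ValueError("max_entries must be positive")
    if settings.ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    if not 0.0 <= settings.expansion_threshold <= 1.0:
        raise ValueError("expansion_threshold must be between 0 and 1")
    return settings
```

Passing the mapping from the complete example above yields a settings object with max_entries of 8000 and ttl_seconds of 7200, while a threshold such as 1.5 raises ValueError.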

Summary

  • Enable CCR by setting compress_cache.enabled: true in headroom.yaml or using the --ccr CLI flag.
  • Tier 1 compresses payloads and stores the original bytes in an in-memory LRU cache; the logic lives in headroom/transforms/content_router.py.
  • Tier 2 retrieves data via headroom_retrieve in headroom/perf/analyzer.py and proactively expands content via the Context Tracker.
  • Configure limits with max_entries (cache size) and ttl_seconds (entry lifetime).
  • Control expansion sensitivity using expansion_threshold (0-1) to determine when the Context Tracker automatically reveals cached content.

Frequently Asked Questions

What happens when the cache reaches max_entries?

When the cache reaches the max_entries limit defined in your configuration, the LRU (Least Recently Used) eviction policy removes the entries that were accessed least recently to make room for new data. This prevents unbounded memory growth while keeping frequently accessed payloads available for retrieval.

How does the LLM retrieve compressed content when needed?

The LLM requests original data through the hidden headroom_retrieve tool, which the response-handler in headroom/perf/analyzer.py intercepts. The handler looks up the hash key in the LRU cache managed by headroom/transforms/content_router.py and injects the original payload back into the conversation context automatically.
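The lookup step can be illustrated with a short sketch that pulls hash keys out of markers and resolves them against a store. The regular expression is written against the marker format shown earlier in this article; the function names and the plain mapping used as the store are assumptions, and the real handler in headroom/perf/analyzer.py may work quite differently.

```python
import re
from typing import List, Mapping, Optional

# Matches markers like: [1000 lines compressed to 20. Retrieve more: hash=ab12cd34]
MARKER_RE = re.compile(
    r"\[(\d+) lines compressed to (\d+)\. Retrieve more: hash=([0-9a-f]+)\]"
)


def find_hashes(text: str) -> List[str]:
    """Return every hash key referenced by a compression marker in `text`."""
    return [match.group(3) for match in MARKER_RE.finditer(text)]


def retrieve_original(hash_key: str, store: Mapping[str, bytes]) -> Optional[str]:
    """Look up the original payload; None means it was evicted or has expired."""
    payload = store.get(hash_key)
    if payload is None:
        return None
    return payload.decode("utf-8", errors="replace")
```

A caller should treat a None result as a normal outcome rather than an error, since entries can disappear through LRU eviction or TTL expiry between compression and retrieval.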

What is the difference between --no-ccr-responses and --no-ccr-expansion?

The --no-ccr-responses flag disables the automatic handling of headroom_retrieve calls, meaning compressed markers remain unresolved unless manually processed. The --no-ccr-expansion flag specifically disables the Context Tracker's proactive behavior, preventing automatic expansion of cached content based on query similarity while still allowing explicit retrieval via the tool.

Can I persist the compression cache across proxy restarts?

Yes, specify a cache_path in your headroom.yaml configuration (e.g., ~/.headroom/cache) to enable disk-backed storage. According to the configuration schema in headroom/config.py, this optional parameter ensures cached entries survive proxy restarts, though entries still respect the ttl_seconds expiration policy upon reload.
