How the Two-Tier CompressionCache with TTL Enhances Performance in Headroom

June 7, 2026 · chopratejas/headroom

Headroom's two-tier CompressionCache uses a fast, TTL-bound cache for recent payloads and a long-lived LRU store to eliminate redundant recompression, limit memory growth, and maintain sub-10 ms latency for cached items.

The chopratejas/headroom repository implements a memory-efficient compression pipeline that relies on a two-tier CompressionCache with TTL to avoid repeating expensive compression work. By pairing a time-bound fast cache with a size-bound secondary store, the system serves frequently accessed content instantly while preventing unbounded memory growth.

How the Two-Tier CompressionCache with TTL Works

The caching strategy splits responsibility across two distinct layers. The fast tier holds the most recent results under a strict expiration policy, while the secondary tier retains popular items for longer periods through size-based eviction.

Fast-TTL Cache Tier

The fast-TTL cache is implemented by the CompressionStore class in headroom/cache/compression_cache.py. Each entry receives an expiry timestamp based on a configurable TTL that defaults to 30 minutes (1800 seconds), so identical content requested within that window is returned without re-invoking the compressors. Once the TTL elapses, the entry is treated as stale and is no longer served.
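To make the expiry mechanics concrete, here is a minimal sketch of a TTL-bound store. It is not Headroom's CompressionStore: the class name TtlStoreSketch, the injectable clock, and the choice to drop stale entries lazily on lookup are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlStoreSketch:
    """Illustrative TTL cache; NOT Headroom's CompressionStore."""

    def __init__(self, default_ttl: float = 1800.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._default_ttl = default_ttl
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def store(self, key: str, value: bytes, ttl: Optional[float] = None) -> None:
        # Record the absolute expiry time alongside the payload.
        lifetime = self._default_ttl if ttl is None else ttl
        self._entries[key] = (self._clock() + lifetime, value)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Lazy expiry (an assumption): drop the stale entry on lookup.
            del self._entries[key]
            return None
        return value

    def __len__(self) -> int:
        return len(self._entries)


# A fake clock lets the 30-minute window be crossed instantly.
now = [0.0]
store = TtlStoreSketch(clock=lambda: now[0])
store.store("digest-1", b"compressed")
hit_within_ttl = store.get("digest-1")   # looked up inside the window
now[0] = 1800.0
hit_after_ttl = store.get("digest-1")    # looked up once the window has elapsed
```

Storing an absolute expiry time keeps each lookup to a single comparison, and the injectable clock makes the expiry path easy to test.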

Long-Lived LRU Cache Tier

Sitting behind the fast cache is a least-recently-used (LRU) store bounded by a maximum entry count rather than age. Frequently accessed items survive beyond the fast tier's expiration because they are promoted to this larger LRU cache, while rarely used entries are evicted once the cache reaches its limit. This secondary layer lives in the same headroom/cache/compression_cache.py module that provides the CompressionStore.
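The size-bounded eviction described here can be sketched with the standard library's OrderedDict. This is a generic LRU illustration, not the code in compression_cache.py; LruStoreSketch and its max_entries parameter are hypothetical names.

```python
from collections import OrderedDict
from typing import Optional


class LruStoreSketch:
    """Illustrative size-bounded LRU cache; NOT Headroom's implementation."""

    def __init__(self, max_entries: int) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._entries:
            return None
        # A hit marks the entry as most recently used.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key: str, value: bytes) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        # Evict from the least recently used end once over capacity.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)

    def __len__(self) -> int:
        return len(self._entries)


lru = LruStoreSketch(max_entries=2)
lru.put("a", b"A")
lru.put("b", b"B")
lru.get("a")          # touch "a" so "b" becomes the eviction candidate
lru.put("c", b"C")    # exceeds capacity and evicts "b"
```

Because every hit moves the entry to the recent end, an item that keeps being requested is never the eviction candidate, which is how hot content outlives cold content regardless of age.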

Performance Benefits of the Two-Tier CompressionCache with TTL

Both tiers work together to deliver measurable throughput and latency improvements. The design directly addresses four critical operational goals:

Reduced recompression: when the same text or code appears repeatedly, the fast-TTL cache returns the pre-compressed representation immediately, bypassing the compressor pipeline entirely.
TTL-driven memory control: the 30-minute TTL keeps stale entries from accumulating in the fast tier; expired items are purged, freeing memory for newer content.
LRU tier for hot data: items accessed often enough are promoted to the LRU store, so hot content can stay cached for hours or days without repeated compression overhead.
Predictable latency: serving hits from the fast tier keeps per-message latency for cached items under 10 ms, even under heavy load.
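The first benefit, skipping recompression for repeated content, can be illustrated by keying a cache on a content digest and counting how often the compressor actually runs. In this sketch zlib stands in for Headroom's compressors, and content_key, compress_cached, and the stats counter are hypothetical helpers rather than repository APIs.

```python
import hashlib
import zlib
from typing import Dict

stats = {"compress_calls": 0}   # counts how often the real compressor runs
cache: Dict[str, bytes] = {}


def content_key(text: str) -> str:
    # Identical content always produces the same key.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def compress_cached(text: str) -> bytes:
    key = content_key(text)
    cached = cache.get(key)
    if cached is not None:
        return cached                    # hit: the compressor is skipped
    stats["compress_calls"] += 1
    result = zlib.compress(text.encode("utf-8"))
    cache[key] = result
    return result


snippet = "def hello():\n    print('world')\n" * 50
first = compress_cached(snippet)
second = compress_cached(snippet)        # served from the cache
```

After both calls the counter shows a single compressor invocation, which is the saving the list above describes.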

Source Code Implementation in Headroom

The cache wiring is orchestrated in headroom/transforms/content_router.py, where the ContentRouter docstring around line 192 explicitly describes the two-tier architecture with TTL. The router coordinates lookups across both tiers before falling back to actual compression.
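The lookup order described here (fast tier, then LRU tier, then compression) might look roughly like the following. This is a sketch under stated assumptions, not the ContentRouter code: lookup_two_tier is a hypothetical function, a plain dict stands in for the LRU tier, zlib stands in for the compressors, and the write-back policy (new results go to both tiers, LRU hits refresh the fast tier) is a guess at one reasonable design.

```python
import zlib
from typing import Callable, Dict, List, Optional, Tuple


def lookup_two_tier(
    key: str,
    content: str,
    fast: Dict[str, Tuple[float, bytes]],   # key -> (expires_at, payload)
    slow: Dict[str, bytes],                 # stand-in for the LRU tier
    compress: Callable[[str], bytes],
    now: float,
    ttl: float = 1800.0,
) -> Tuple[bytes, str]:
    """Return (payload, source), where source names the tier that answered."""
    entry = fast.get(key)
    if entry is not None and now < entry[0]:
        return entry[1], "fast"
    payload: Optional[bytes] = slow.get(key)
    if payload is not None:
        # Assumption: a slow-tier hit refreshes the fast tier.
        fast[key] = (now + ttl, payload)
        return payload, "lru"
    payload = compress(content)
    # Assumption: new results are written to both tiers.
    fast[key] = (now + ttl, payload)
    slow[key] = payload
    return payload, "compressed"


def zip_text(text: str) -> bytes:
    return zlib.compress(text.encode("utf-8"))


fast_tier: Dict[str, Tuple[float, bytes]] = {}
slow_tier: Dict[str, bytes] = {}

sources: List[str] = []
for t in (0.0, 10.0, 2000.0):   # the third lookup happens after the fast entry expired
    _, source = lookup_two_tier("k", "payload " * 20, fast_tier, slow_tier, zip_text, now=t)
    sources.append(source)
```

The three lookups are answered by the compressor, the fast tier, and the LRU stand-in respectively, so the expensive path runs only once.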

The fast-TTL storage layer is provided by the CompressionStore class in headroom/cache/compression_cache.py. It records an expiry timestamp for every stored payload and rejects stale results on retrieval. Default values for the TTL and LRU size limits are centralized in headroom/config.py near line 384.
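Centralized defaults of this kind are often expressed as a small immutable settings object. The sketch below is hypothetical and does not reproduce headroom/config.py: the class and field names are invented, and only the two values (a 1800-second TTL and an LRU size of 10 000) come from this article.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CacheDefaultsSketch:
    """Hypothetical settings object; field names are illustrative only."""
    fast_ttl_seconds: int = 1800      # 30 minutes, as stated in the article
    lru_max_entries: int = 10_000     # LRU size mentioned in the router example

    def __post_init__(self) -> None:
        # Reject values that would make either tier meaningless.
        if self.fast_ttl_seconds <= 0:
            raise ValueError("fast_ttl_seconds must be positive")
        if self.lru_max_entries <= 0:
            raise ValueError("lru_max_entries must be positive")


defaults = CacheDefaultsSketch()
short_lived = replace(defaults, fast_ttl_seconds=300)   # per-instance override
```

Freezing the dataclass means an override produces a new object and leaves the shared defaults untouched.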

Validation of TTL expiration and cache behavior can be found in tests/test_transforms_content_router.py, which exercises the integration between the router and both cache layers.

Practical Code Examples for the Two-Tier CompressionCache

Accessing the Fast-TTL Cache Directly

from headroom.cache.compression_cache import CompressionStore

# Create a store with the default 30-minute TTL
fast_cache = CompressionStore()

# Store a compressed payload; the TTL is applied automatically
fast_cache.store(key="my_content_hash", value=b"compressed_bytes")

# Retrieve later; returns None if the TTL has expired
cached = fast_cache.get("my_content_hash")
if cached is not None:
    print("Cache hit:", cached.value)
else:
    print("Cache miss or TTL expired")

The CompressionStore class records an expiry timestamp derived from its default_ttl parameter, which defaults to 1800 seconds. This fast tier is the first line of defense against redundant compression work.

Using ContentRouter with Both Cache Tiers

from headroom.transforms.content_router import ContentRouter, ContentRouterConfig

# Build a router with default settings (fast-TTL 30 min, LRU size 10 000)
router = ContentRouter(ContentRouterConfig())

# Compress content; the router checks the fast TTL cache, falls back to
# the LRU store, and finally runs the compressors if needed.
compressed = router.apply(
    content="def hello():\n    print('world')",
    metadata={},
)

print("Compressed size:", len(compressed.data))

Internally, the router calls fast_cache.get() first. If the fast tier misses, it queries the LRU store before invoking the heavy compression modules.

Adjusting the TTL at Runtime

from headroom.cache.compression_cache import CompressionStore

# Override the TTL for a specific use case (e.g., 5 minutes)
short_ttl_cache = CompressionStore(default_ttl=300)
short_ttl_cache.store(key="temp", value=b"data")

# After 5 minutes this entry expires and is no longer returned.

Lowering default_ttl tailors the fast cache for short-lived sessions or environments with high memory pressure. You can tune this parameter per instance without changing global configuration.
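Under memory pressure, a shorter TTL only helps if expired entries are actually released. One generic way to do that is a periodic sweep, sketched below. The purge_expired helper and the dict layout are assumptions for illustration; this article does not say whether Headroom sweeps eagerly or expires entries on lookup.

```python
from typing import Dict, Tuple


def purge_expired(entries: Dict[str, Tuple[float, bytes]], now: float) -> int:
    """Delete entries whose expiry time has passed; return how many were removed."""
    stale = [key for key, (expires_at, _) in entries.items() if now >= expires_at]
    for key in stale:
        del entries[key]
    return len(stale)


# Two entries stored at t=0: one with a 300 s TTL, one with the 1800 s default.
entries: Dict[str, Tuple[float, bytes]] = {
    "temp": (300.0, b"data"),
    "default": (1800.0, b"more data"),
}
removed = purge_expired(entries, now=301.0)
```

Collecting the stale keys first avoids mutating the dictionary while iterating over it.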

Observing Cache Occupancy

from headroom.transforms.content_router import ContentRouter, ContentRouterConfig

router = ContentRouter(ContentRouterConfig())
print("Fast-TTL entries:", router.fast_cache.size())
print("LRU entries:", router.lru_cache.size())

These accessors let operators verify cache utilization alongside Prometheus metrics that surface TTL bucket statistics. Monitoring both tiers helps identify whether adjustments to size limits or TTL values are needed.
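Entry counts are more useful next to hit and miss counts, since a full cache with a low hit ratio suggests the limits are mistuned. The following is a minimal, hypothetical counter for one tier; TierStatsSketch is not part of Headroom and is not tied to any particular metrics backend.

```python
from dataclasses import dataclass


@dataclass
class TierStatsSketch:
    """Hypothetical per-tier counters; not Headroom's metrics API."""
    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        # Report 0.0 before any lookup instead of dividing by zero.
        return self.hits / total if total else 0.0


fast_stats = TierStatsSketch()
for outcome in (True, True, True, False):
    fast_stats.record(outcome)
```

Tracking one such object per tier shows whether misses in the fast tier are being absorbed by the LRU tier or falling through to the compressors.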

Summary

Headroom's two-tier CompressionCache with TTL pairs a fast time-bound cache with a long-lived LRU store to minimize redundant compression.
The fast tier defaults to a 30-minute TTL and lives in headroom/cache/compression_cache.py via the CompressionStore class.
The LRU tier provides extended caching for hot data without letting memory grow unbounded, evicting entries only when capacity is reached.
The ContentRouter in headroom/transforms/content_router.py orchestrates lookups across both layers before falling back to the compressor.
Operators can customize the TTL via CompressionStore(default_ttl=...) and observe cache state through runtime size checks.

Frequently Asked Questions

What is the default TTL for Headroom's CompressionCache?

The fast-TTL cache defaults to 1800 seconds (30 minutes). This value is defined in headroom/config.py and can be overridden by passing a different default_ttl to the CompressionStore constructor.

How does the long-lived LRU cache differ from the fast-TTL tier?

The fast-TTL tier evicts entries strictly based on age, so recent results are served instantly but purged after the TTL expires. The LRU tier evicts entries based on access patterns and a maximum size limit, so frequently used content stays cached for as long as it keeps being accessed, while rarely used items are dropped once the limit is reached.

Where is the two-tier cache wired into the compression pipeline?

The orchestration happens in headroom/transforms/content_router.py near line 192. The ContentRouter checks the fast-TTL store first, then the LRU store, and only executes the expensive compression logic if neither tier contains a valid entry.

Why use a two-tier design instead of a single cache?

A single cache forces a trade-off between freshness and hit rate. By splitting responsibilities, Headroom gains immediate lookups for recent work via TTL and extended retention for popular content via LRU, which improves throughput and caps memory usage without sacrificing speed.
