# How the Two-Tier CompressionCache with TTL Enhances Performance in Headroom

> Headroom's two-tier CompressionCache with TTL caches recent payloads in a time-bound tier and keeps hot items in an LRU store, which keeps latency under 10 ms for cached items and limits memory growth.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: performance
- Published: 2026-06-07

---

**Headroom's two-tier CompressionCache uses a fast, TTL-bound cache for recent payloads and a long-lived LRU store to eliminate redundant recompression, limit memory growth, and maintain sub-10 ms latency for cached items.**

The `chopratejas/headroom` repository implements a memory-efficient compression pipeline that relies on a **two-tier CompressionCache with TTL** to avoid repeating expensive compression work. By pairing a time-bound fast cache with a size-bound secondary store, the system serves frequently accessed content instantly while preventing unbounded memory growth.

## How the Two-Tier CompressionCache with TTL Works

The caching strategy splits responsibility across two distinct layers. The fast tier holds the most recent results under a strict expiration policy, while the secondary tier retains popular items for longer periods through size-based eviction.

### Fast-TTL Cache Tier

The fast-TTL cache is implemented by the `CompressionStore` class in [`headroom/cache/compression_cache.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cache/compression_cache.py). Each entry receives an expiry timestamp based on a configurable TTL that defaults to **30 minutes** (1800 seconds), ensuring that identical content requested within that window is returned without re-invoking the compressors. When the TTL elapses, the entry is automatically considered stale and removed.
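The expiry mechanism described above can be illustrated with a minimal sketch. This is not Headroom's `CompressionStore`: the class name `TTLCache`, its `put`/`get` methods, and the injectable `clock` argument are all hypothetical, chosen only to show how an expiry timestamp and lazy eviction on read fit together.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical time-bound cache; an illustration, not Headroom's API."""

    def __init__(
        self,
        ttl_seconds: float = 1800.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def put(self, key: str, value: bytes) -> None:
        # Store the absolute expiry time next to the payload.
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Lazy eviction: a stale entry is dropped when it is next read.
            del self._entries[key]
            return None
        return value

    def __len__(self) -> int:
        return len(self._entries)
```

Using a monotonic clock rather than wall-clock time keeps expiry stable if the system clock is adjusted. Whether Headroom evicts lazily on read, with a background sweep, or both is not something this sketch asserts.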

### Long-Lived LRU Cache Tier

Sitting behind the fast cache is a **least-recently-used (LRU)** store bounded by a maximum entry count rather than age. Frequently accessed items survive beyond the fast tier's expiration because they are promoted to this larger LRU cache, while rarely used entries are evicted once the cache reaches its limit. This secondary layer lives in the same [`headroom/cache/compression_cache.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cache/compression_cache.py) module that provides the `CompressionStore`.
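Size-based eviction of this kind is commonly built on an ordered mapping. The sketch below assumes a hypothetical `LRUCache` class with `put` and `get` methods; it illustrates the general technique with Python's `collections.OrderedDict` and does not reproduce Headroom's internal implementation.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Hypothetical entry-count-bounded cache; an illustration only."""

    def __init__(self, max_entries: int = 10_000) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max = max_entries
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def put(self, key: str, value: bytes) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)  # mark as most recently used
        while len(self._entries) > self._max:
            # Evict from the front, which holds the least recently used key.
            self._entries.popitem(last=False)

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # a read also refreshes recency
        return self._entries[key]

    def __len__(self) -> int:
        return len(self._entries)
```

Because every read moves the key to the end of the ordering, an item that keeps getting requested never reaches the eviction end, which is how hot content can outlive any fixed time window.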

## Performance Benefits of the Two-Tier CompressionCache with TTL

Both tiers work together to deliver measurable throughput and latency improvements. The design directly addresses four critical operational goals:

- **Reduced recompression** — When the same text or code appears repeatedly, the fast-TTL cache returns the pre-compressed representation instantly, bypassing the costly compressor pipeline entirely.
- **TTL-driven memory control** — The 30-minute TTL guarantees the fast tier never accumulates stale entries indefinitely; outdated items are automatically purged, freeing memory for newer content.
- **LRU tier for hot data** — Items accessed often enough are promoted to the LRU store, allowing hot content to remain cached for hours or days without repeated compression overhead.
- **Predictable latency** — Serving hits from the fast tier keeps per-message latency under **10 ms**, even when the system is under heavy load.
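The gap between a cache hit and a recompression can be checked with a small micro-benchmark. The sketch below is a stand-in, not a measurement of Headroom: a plain `dict` plays the role of the fast tier, `zlib` plays the role of the compressor pipeline, and the helper `median_seconds` is a hypothetical name. Absolute numbers depend on the machine, so none are quoted here.

```python
import statistics
import time
import zlib
from typing import Callable, Dict


def median_seconds(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time of calling fn, in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Roughly 1 MB of repetitive source-like text as a stand-in payload.
payload = ("def hello():\n    print('world')\n" * 32_000).encode("utf-8")

# A plain dict stands in for the fast tier; zlib stands in for the compressors.
cache: Dict[str, bytes] = {"payload": zlib.compress(payload)}

miss_s = median_seconds(lambda: zlib.compress(payload))  # cost of recompressing
hit_s = median_seconds(lambda: cache["payload"])         # cost of a cache lookup

print(f"miss (recompress): {miss_s * 1000:.3f} ms")
print(f"hit  (lookup):     {hit_s * 1000:.3f} ms")
```

Running the same comparison against the real router, with representative payloads, is the way to confirm the latency target for a given deployment.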

## Source Code Implementation in Headroom

The cache wiring is orchestrated in [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py), where the `ContentRouter` docstring around line 192 explicitly describes the two-tier architecture with TTL. The router coordinates lookups across both tiers before falling back to actual compression.
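The lookup order can be sketched as follows. Everything here is an assumption made for illustration: the `TwoTierCache` class, the SHA-256 content key, the use of `zlib` as the compressor, and the policy of writing each result to both tiers are not taken from Headroom's source.

```python
import hashlib
import time
import zlib
from collections import OrderedDict
from typing import Callable, Dict, Tuple


class TwoTierCache:
    """Hypothetical two-tier lookup; names and policies are assumptions."""

    def __init__(
        self,
        ttl_seconds: float = 1800.0,
        lru_max: int = 10_000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._lru_max = lru_max
        self._clock = clock
        self._fast: Dict[str, Tuple[float, bytes]] = {}
        self._lru: "OrderedDict[str, bytes]" = OrderedDict()
        self.compress_calls = 0  # counts how often the slow path runs

    def _remember(self, key: str, value: bytes) -> None:
        # Assumed policy: write the result to both tiers.
        self._fast[key] = (self._clock() + self._ttl, value)
        self._lru[key] = value
        self._lru.move_to_end(key)
        while len(self._lru) > self._lru_max:
            self._lru.popitem(last=False)

    def get_or_compress(self, content: str) -> bytes:
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()

        # Tier 1: the time-bound fast cache.
        entry = self._fast.get(key)
        if entry is not None and self._clock() < entry[0]:
            if key in self._lru:
                self._lru.move_to_end(key)  # keep LRU recency in step
            return entry[1]
        self._fast.pop(key, None)  # drop a stale entry, if any

        # Tier 2: the size-bound LRU store.
        if key in self._lru:
            value = self._lru[key]
            self._remember(key, value)  # refresh the fast tier as well
            return value

        # Miss in both tiers: run the expensive step exactly once.
        self.compress_calls += 1
        value = zlib.compress(content.encode("utf-8"))
        self._remember(key, value)
        return value
```

In this sketch an entry that has aged out of the fast tier is still served from the LRU store without recompression, which is the behavior the two-tier description implies.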

The fast-TTL storage layer is provided by the `CompressionStore` class in [`headroom/cache/compression_cache.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cache/compression_cache.py). It records an expiry timestamp for every stored payload and rejects stale results on retrieval. Default values for the TTL and LRU size limits are centralized in [`headroom/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/config.py) near line 384.

Validation of TTL expiration and cache behavior can be found in [`tests/test_transforms_content_router.py`](https://github.com/chopratejas/headroom/blob/main/tests/test_transforms_content_router.py), which exercises the integration between the router and both cache layers.

## Practical Code Examples for the Two-Tier CompressionCache

### Accessing the Fast-TTL Cache Directly

```python
from headroom.cache.compression_cache import CompressionStore

# Create a store with the default 30-minute TTL
fast_cache = CompressionStore()

# Store a compressed payload; the TTL is applied automatically
fast_cache.store(key="my_content_hash", value=b"compressed_bytes")

# Retrieve later; returns None if the TTL has expired
cached = fast_cache.get("my_content_hash")
if cached is not None:
    print("Cache hit:", cached.value)
else:
    print("Cache miss or TTL expired")
```

The `CompressionStore` class records an expiry timestamp derived from its `default_ttl` parameter, which defaults to 1800 seconds. This fast tier is the first line of defense against redundant compression work.

### Using ContentRouter with Both Cache Tiers

```python
from headroom.transforms.content_router import ContentRouter, ContentRouterConfig

# Build a router with default settings (fast-TTL 30 min, LRU size 10 000)
router = ContentRouter(ContentRouterConfig())

# Compress content; the router checks the fast TTL cache, falls back to
# the LRU store, and finally runs the compressors if needed.
compressed = router.apply(
    content="def hello():\n    print('world')",
    metadata={}
)

print("Compressed size:", len(compressed.data))
```

Internally, the router calls `fast_cache.get()` first. If the fast tier misses, it queries the LRU store before invoking the heavy compression modules.

### Adjusting the TTL at Runtime

```python
from headroom.cache.compression_cache import CompressionStore

# Override the TTL for a specific use case (e.g., 5 minutes)
short_ttl_cache = CompressionStore(default_ttl=300)

short_ttl_cache.store(key="temp", value=b"data")

# After 5 minutes this entry is automatically evicted.
```

Lowering `default_ttl` tailors the fast cache for short-lived sessions or environments with high memory pressure. You can tune this parameter per instance without changing global configuration.

### Observing Cache Occupancy

```python
from headroom.transforms.content_router import ContentRouter, ContentRouterConfig

router = ContentRouter(ContentRouterConfig())
print("Fast-TTL entries:", router.fast_cache.size())
print("LRU entries:", router.lru_cache.size())
```

These accessors let operators verify cache utilization alongside Prometheus metrics that surface TTL bucket statistics. Monitoring both tiers helps identify whether adjustments to size limits or TTL values are needed.
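Entry counts alone do not show how well a tier is working, so a hit-rate counter is a common companion. The sketch below assumes a hypothetical `TierStats` helper that the calling code updates on each lookup; it is not part of Headroom and does not represent its Prometheus metrics.

```python
from dataclasses import dataclass


@dataclass
class TierStats:
    """Hypothetical per-tier counters, updated by the caller on each lookup."""

    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        # Report 0.0 before any lookups instead of dividing by zero.
        return self.hits / total if total else 0.0


fast = TierStats()
for was_hit in (True, True, False, True):
    fast.record(was_hit)

print(f"fast-tier hit rate: {fast.hit_rate:.0%}")
```

As a rough guide, a low fast-tier hit rate combined with a high LRU hit rate may suggest the TTL is shorter than the typical reuse interval, while low rates in both tiers may point to a size limit that is too small for the working set.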

## Summary

- Headroom's **two-tier CompressionCache with TTL** pairs a fast time-bound cache with a long-lived LRU store to minimize redundant compression.
- The fast tier defaults to a **30-minute TTL** and lives in [`headroom/cache/compression_cache.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cache/compression_cache.py) via the `CompressionStore` class.
- The LRU tier provides extended caching for hot data without letting memory grow unbounded, evicting entries only when capacity is reached.
- The `ContentRouter` in [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py) orchestrates lookups across both layers before falling back to the compressor.
- Operators can customize the TTL via `CompressionStore(default_ttl=...)` and observe cache state through runtime size checks.

## Frequently Asked Questions

### What is the default TTL for Headroom's CompressionCache?

The fast-TTL cache defaults to **1800 seconds** (30 minutes). This value is defined in [`headroom/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/config.py) and can be overridden by passing a different `default_ttl` to the `CompressionStore` constructor.

### How does the long-lived LRU cache differ from the fast-TTL tier?

The fast-TTL tier evicts entries strictly based on age, ensuring recent results are served instantly but purged after the TTL expires. The LRU tier evicts entries based on access patterns and a maximum size limit, allowing frequently used content to remain cached indefinitely while rarely used items are dropped.

### Where is the two-tier cache wired into the compression pipeline?

The orchestration happens in [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py) near line 192. The `ContentRouter` checks the fast-TTL store first, then the LRU store, and only executes the expensive compression logic if neither tier contains a valid entry.

### Why use a two-tier design instead of a single cache?

A single cache forces a trade-off between freshness and hit rate. By splitting responsibilities, Headroom gains immediate lookups for recent work via TTL and extended retention for popular content via LRU, which improves throughput and caps memory usage without sacrificing speed.