# How CacheAligner Improves Provider KV Cache Hit Rates in Headroom

> Discover how CacheAligner boosts KV cache hit rates in Headroom by identifying volatile tokens like timestamps and JWTs, ensuring stable KV cache prefixes for better performance. Optimize your system prompts.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: deep-dive
- Published: 2026-06-09

---

**CacheAligner improves provider KV cache hit rates by detecting volatile tokens (such as timestamps, UUIDs, and JWTs) in system prompts before they invalidate the prefix cache, so developers can isolate dynamic content and keep KV cache prefixes stable across requests.**

Headroom is an open-source Python framework designed to optimize LLM request pipelines. At its core, the `CacheAligner` transform safeguards the **system-prompt prefix**, the content segment that providers like OpenAI, Anthropic, and Google cache as key-value (KV) vectors. By identifying patterns that trigger cache invalidation, `CacheAligner` helps maintain prefix stability, ensuring providers can reuse computed KV caches and reducing both latency and token costs.

## Detecting Volatile Content in System Messages

The `CacheAligner` transform operates as a **detector-only** component that scans system messages for tokens that are very likely to change between requests. According to the source code in [`headroom/transforms/cache_aligner.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/cache_aligner.py) (lines 76-152), it implements structural parsers to identify:

- **UUIDs** via `_is_uuid` pattern matching
- **ISO-8601 timestamps** using `_is_iso8601` and `datetime.fromisoformat` validation
- **JWT-shaped tokens** detected by `_is_jwt_shape`
- **Hexadecimal hashes** identified through `_is_hex_hash` and base-64 decoding checks

When any of these patterns appear in a system message, the transform logs a warning such as `CacheAligner: detected volatile content … cache prefix unstable` and appends it to `TransformResult.warnings`. This visibility allows developers to recognize which prompt elements are dynamic and likely to cause **cache misses** by altering the provider's cached prefix hash.
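To make the detection idea concrete, here is a minimal standalone sketch of structural checks for the four pattern families. It is not Headroom's implementation: the function names (`looks_like_uuid`, `find_volatile_tokens`, and so on), the whitespace tokenization, and the accepted hash lengths are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import base64
import json
import string
import uuid
from datetime import datetime
from typing import List, Tuple


def looks_like_uuid(token: str) -> bool:
    """True only for the canonical hyphenated UUID form."""
    try:
        return str(uuid.UUID(token)) == token.lower()
    except ValueError:
        return False


def looks_like_iso8601(token: str) -> bool:
    """True for date-time strings such as 2024-06-09T12:34:56Z."""
    if "T" not in token:
        return False
    # Older Python versions reject a trailing "Z", so normalize it first.
    candidate = token[:-1] + "+00:00" if token.endswith("Z") else token
    try:
        datetime.fromisoformat(candidate)
    except ValueError:
        return False
    return True


def looks_like_jwt(token: str) -> bool:
    """True for three dot-separated segments whose first decodes to a JSON header."""
    parts = token.split(".")
    if len(parts) != 3 or not all(parts):
        return False
    header_segment = parts[0]
    padded = header_segment + "=" * (-len(header_segment) % 4)
    try:
        header = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return False
    return isinstance(header, dict) and "alg" in header


def looks_like_hex_hash(token: str) -> bool:
    """True for hex strings with the length of an MD5, SHA-1, or SHA-256 digest."""
    return len(token) in (32, 40, 64) and all(c in string.hexdigits for c in token)


# Hypothetical check table: first match wins for each token.
CHECKS = (
    ("uuid", looks_like_uuid),
    ("iso8601", looks_like_iso8601),
    ("jwt", looks_like_jwt),
    ("hex_hash", looks_like_hex_hash),
)


def find_volatile_tokens(text: str) -> List[Tuple[str, str]]:
    """Return (kind, token) pairs for every whitespace-separated volatile token."""
    findings = []
    for raw in text.split():
        token = raw.strip(".,;:()[]\"'")  # drop surrounding punctuation
        for kind, check in CHECKS:
            if check(token):
                findings.append((kind, token))
                break
    return findings


print(find_volatile_tokens("Assistant. Current time: 2024-06-09T12:34:56Z"))
# [('iso8601', '2024-06-09T12:34:56Z')]
```

Each check parses the candidate rather than relying on a pattern alone, which keeps ordinary words and hostnames such as `www.example.com` from being flagged.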

## Computing Stable Prefix Hashes

To enable precise cache tracking, `CacheAligner` computes a deterministic fingerprint of the system prompt content. In [`headroom/transforms/cache_aligner.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/cache_aligner.py) (lines 12-22 and 28-33), the transform:

1. Concatenates all system-message contents
2. Generates a short hash using `compute_short_hash` (sourced from [`headroom/utils.py`](https://github.com/chopratejas/headroom/blob/main/headroom/utils.py))
3. Stores this value as `CachePrefixMetrics.stable_prefix_hash`

The transform keeps the previous request's hash on the instance as `self._previous_prefix_hash`. When the current hash differs from it, `CachePrefixMetrics.prefix_changed` is set to `True`, signaling to downstream telemetry that the provider's KV cache will likely have to be rebuilt. Providers reuse a cached prefix only when the leading tokens of a request are **identical** to an earlier one, so an unchanged hash is a useful proxy for cache reuse, and tracking it is essential for hit-rate optimization.
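The hashing and change-tracking steps can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: `short_hash` stands in for `compute_short_hash` (whose algorithm and length are not given here, so SHA-256 truncated to 16 hex characters is a guess), `PrefixMetrics` and `PrefixTracker` are hypothetical names, system messages are assumed to hold plain strings joined by newlines, and the first request is treated as "unchanged".

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


def short_hash(text: str, length: int = 16) -> str:
    """Deterministic fingerprint: truncated SHA-256 hex digest (assumed scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:length]


@dataclass
class PrefixMetrics:
    stable_prefix_hash: str
    prefix_changed: bool


class PrefixTracker:
    """Remembers the last system-prompt hash seen by this instance."""

    def __init__(self) -> None:
        self._previous_prefix_hash: Optional[str] = None

    def observe(self, messages: List[Dict[str, str]]) -> PrefixMetrics:
        # Concatenate every system message, in order.
        system_text = "\n".join(
            m["content"] for m in messages if m.get("role") == "system"
        )
        current = short_hash(system_text)
        # Assumption: the very first request has nothing to differ from.
        changed = (
            self._previous_prefix_hash is not None
            and current != self._previous_prefix_hash
        )
        self._previous_prefix_hash = current
        return PrefixMetrics(stable_prefix_hash=current, prefix_changed=changed)


tracker = PrefixTracker()
first = tracker.observe([{"role": "system", "content": "Time: 12:00"}])
second = tracker.observe([{"role": "system", "content": "Time: 12:01"}])
print(first.prefix_changed, second.prefix_changed)  # False True
```

Because the state lives on the tracker object, two separate instances would not see each other's history; sharing one instance across requests is what makes the comparison meaningful.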

## Monitoring Cache Health with Alignment Scores

Beyond detection and hashing, `CacheAligner` provides quantitative metrics for observability. The `get_alignment_score(messages)` method (implemented in [`headroom/transforms/cache_aligner.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/cache_aligner.py), lines 48-65) calculates a **0-100 stability score** by applying a -10 penalty per volatile finding, clamped to the valid range.

This score enables automated monitoring: a value below 70 typically indicates high volatility and predicts cache misses. The score is exposed on the response object alongside `cache_metrics`, allowing integration with dashboards that track provider cache efficiency over time.
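The scoring rule described above is simple enough to work through by hand. The sketch below is a hypothetical reimplementation of the arithmetic only (start at 100, subtract 10 per finding, clamp to 0-100); the function name and signature are assumptions and do not mirror `get_alignment_score`, which takes messages rather than a count.

```python
def alignment_score(volatile_findings: int, penalty: int = 10) -> int:
    """Start from 100, subtract a fixed penalty per finding, clamp to [0, 100]."""
    return max(0, min(100, 100 - penalty * volatile_findings))


for findings in (0, 3, 4, 15):
    print(findings, alignment_score(findings))
# 0 100
# 3 70
# 4 60
# 15 0
```

Under this rule, three findings land exactly on 70, so a "below 70" alert fires only once a system prompt contains four or more volatile tokens.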

## Immutable Transform Guarantees

Crucially, `CacheAligner` never mutates the original message array. Per the implementation in [`headroom/transforms/cache_aligner.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/cache_aligner.py), the transform returns `result_messages = deep_copy_messages(messages)`, preserving the invariant that system prompts remain unmodified. Because the transform cannot alter the prefix a provider sees, it can never cause a cache miss itself, while still providing the diagnostic data needed to refactor prompts externally.
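The copy-then-return pattern can be illustrated with the standard library. In this sketch `copy.deepcopy` stands in for `deep_copy_messages`, whose internals are not shown here, and `inspect_without_mutation` is a hypothetical name.

```python
import copy
from typing import Dict, List


def inspect_without_mutation(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return an independent copy so callers can never alter the input by accident."""
    result_messages = copy.deepcopy(messages)
    # A detector would scan result_messages here and record warnings.
    return result_messages


original = [{"role": "system", "content": "You are a helpful assistant."}]
returned = inspect_without_mutation(original)

# Editing the returned copy leaves the caller's messages untouched.
returned[0]["content"] = "edited downstream"
print(original[0]["content"])  # You are a helpful assistant.
```

A shallow copy of the list would not be enough, since both lists would still share the same message dictionaries.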

## Enabling and Configuring CacheAligner

By default, the transform is disabled. To activate detection, set `CacheAlignerConfig.enabled = True` in [`headroom/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/config.py) (line 57). When enabled via the Headroom client pipeline, `CacheAligner` runs automatically on every request, populating `cache_alignment_score` and `cache_metrics` without requiring manual invocation.

## Practical Implementation Examples

### Basic Pipeline Usage

The `align_for_cache` helper function provides direct access to cache detection without full client initialization:

```python
from headroom.transforms.cache_aligner import align_for_cache
from headroom.config import CacheAlignerConfig

cfg = CacheAlignerConfig(enabled=True)

messages = [
    {"role": "system", "content": "Assistant. Current time: 2024-06-09T12:34:56Z"},
    {"role": "user", "content": "What is the weather?"}
]

aligned_messages, prefix_hash = align_for_cache(messages, cfg)

print("Stable prefix hash:", prefix_hash)  # e.g., "a3f9b2..."
# aligned_messages is a deep copy of the input; its content is unchanged
```

This example triggers a warning about the ISO-8601 timestamp while returning the stable hash for cache verification.

### Integration with Headroom Client

For production use, integrate via the `HeadroomClient` to automatically populate response metrics:

```python
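from headroom.client import HeadroomClient
from headroom.config import HeadroomConfig

cfg = HeadroomConfig()
cfg.cache_aligner.enabled = True  # Enable detection globally

# Example conversation; the system message is the prefix providers cache
messages = [
    {"role": "system", "content": "You are a weather assistant."},
    {"role": "user", "content": "What is the weather?"},
]

client = HeadroomClient(cfg)
response = client.chat(messages)

# Access cache diagnostics
print(response.cache_metrics.stable_prefix_hash)
print(response.cache_metrics.prefix_changed)
print(response.cache_alignment_score)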
```

### Monitoring with Alignment Scores

Use the alignment score to trigger alerts when prompts threaten cache stability:

```python
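import logging

logger = logging.getLogger(__name__)

# `response` is the object returned by client.chat(...) in the previous example
if response.cache_alignment_score < 70:
    logger.warning(
        "Volatile system prompt detected (score: %d). "
        "Consider moving timestamps to user messages to improve KV cache hit rates.",
        response.cache_alignment_score,
    )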
```
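The remedy suggested in the warning, moving timestamps out of the system prompt, might look like the following. This is a sketch of one possible refactor, not a Headroom feature: `build_messages`, `system_fingerprint`, and the bracketed time prefix are hypothetical, and the fingerprint is a plain SHA-256 used only to show that the system prefix stays byte-identical.

```python
import hashlib
from typing import Dict, List

# The system prompt holds only static instructions.
STATIC_SYSTEM_PROMPT = "You are a weather assistant. Answer concisely."


def build_messages(question: str, now_iso: str) -> List[Dict[str, str]]:
    """Put per-request values (here, the current time) in the user turn."""
    return [
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},
        {"role": "user", "content": f"[current time: {now_iso}]\n{question}"},
    ]


def system_fingerprint(messages: List[Dict[str, str]]) -> str:
    """Hash only the system messages, i.e. the part that should stay stable."""
    system_text = "\n".join(m["content"] for m in messages if m["role"] == "system")
    return hashlib.sha256(system_text.encode("utf-8")).hexdigest()


first = build_messages("What is the weather?", "2024-06-09T12:34:56Z")
later = build_messages("What is the weather?", "2024-06-09T12:35:10Z")

# The timestamp changed, but the system prefix did not.
print(system_fingerprint(first) == system_fingerprint(later))  # True
```

The dynamic value still reaches the model, but it now sits after the stable prefix, so only the tail of the request differs between calls.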

## Summary

- **Detection**: `CacheAligner` (lines 76-152 of [`headroom/transforms/cache_aligner.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/cache_aligner.py)) identifies UUIDs, timestamps, JWTs, and hex hashes that would invalidate provider KV caches.
- **Hashing**: Computes stable prefix hashes using `compute_short_hash` to track cache state changes across requests.
- **Observability**: Exposes volatility via `TransformResult.warnings`, `CachePrefixMetrics.prefix_changed`, and a 0-100 alignment score.
- **Safety**: Returns deep-copied messages via `deep_copy_messages`, guaranteeing immutable system prompts.
- **Configuration**: Controlled by `CacheAlignerConfig.enabled` in [`headroom/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/config.py) (line 57), defaulting to off.

## Frequently Asked Questions

### Does CacheAligner modify my system prompts to fix cache issues?

No. `CacheAligner` is a **detector-only** transform that returns an unchanged deep copy of your messages. It identifies volatile content through functions like `_is_uuid` and `_is_iso8601` in [`headroom/transforms/cache_aligner.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/cache_aligner.py) but leaves modification to the developer. This design ensures that prompts remain exactly as written while providing the data needed to refactor dynamic elements into user turns.

### How does CacheAligner differ from built-in provider caching?

Provider-level KV caches (OpenAI, Anthropic, Google) automatically reuse identical prefixes, but a change to **any** token invalidates the cached prefix from that token onward. Because the system prompt sits at the start of the request, a single changed token there typically forfeits the whole cached prefix. `CacheAligner` adds a **detection layer** that runs before the provider sees the request, identifying which specific tokens would cause invalidation. It computes stable hashes and alignment scores that providers do not expose, giving Headroom users actionable insight into cache stability.

### What specific patterns does CacheAligner detect?

According to [`headroom/transforms/cache_aligner.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/cache_aligner.py) (lines 76-152), the transform detects:

- UUID-formatted strings
- ISO-8601 datetime strings
- JWT-shaped tokens (three base64url segments)
- Hexadecimal hashes

These patterns are identified using structural validation rather than regex alone, ensuring accurate detection of truly volatile content.

### Can I use CacheAligner outside the Headroom client pipeline?

Yes. While the `HeadroomClient` automatically runs `CacheAligner` when enabled in [`headroom/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/config.py), you can invoke the transform directly via `align_for_cache(messages, config)` from [`headroom/transforms/cache_aligner.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/cache_aligner.py). This standalone usage returns the diagnostic hash and warnings without requiring full client initialization, making it suitable for custom pipeline integrations.