# What Is CacheAligner and How It Improves Provider KV Cache Hit Rates

> Discover CacheAligner, a Headroom pipeline transform that boosts KV cache hit rates by creating static prefixes from dynamic prompt fragments. Learn how it improves performance.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: deep-dive
- Published: 2026-06-07

---

**CacheAligner is the first transform in Headroom’s pipeline. It extracts dynamic fragments (such as dates, UUIDs, and tokens) from the prompt prefix and appends them to the end, producing a static, byte-identical prefix that raises provider KV-cache hit rates.**

Headroom is an open-source LLM proxy designed to reduce token costs and latency through prompt optimization. Its **CacheAligner** transform solves a specific but expensive problem: LLM providers only reuse key-value (KV) caches when the incoming request prefix is byte-identical to a previous one. Because CacheAligner normalizes the system prompt before it reaches the provider, it turns repetitive cache misses into hits and directly lowers latency and cost.

## How Provider KV Caching Works

Leading LLM providers, including OpenAI, Anthropic, and Google, maintain a **KV cache** of recent prompt prefixes so that a request sharing a prefix with an earlier one can skip recomputing that shared portion. However, the match is strictly **byte-identical**: the first byte that differs, even inside a timestamp or request ID, ends the reusable prefix, and everything after it is processed as a cache miss. For applications that embed the current date or a unique identifier near the start of every system prompt, almost nothing can be reused, which forces redundant model computation and raises latency and input-token cost.
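To make the prefix-matching behavior concrete, here is a minimal sketch that measures how many leading bytes two prompts share. It is not provider code: `shared_prefix_len`, `date_first`, `date_last`, and the sample instructions are hypothetical names chosen for illustration, and real providers may match at token rather than byte granularity.

```python
# Illustrative only: compares prompts byte by byte to show where reuse stops.
INSTRUCTIONS = "You are a helpful assistant. Answer concisely and cite sources."


def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading UTF-8 bytes that two prompts have in common."""
    count = 0
    for x, y in zip(a.encode("utf-8"), b.encode("utf-8")):
        if x != y:
            break
        count += 1
    return count


def date_first(day: str) -> str:
    # Dynamic value at the start: the prefix changes whenever the date changes.
    return f"Today is {day}. {INSTRUCTIONS}"


def date_last(day: str) -> str:
    # Dynamic value moved to the end: the instructions stay byte-identical.
    return f"{INSTRUCTIONS} [Context: Today is {day}]"


early = shared_prefix_len(date_first("2024-12-15"), date_first("2024-12-16"))
late = shared_prefix_len(date_last("2024-12-15"), date_last("2024-12-16"))

print(f"date first: {early} shared bytes")
print(f"date last:  {late} shared bytes (instructions are {len(INSTRUCTIONS)} bytes)")
```

With the date at the front, the shared prefix ends inside the date itself. With the date at the end, the whole instruction text remains reusable.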

## The CacheAligner Transformation

In [`headroom/transforms/pipeline.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/pipeline.py), **CacheAligner** is wired as the very first transform in the ordered Headroom pipeline. It rewrites the prompt in four discrete steps so that the provider sees the same static prefix on every call.

### Step 1: Detect

CacheAligner scans the system prompt for volatile patterns such as dates, UUIDs, and dynamic tokens.

### Step 2: Extract

It removes those dynamic fragments from their original position in the prefix.

### Step 3: Append

The extracted content is repositioned into a trailing “context” block at the end of the message.

### Step 4: Emit

The transform outputs the cleaned-up prefix alongside the trailing context, producing a stable byte-string for the provider to cache.

According to the architecture documentation in [`wiki/ARCHITECTURE.md`](https://github.com/chopratejas/headroom/blob/main/wiki/ARCHITECTURE.md), this normalization is the foundation of Headroom’s cache-optimization strategy.
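The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Headroom’s implementation: the `VOLATILE` pattern, the sentence-level splitting, and the `align_sketch` function are all hypothetical, and the real transform may detect and move fragments differently.

```python
import re

# Hypothetical detectors: ISO-style dates and canonical UUIDs only.
VOLATILE = re.compile(
    r"\d{4}-\d{2}-\d{2}"
    r"|[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
)


def align_sketch(system_prompt: str) -> tuple[str, str]:
    """Return (static_prefix, context_tail) for a system prompt."""
    # Step 1 (detect): split into sentences and test each for volatile text.
    sentences = re.split(r"(?<=[.!?])\s+", system_prompt.strip())

    # Step 2 (extract): separate volatile sentences from stable ones.
    static = [s for s in sentences if not VOLATILE.search(s)]
    dynamic = [s for s in sentences if VOLATILE.search(s)]

    # Step 3 (append): collect the volatile sentences into a trailing block.
    tail = "[Context: " + " ".join(dynamic) + "]" if dynamic else ""

    # Step 4 (emit): the stable prefix first, the context block last.
    return " ".join(static), tail


monday = align_sketch("Today is 2024-12-15. You are a helpful assistant.")
tuesday = align_sketch("Today is 2024-12-16. You are a helpful assistant.")

print(monday[0])   # static prefix
print(monday[1])   # trailing context block
print(monday[0] == tuesday[0])
```

Moving whole sentences, rather than only the matched value, keeps the relocated text readable (for example "Today is ..." instead of a bare date). Whether Headroom works at sentence or fragment level is not specified here.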

## Core Implementation and Source Files

The transform’s logic spans Rust and Python layers:

- [`headroom/transforms/cache_aligner.rs`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/cache_aligner.rs) — Implements the high-performance Rust core that identifies and repositions dynamic fragments.
- [`headroom/transforms/pipeline.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/pipeline.py) — Orchestrates the ordered transforms and guarantees CacheAligner runs first.
- [`wiki/ARCHITECTURE.md`](https://github.com/chopratejas/headroom/blob/main/wiki/ARCHITECTURE.md) — Documents the transform’s role in the Headroom pipeline.
- [`wiki/configuration.md`](https://github.com/chopratejas/headroom/blob/main/wiki/configuration.md) — Exposes flags to enable or disable CacheAligner.

You can verify its effect by inspecting cache hit metrics before and after enabling the transform in your Headroom client configuration.
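One way to quantify the effect is to compare the share of prompt tokens served from cache before and after enabling the transform. The sketch below assumes usage records shaped like the OpenAI Chat Completions `usage` object, which reports `prompt_tokens` and `prompt_tokens_details.cached_tokens`. The helper name `cached_token_ratio` and the sample numbers are hypothetical and are not measured results.

```python
def cached_token_ratio(usages: list[dict]) -> float:
    """Fraction of prompt tokens that the provider reported as cached."""
    prompt_total = sum(u.get("prompt_tokens", 0) for u in usages)
    cached_total = sum(
        # prompt_tokens_details may be missing or None on some responses.
        (u.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
        for u in usages
    )
    return cached_total / prompt_total if prompt_total else 0.0


# Made-up usage records for illustration only.
before = [
    {"prompt_tokens": 2000, "prompt_tokens_details": {"cached_tokens": 0}},
    {"prompt_tokens": 2000, "prompt_tokens_details": {"cached_tokens": 0}},
]
after = [
    {"prompt_tokens": 2000, "prompt_tokens_details": {"cached_tokens": 0}},
    {"prompt_tokens": 2000, "prompt_tokens_details": {"cached_tokens": 1920}},
]

print(f"before: {cached_token_ratio(before):.0%} of prompt tokens cached")
print(f"after:  {cached_token_ratio(after):.0%} of prompt tokens cached")
```

The first request after a change is always uncached, so compare ratios over a window of requests rather than on a single call.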

## Code Examples

The following examples show how CacheAligner rewrites prompts automatically when you use the Headroom client, and how to invoke it manually for debugging.

### Automatic Prompt Rewriting via HeadroomClient

```python
from datetime import date

from headroom import HeadroomClient, OpenAIProvider
from openai import OpenAI

# Before CacheAligner: each request embeds today's date in the system prompt,
# so the prompt bytes change every day.
messages = [
    {"role": "system",
     "content": f"You are a helpful assistant. Today is {date.today()}"}
]

# After applying Headroom (CacheAligner enabled)
base = OpenAI(api_key="sk-...")
client = HeadroomClient(
    original_client=base,
    provider=OpenAIProvider(),
    # cache_aligner = True by default in the pipeline
)

# The transform rewrites the system prompt to:
#   "You are a helpful assistant."
#   "[Context: Today is 2024-12-15]"
# so the prefix is static and cache-friendly.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
```

### Manual CacheAligner Invocation

```python
# Manual use of CacheAligner (rare, for debugging)
from headroom.transforms.cache_aligner import CacheAligner

raw_prompt = "You are a bot. Current time: 2024-12-15 14:22"
aligned_prompt, dynamic_tail = CacheAligner().align(raw_prompt)

print(aligned_prompt)   # "You are a bot."
print(dynamic_tail)     # "Current time: 2024-12-15 14:22"

# The aligned_prompt can now be cached efficiently.
```

## Provider-Specific Cache Savings

CacheAligner delivers different levels of impact depending on the provider’s native caching mechanism:

| Provider | Cache optimisation technique | Typical savings |
|----------|------------------------------|-----------------|
| OpenAI   | Prefix alignment (CacheAligner) | ~50% |
| Anthropic | `cache_control` blocks (no change needed) | ~90% |
| Google   | CachedContent API (CacheAligner helps) | ~75% |

For **OpenAI**, CacheAligner is the primary mechanism for prefix alignment. **Anthropic** already achieves high hit rates through native `cache_control` blocks, so CacheAligner is not required. For **Google**, CacheAligner helps by preparing a static prefix before the request reaches the CachedContent API.
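To see how such percentages translate into an overall bill, the sketch below computes input cost relative to an uncached baseline. It rests on two assumptions that the table does not state: that each figure is a price reduction applied only to cached prompt tokens, and that 80% of prompt tokens are served from cache. The function `effective_input_cost` and both inputs are hypothetical.

```python
def effective_input_cost(hit_fraction: float, cached_discount: float) -> float:
    """Input cost as a fraction of the fully uncached cost.

    hit_fraction: share of prompt tokens served from cache (0 to 1).
    cached_discount: price reduction on those cached tokens (0 to 1).
    """
    if not (0.0 <= hit_fraction <= 1.0 and 0.0 <= cached_discount <= 1.0):
        raise ValueError("both arguments must be between 0 and 1")
    return 1.0 - hit_fraction * cached_discount


# Assumption: the table's figures are read as per-cached-token discounts.
TABLE_DISCOUNTS = {"OpenAI": 0.50, "Anthropic": 0.90, "Google": 0.75}

for provider, discount in TABLE_DISCOUNTS.items():
    cost = effective_input_cost(hit_fraction=0.8, cached_discount=discount)
    print(f"{provider}: {cost:.2f}x the uncached input cost")
```

Under this reading, the realized saving depends on the hit fraction as much as on the discount, which is why a stable prefix matters even when the provider’s discount is fixed.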

## Summary

- **CacheAligner** is the first transform in the Headroom pipeline and is enabled by default.
- It stabilises the prompt prefix by detecting and extracting dynamic content, then appending it to the end.
- The resulting byte-identical prefix maximises KV-cache hit rates for providers that rely on exact prefix matching.
- Source code in [`headroom/transforms/cache_aligner.rs`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/cache_aligner.rs) and [`headroom/transforms/pipeline.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/pipeline.py) implements the detection, extraction, and reordering logic.
- Typical savings reach approximately **50%** for OpenAI and **75%** for Google when the transform is active.

## Frequently Asked Questions

### What types of dynamic content does CacheAligner detect?

CacheAligner detects patterns such as dates, UUIDs, and dynamic tokens that appear inside the system prompt. It targets the fragments most likely to change between consecutive requests and break byte-identical caching.

### Is CacheAligner enabled by default in Headroom?

Yes. As implemented in [`headroom/transforms/pipeline.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/pipeline.py), CacheAligner runs as the first step in the default pipeline. You can override this behavior through the settings documented in [`wiki/configuration.md`](https://github.com/chopratejas/headroom/blob/main/wiki/configuration.md).

### How does CacheAligner differ from Anthropic's native `cache_control`?

Anthropic’s `cache_control` blocks let the caller mark cacheable sections explicitly and achieve roughly **90%** cache savings without restructuring the prompt. CacheAligner is instead a prompt-level transform aimed at providers such as OpenAI and Google: it rewrites the prompt itself so that the prefix stays byte-identical across requests.

### Can I use CacheAligner outside the Headroom pipeline?

Yes, though this is rarely needed. You can import `CacheAligner` directly from `headroom.transforms.cache_aligner` and call its `.align(raw_prompt)` method, which returns the static prefix and the dynamic tail separately for debugging or custom integrations.