What Is CacheAligner and How It Improves Provider KV Cache Hit Rates
CacheAligner is Headroom’s first pipeline transform that extracts dynamic fragments—such as dates, UUIDs, and tokens—from the prompt prefix and appends them to the end, creating a static byte-identical prefix that dramatically increases provider KV-cache hit rates.
Headroom is an open-source LLM proxy designed to reduce token costs and latency through prompt optimization. Its CacheAligner transform solves a specific but expensive problem: LLM providers only reuse key-value (KV) caches when the incoming request prefix is byte-identical to a previous one. Because CacheAligner normalizes the system prompt before it reaches the provider, it turns repetitive cache misses into hits and directly lowers latency and cost.
How Provider KV Caching Works
Leading LLM providers (OpenAI, Anthropic, and Google) cache the key-value state computed for recent prompts so that a request sharing the same leading content can skip recomputing that portion of the forward pass. The match is strictly byte-identical and prefix-based: once a prompt diverges from a cached one, even by a dynamic timestamp or request ID, everything from that point onward is a cache miss. For applications that embed the current date or a unique identifier near the top of every system prompt, the divergence happens almost immediately, so nearly the whole request is recomputed and token spend increases.
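To make the prefix sensitivity concrete, the sketch below compares how much leading text two consecutive prompts share when the date sits at the front versus at the end. It is an illustration only: `shared_prefix_len` is a hypothetical helper, and it counts characters, whereas providers match on tokens.

```python
import os


def shared_prefix_len(a: str, b: str) -> int:
    """Return the number of leading characters the two prompts have in common."""
    return len(os.path.commonprefix([a, b]))


instructions = "You are a helpful assistant. Answer concisely."

# Dynamic date at the front: the prompts diverge inside the date itself.
front_a = f"Today is 2024-12-15. {instructions}"
front_b = f"Today is 2024-12-16. {instructions}"

# Dynamic date at the end: the whole instruction block stays shared.
tail_a = f"{instructions} [Context: Today is 2024-12-15]"
tail_b = f"{instructions} [Context: Today is 2024-12-16]"

print("front:", shared_prefix_len(front_a, front_b), "shared characters")
print("tail: ", shared_prefix_len(tail_a, tail_b), "shared characters")
```

With the date at the end, the shared prefix covers the full instruction text, which is the portion a provider can reuse.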
The CacheAligner Transformation
In headroom/transforms/pipeline.py, CacheAligner is wired as the very first transform in the ordered Headroom pipeline. It rewrites the prompt in four discrete steps so that the provider sees the same static prefix on every call.
Step 1: Detect
CacheAligner scans the system prompt for volatile patterns such as dates, UUIDs, and dynamic tokens.
Step 2: Extract
It removes those dynamic fragments from their original position in the prefix.
Step 3: Append
The extracted content is repositioned into a trailing “context” block at the end of the message.
Step 4: Emit
The transform outputs the cleaned-up prefix alongside the trailing context, producing a stable byte-string for the provider to cache.
According to the architecture documentation in wiki/ARCHITECTURE.md, this normalization is the foundation of Headroom’s cache-optimization strategy.
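The four steps above can be sketched in a few lines of Python. This is a simplified illustration, not the Rust implementation: the regular expression, the sentence-level granularity, and the names `VOLATILE`, `split_prompt`, and `emit` are all assumptions made for the example.

```python
import re

# Hypothetical detection patterns (assumption): UUIDs and ISO-style dates
# with an optional time. The real transform's rules are not shown here.
VOLATILE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    r"|\b\d{4}-\d{2}-\d{2}(?:[ T]\d{2}:\d{2}(?::\d{2})?)?\b"
)


def split_prompt(prompt: str) -> tuple[str, str]:
    """Steps 1-2: detect volatile sentences and extract them from the prefix."""
    sentences = re.split(r"(?<=[.!?])\s+", prompt.strip())
    static, dynamic = [], []
    for sentence in sentences:
        (dynamic if VOLATILE.search(sentence) else static).append(sentence)
    return " ".join(static), " ".join(dynamic)


def emit(prefix: str, tail: str) -> str:
    """Steps 3-4: append the dynamic tail as a trailing context block."""
    return f"{prefix}\n[Context: {tail}]" if tail else prefix


monday = "You are a helpful assistant. Today is 2024-12-15."
tuesday = "You are a helpful assistant. Today is 2024-12-16."

prefix_mon, tail_mon = split_prompt(monday)
prefix_tue, tail_tue = split_prompt(tuesday)

print(emit(prefix_mon, tail_mon))
print(prefix_mon == prefix_tue)  # the static prefix no longer depends on the date
```

Moving whole sentences rather than bare values keeps the trailing context readable for the model; a production transform may well choose a finer granularity.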
Core Implementation and Source Files
The transform’s logic spans Rust and Python layers:
- headroom/transforms/cache_aligner.rs: Implements the high-performance Rust core that identifies and repositions dynamic fragments.
- headroom/transforms/pipeline.py: Orchestrates the ordered transforms and guarantees CacheAligner runs first.
- wiki/ARCHITECTURE.md: Documents the transform’s role in the Headroom pipeline.
- wiki/configuration.md: Exposes flags to enable or disable CacheAligner.
You can verify its effect by inspecting cache hit metrics before and after enabling the transform in your Headroom client configuration.
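One way to run that comparison is to aggregate the cached-token counts that the provider reports per response. The sketch below assumes usage records shaped like the `usage` object of OpenAI's Chat Completions API (`prompt_tokens` and `prompt_tokens_details.cached_tokens`); how Headroom itself surfaces metrics is not covered here, and the sample numbers are made up for illustration.

```python
def cached_token_ratio(usages: list[dict]) -> float:
    """Fraction of prompt tokens served from the provider cache."""
    prompt_tokens = sum(u.get("prompt_tokens", 0) for u in usages)
    cached_tokens = sum(
        u.get("prompt_tokens_details", {}).get("cached_tokens", 0)
        for u in usages
    )
    return cached_tokens / prompt_tokens if prompt_tokens else 0.0


# Illustrative records only, not measured results.
before = [
    {"prompt_tokens": 2000, "prompt_tokens_details": {"cached_tokens": 0}},
    {"prompt_tokens": 2000, "prompt_tokens_details": {"cached_tokens": 0}},
]
after = [
    {"prompt_tokens": 2000, "prompt_tokens_details": {"cached_tokens": 0}},
    {"prompt_tokens": 2000, "prompt_tokens_details": {"cached_tokens": 1536}},
]

print(f"before: {cached_token_ratio(before):.1%}")
print(f"after:  {cached_token_ratio(after):.1%}")
```

Collect the same window of traffic with the transform disabled and enabled so the two ratios are comparable.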
Code Examples
The following examples show how CacheAligner rewrites prompts automatically when you use the Headroom client, and how to invoke it manually for debugging.
Automatic Prompt Rewriting via HeadroomClient
# Before Cache Aligner: each request includes today's date
from datetime import date
messages = [
    {"role": "system",
     "content": f"You are a helpful assistant. Today is {date.today()}"}
]
# After applying Headroom (Cache Aligner enabled)
from headroom import HeadroomClient, OpenAIProvider
from openai import OpenAI
base = OpenAI(api_key="sk-...")
client = HeadroomClient(
    original_client=base,
    provider=OpenAIProvider(),
    # cache_aligner = True by default in the pipeline
)
# The transform rewrites the system prompt to:
#   "You are a helpful assistant."
#   "[Context: Today is 2024-12-15]"
# so the prefix is static → cache-friendly.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
Manual CacheAligner Invocation
# Manual use of Cache Aligner (rare, for debugging)
from headroom.transforms.cache_aligner import CacheAligner
raw_prompt = "You are a bot. Current time: 2024-12-15 14:22"
aligned_prompt, dynamic_tail = CacheAligner().align(raw_prompt)
print(aligned_prompt)  # "You are a bot."
print(dynamic_tail)    # "Current time: 2024-12-15 14:22"
# The aligned_prompt can now be cached efficiently.
Provider-Specific Cache Savings
CacheAligner delivers different levels of impact depending on the provider’s native caching mechanism:
| Provider | Cache optimisation technique | Typical savings |
|---|---|---|
| OpenAI | Prefix alignment (Cache Aligner) | ~50% |
| Anthropic | cache_control blocks (no change needed) | ~90% |
| Google | CachedContent API (Cache Aligner helps) | ~75% |
For OpenAI, CacheAligner is the primary mechanism for prefix alignment. Anthropic already achieves high hit rates through native cache_control blocks, so CacheAligner is not required. Google providers benefit when CacheAligner prepares a static prefix before the request enters the CachedContent API.
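The savings figures are easier to reason about with a small calculation. The sketch below assumes that a "typical savings" percentage behaves like a discount applied only to the cached portion of the prompt; that reading, the function name `effective_input_cost`, and the 80% cached fraction are assumptions for illustration, not figures from Headroom.

```python
def effective_input_cost(cached_fraction: float, cached_discount: float) -> float:
    """Relative input cost, where 1.0 means no caching benefit at all.

    cached_fraction: share of prompt tokens served from cache (0.0 to 1.0).
    cached_discount: price reduction applied to cached tokens (0.0 to 1.0).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    if not 0.0 <= cached_discount <= 1.0:
        raise ValueError("cached_discount must be between 0 and 1")
    return 1.0 - cached_fraction * cached_discount


# Hypothetical scenario: 80% of prompt tokens hit the cache.
for provider, discount in [("OpenAI", 0.50), ("Anthropic", 0.90), ("Google", 0.75)]:
    relative = effective_input_cost(cached_fraction=0.8, cached_discount=discount)
    print(f"{provider}: {relative:.0%} of the uncached input cost")
```

Under this reading, the table's percentage is an upper bound: the realized saving scales with how much of each prompt actually lands in the cached prefix.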
Summary
- CacheAligner is the first transform in the Headroom pipeline and is enabled by default.
- It stabilises the prompt prefix by detecting and extracting dynamic content, then appending it to the end.
- The resulting byte-identical prefix maximises KV-cache hit rates for providers that rely on exact prefix matching.
- Source code in headroom/transforms/cache_aligner.rs and headroom/transforms/pipeline.py implements the detection, extraction, and reordering logic.
- Typical savings reach approximately 50% for OpenAI and 75% for Google when the transform is active.
Frequently Asked Questions
What types of dynamic content does CacheAligner detect?
CacheAligner detects patterns such as dates, UUIDs, and dynamic tokens that appear inside the system prompt. It targets the fragments most likely to change between consecutive requests and break byte-identical caching.
Is CacheAligner enabled by default in Headroom?
Yes. As implemented in headroom/transforms/pipeline.py, CacheAligner runs as the first step in the default pipeline. You can override this behavior through the settings documented in wiki/configuration.md.
How does CacheAligner differ from Anthropic's native cache_control?
Anthropic’s cache_control blocks achieve roughly 90% cache savings without modifying prompt structure, because the caller marks which blocks to cache. CacheAligner is a prompt-level transform aimed at providers like OpenAI and Google, where cache hits depend on the request prefix staying byte-identical. It rewrites the prompt itself so that the prefix stays stable from one request to the next.
Can I use CacheAligner outside the Headroom pipeline?
Yes, though it is rare. You can import CacheAligner directly from headroom.transforms.cache_aligner and call the .align(raw_prompt) method to receive the static prefix and dynamic tail separately for debugging or custom integrations.