TransformPipeline Architecture in Headroom: How to Customize the LLM Compression Pipeline
The Headroom TransformPipeline processes every LLM request through a deterministic sequence of composable transforms that reduce token usage while preserving critical information, with each stage implementing a common Transform protocol to produce a TransformResult.
The chopratejas/headroom library implements a sophisticated compression system that sits between your application and LLM providers. Understanding the TransformPipeline architecture allows you to balance cost reduction against context preservation by rearranging, configuring, or extending the built-in transform stages.
Core Pipeline Architecture
The pipeline architecture centers on a simple but powerful abstraction: each transform is a stateless class that implements an apply(messages) method returning a TransformResult. The orchestration logic in headroom/transforms/pipeline.py executes these transforms sequentially, passing the output of one stage as input to the next.
The Transform Protocol
Every transform conforms to a consistent interface defined in the pipeline module. As implemented in headroom/transforms/pipeline.py, the TransformPipeline accepts a Python list of transform instances and invokes them in order:
from headroom import TransformPipeline, CacheAligner, SmartCrusher, RollingWindow
pipeline = TransformPipeline([
CacheAligner(), # Stabilize cache-friendly prefix
SmartCrusher(), # Compress JSON tool outputs
RollingWindow(), # Enforce token budget
])
result = pipeline.transform(messages)
print(f"Saved {result.tokens_saved} tokens")
Each transform receives the message list, applies its specific compression logic, and returns a TransformResult containing the modified messages and metadata about tokens saved.
Stage Execution Order
The order of transforms matters significantly. The recommended architecture places cache stabilization first, followed by content-specific compression, optional ML-based compression, and finally token limit enforcement. Reordering—for example, placing CacheAligner after SmartCrusher—would waste opportunities to stabilize the system prompt for provider-side caching.
Built-in Transform Stages
The Headroom repository provides five distinct transform classes, each targeting a specific source of token waste.
CacheAligner
Located in headroom/transforms/cache_aligner.py, the CacheAligner detects and extracts dynamic content (such as dates, UUIDs, or timestamps) from the system prompt. It moves these volatile elements to a dynamic suffix, keeping the static prefix stable. This enables provider-side caching (OpenAI, Anthropic, Google) to hit repeatedly, saving up to approximately 90% of request costs for identical prompt prefixes.
SmartCrusher
The SmartCrusher transform in headroom/transforms/smart_crusher.py statistically analyzes JSON-like tool outputs. Rather than truncating arbitrarily, it intelligently preserves the most valuable items: first and last entries, error items, statistical outliers, query-relevant items, and change points. This approach cuts huge tool-result payloads by 70–95% while guaranteeing that spikes, errors, and other critical data survive the compression.
ContentRouter
Found in headroom/transforms/content_router.py, the optional ContentRouter acts as a smart dispatcher. It examines the remaining content’s type—whether code, logs, or plain text—and automatically routes it to the best compressor. It can invoke CodeAwareCompressor, LogCompressor, or the optional Kompress ML compressor if installed. Enable specific routers through configuration flags such as enable_code_aware and enable_log_compression.
RollingWindow
The RollingWindow transform in headroom/transforms/rolling_window.py enforces the model’s token limit through deterministic truncation. It drops whole tool-call/result pairs starting from the oldest, while always preserving the system message and the most recent user/assistant turns. This guarantees the final payload fits the context window without breaking tool-call ordering dependencies.
IntelligentContextManager
An advanced option in headroom/transforms/intelligent_context.py, the IntelligentContextManager scores each message on recency, semantic similarity, TOIN-learned importance, error detection, forward-references, and token density. Instead of dropping merely the oldest messages, it removes the lowest-scored messages. This retains semantically important content—such as an early error message—even when it appears far from the tail of the conversation.
How to Customize the TransformPipeline
The pipeline architecture supports three primary customization strategies: reordering stages, tuning configurations, and injecting optional compressors.
Reordering and Selecting Transforms
Customize the pipeline by modifying the list passed to TransformPipeline. You can omit stages that don't apply to your workload or reorder them to change processing priority:
# Minimal pipeline without ContentRouter or IntelligentContextManager
pipeline = TransformPipeline([
CacheAligner(),
RollingWindow(),
])
Configuration Tuning
Each transform exposes a configuration dataclass (e.g., SmartCrusherConfig, CacheAlignerConfig, RollingWindowConfig). Pass these when constructing transforms to tune behavior:
from headroom import SmartCrusherConfig, CacheAlignerConfig, RollingWindowConfig
pipeline = TransformPipeline([
CacheAligner(config=CacheAlignerConfig(
extract_dates=True,
stable_prefix_min_tokens=120
)),
SmartCrusher(config=SmartCrusherConfig(
min_tokens_to_crush=150,
keep_first=5
)),
RollingWindow(config=RollingWindowConfig(
max_tokens=100_000,
preserve_recent_turns=6
)),
])
Adding ML-Based Compression
To include the ML-based KompressCompressor, install the optional dependency and import from headroom/transforms/kompress_compressor.py:
pip install "headroom-ai[ml]"
from headroom.transforms.kompress_compressor import KompressCompressor
pipeline = TransformPipeline([
CacheAligner(),
SmartCrusher(),
KompressCompressor(), # ML-based text compression
RollingWindow(),
])
Summary
- The TransformPipeline in
headroom/transforms/pipeline.pyorchestrates a sequence of composable transforms that each implement anapply(messages)method. - CacheAligner stabilizes the prompt prefix for provider caching, while SmartCrusher compresses JSON tool outputs by 70–95%.
- ContentRouter automatically selects the best compression strategy based on content type, and RollingWindow enforces hard token limits.
- Customize the pipeline by reordering the transform list, passing configuration dataclasses to individual transforms, or injecting the optional KompressCompressor from the
[ml]extra. - All source files reside under
headroom/transforms/in thechopratejas/headroomrepository.
Frequently Asked Questions
How does the TransformPipeline maintain safety guarantees when compressing context?
The pipeline maintains safety through deterministic, rules-based transforms rather than black-box compression. Each stage—whether CacheAligner or SmartCrusher—uses specific heuristics (error detection, change-point analysis, token density scoring) to ensure critical information like errors, outliers, and recent turns survive the compression process.
Can I use the TransformPipeline without installing the ML dependencies?
Yes. The KompressCompressor located in headroom/transforms/kompress_compressor.py is entirely optional. The core pipeline consisting of CacheAligner, SmartCrusher, and RollingWindow functions without any ML libraries. Only import KompressCompressor if you have installed the [ml] extra via pip install "headroom-ai[ml]".
What happens if I place RollingWindow before SmartCrusher in the pipeline?
Placing RollingWindow before SmartCrusher would cause the system to truncate the message list based on age before analyzing JSON content for compressibility. You would lose the opportunity to score and preserve important items within large tool results, potentially dropping critical error messages that SmartCrusher would have identified and retained.
Where are the configuration classes defined for each transform?
Configuration dataclasses such as SmartCrusherConfig, CacheAlignerConfig, and RollingWindowConfig are defined alongside their respective transform implementations in headroom/transforms/smart_crusher.py, headroom/transforms/cache_aligner.py, and headroom/transforms/rolling_window.py. Import them directly from the headroom package namespace as shown in the customization examples.
Have a question about this repo?
These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:
curl -s "https://instagit.com/install.md" Maintain an open-source project? Get it listed too →