How Headroom's ImageCompressor Achieves Significant Token Reduction Using Its Trained ML Router
Headroom's ImageCompressor reduces LLM image tokens by up to 99% by first aligning images to the provider's token grid, then routing each request through a trained Mini-LM query classifier and a SigLIP visual analyzer to select the most aggressive compression technique that still answers the user's query.
The ImageCompressor class in chopratejas/headroom serves as the primary orchestrator for vision-language token optimization. It integrates a zero-loss preprocessor with a query-aware ML router that decides whether to OCR, crop, downscale, or preserve an image. Walking through that pipeline shows why the library can deliver near-complete token savings on document-heavy workloads.
Three-Stage Token Reduction Pipeline
The compressor operates as a sequential pipeline. Each stage removes redundant information before the next one runs, ensuring the ML router works on the smallest possible input.
Stage 1: Tile-Boundary Alignment (Zero Quality Loss)
Before any model inference, the compressor calls optimize_images_in_messages from headroom/image/tile_optimizer.py to align image dimensions to the LLM provider's token grid:
# headroom/image/compressor.py (lines 33-38)
from .tile_optimizer import optimize_images_in_messages
messages, tile_results = optimize_images_in_messages(messages, provider)
This step eliminates duplicate or partial tiles at image boundaries. Because modern providers like OpenAI charge per 512×512 tile, removing even a few redundant pixels can save hundreds of tokens with zero visual or semantic loss.
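To make the tile arithmetic concrete, here is a minimal sketch; it is not Headroom's actual code. It assumes a simplified cost model in which every 512×512 tile, full or partial, is billed at a flat rate. The names `tiles_for` and `estimate_tokens` and the `tokens_per_tile=85` default are hypothetical choices for illustration, and real provider formulas may add a base cost or resize the image first.

```python
import math

TILE = 512  # tile edge length in pixels, per the 512x512 grid described above


def tiles_for(width: int, height: int, tile: int = TILE) -> int:
    """Count the tiles needed to cover an image; partial tiles count as full ones."""
    return math.ceil(width / tile) * math.ceil(height / tile)


def estimate_tokens(width: int, height: int, tokens_per_tile: int = 85) -> int:
    """Flat per-tile cost model (an assumption for illustration)."""
    return tiles_for(width, height) * tokens_per_tile


# A 1030x520 image spills a few pixels past the grid in both dimensions,
# so it needs 3 columns x 2 rows of tiles instead of 2 columns x 1 row.
before = tiles_for(1030, 520)
after = tiles_for(1024, 512)
print(f"tiles before alignment: {before}, after: {after}")
```

Under this model, a handful of stray pixels on each edge triples the tile count, which is why aligning to the grid first can pay off before any ML inference runs.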
Stage 2: ML-Based Technique Routing
After tile optimization, the compressor delegates to headroom/image/trained_router.py. The router fuses two signals—a textual query embedding and a visual image embedding—into a single RouteDecision.
Query Classification with Mini-LM
The router loads a fine-tuned Mini-LM model (registered in headroom/models/ml_models.py) and classifies the user query into one of four techniques: TRANSCODE, CROP, FULL_LOW, or PRESERVE. In trained_router.py (lines 56-66), the inference looks like this:
inputs = self._tokenizer(query, return_tensors="pt", truncation=True,
                         padding=True, max_length=64)
outputs = self._classifier(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
pred_id = int(torch.argmax(probs, dim=-1).item())
confidence = probs[0][pred_id].item()
The model achieves approximately 93.7% accuracy on the training set. A high-confidence TRANSCODE prediction, for example, tells the system that the user is asking for text or structured data rather than aesthetic or spatial information.
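The softmax-then-argmax step can be illustrated without loading a model. The sketch below uses plain Python and made-up logits; the label order in `TECHNIQUES` and the `predict` helper are assumptions for illustration, since the real id-to-label mapping ships with the fine-tuned checkpoint.

```python
import math
from typing import List, Tuple

# Label order is an assumption for illustration; the real mapping lives with the model.
TECHNIQUES = ["TRANSCODE", "CROP", "FULL_LOW", "PRESERVE"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits: List[float]) -> Tuple[str, float]:
    """Return the highest-probability technique and its probability."""
    probs = softmax(logits)
    pred_id = max(range(len(probs)), key=probs.__getitem__)
    return TECHNIQUES[pred_id], probs[pred_id]


# Made-up logits in which the first class clearly dominates.
technique, confidence = predict([4.0, 0.5, 1.0, -1.0])
print(technique, f"{confidence:.2%}")
```

The confidence here is simply the softmax probability of the winning class, which mirrors what `probs[0][pred_id].item()` extracts in the excerpt above.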
Visual Signal Extraction with SigLIP
When use_siglip=True, the router forwards the image through a SigLIP model to extract visual signals. As implemented in trained_router.py (lines 88-106), the router computes an image embedding and compares it against pre-computed text embeddings for the concepts has_text, is_document, is_complex, and has_small_details:
image_embedding = self._get_image_embedding(image_data)
image_signals = self._analyze_image(image_embedding)
Similarity scores are squashed through a sigmoid to produce values in the range [0, 1]. These scores indicate whether the image actually contains the visual properties the query classifier assumes.
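As a rough illustration of how similarity scores become signals, the sketch below compares a toy image embedding against toy concept embeddings and applies a sigmoid. It is written under stated assumptions: the three-dimensional vectors, the `analyze_image` helper, and the `scale` and `bias` values are all hypothetical, and the real router's embeddings and scaling come from the SigLIP model.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def analyze_image(image_embedding: List[float],
                  concept_embeddings: Dict[str, List[float]],
                  scale: float = 10.0, bias: float = 0.0) -> Dict[str, float]:
    """Map each concept to sigmoid(scale * cosine + bias); scale and bias are illustrative."""
    return {
        name: sigmoid(scale * cosine(image_embedding, emb) + bias)
        for name, emb in concept_embeddings.items()
    }


# Toy embeddings standing in for real SigLIP outputs.
concepts = {
    "has_text": [1.0, 0.0, 0.0],
    "is_document": [0.8, 0.6, 0.0],
    "is_complex": [-1.0, 0.0, 0.5],
    "has_small_details": [0.0, 0.0, 1.0],
}
signals = analyze_image([1.0, 0.2, 0.0], concepts)
for name, score in signals.items():
    print(f"{name}: {score:.3f}")
```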
Decision Fusion and Override Logic
The router merges the Mini-LM query prediction with the SigLIP image signals to adjust confidence and apply guardrails. If the query classifier predicts TRANSCODE but SigLIP reports no text is present, the router can demote or override the technique. This fusion logic appears in trained_router.py around lines 60-78.
The final RouteDecision object contains:
- technique: the chosen compression method.
- confidence: the combined confidence score.
- reason: a human-readable explanation.
- image_signals: SigLIP diagnostic values.
- query_prediction and query_confidence: raw Mini-LM outputs.
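The guardrail idea can be sketched as follows. This is an illustrative rule, not the logic in trained_router.py: the `fuse` function, the `min_text_score` threshold, the lowercase enum values, the fallback to PRESERVE, and the way confidence is recomputed are all assumptions. Only the field names of `RouteDecision` and the four technique names come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Technique(Enum):
    TRANSCODE = "transcode"
    CROP = "crop"
    FULL_LOW = "full_low"
    PRESERVE = "preserve"


@dataclass
class RouteDecision:
    technique: Technique
    confidence: float
    reason: str
    image_signals: Dict[str, float]
    query_prediction: Technique
    query_confidence: float


def fuse(query_prediction: Technique, query_confidence: float,
         image_signals: Dict[str, float],
         min_text_score: float = 0.3) -> RouteDecision:
    """Combine the query prediction with image signals (hypothetical rule)."""
    has_text = image_signals.get("has_text", 0.0)
    if query_prediction is Technique.TRANSCODE and has_text < min_text_score:
        # Guardrail: OCR on an image with no detectable text would discard its content.
        return RouteDecision(
            technique=Technique.PRESERVE,
            confidence=1.0 - has_text,
            reason=f"query suggests TRANSCODE but has_text={has_text:.2f} "
                   f"is below {min_text_score:.2f}",
            image_signals=image_signals,
            query_prediction=query_prediction,
            query_confidence=query_confidence,
        )
    return RouteDecision(
        technique=query_prediction,
        confidence=query_confidence,
        reason="image signals do not contradict the query prediction",
        image_signals=image_signals,
        query_prediction=query_prediction,
        query_confidence=query_confidence,
    )


decision = fuse(Technique.TRANSCODE, 0.92, {"has_text": 0.05, "is_document": 0.10})
print(decision.technique.name, "-", decision.reason)
```

Keeping the raw query prediction on the decision object, as the field list describes, makes an override visible after the fact: a caller can compare `technique` against `query_prediction` to see when the image signals changed the outcome.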
Stage 3: Technique Application and Token Accounting
Once the router selects a technique, the compressor applies it and measures the delta. The techniques and their approximate savings are:
| Technique | Action | Approximate Token Savings |
|---|---|---|
| TRANSCODE | Runs OCR (_ocr_extract) and replaces the image with extracted text | ~99% |
| CROP | Detects a region of interest and crops before resizing | 50-90% |
| FULL_LOW | Forces detail="low" or resizes to a small JPEG | ~87% |
| PRESERVE | Keeps the image unchanged | 0% |
Token accounting happens in two phases. First, _estimate_tokens in headroom/image/compressor.py (lines 94-104) projects the pre-compression cost using the standard 85-tokens-per-512×512-tile formula. After compression, _count_result_tokens (lines 123-136) measures the actual tokens in the resulting message list. The difference is exposed through the last_savings property:
# headroom/image/compressor.py (lines 48-53)
@property
def last_savings(self) -> float:
    if self.last_result:
        return self.last_result.savings_percent
    return 0.0
Because the router selects the most aggressive technique compatible with both the query intent and image content, the aggregate token reduction can be dramatic—especially when a high-resolution document is replaced by a few lines of OCR text.
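The savings figure itself is simple arithmetic. The helper below is a hypothetical stand-in for however `savings_percent` is computed internally, and the token counts are invented for the example: nine tiles at 85 tokens each before compression, and a short OCR transcript afterwards.

```python
def savings_percent(tokens_before: int, tokens_after: int) -> float:
    """Percentage of tokens removed; returns 0.0 when there was nothing to save."""
    if tokens_before <= 0:
        return 0.0
    return (tokens_before - tokens_after) / tokens_before * 100.0


# Invented numbers: 9 tiles x 85 tokens before, an 8-token OCR transcript after.
tokens_before = 9 * 85
tokens_after = 8
print(f"Saved {savings_percent(tokens_before, tokens_after):.1f}% tokens")
```

Guarding against a zero or negative baseline keeps the helper safe for messages that contain no images at all.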
Practical Code Examples
Basic Image Compression
The following snippet compresses OpenAI-style messages and reports the savings:
from headroom.image import ImageCompressor
msgs = [
    {"role": "user", "content": [
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64,<...>", "detail": "high"}}
    ]}
]
compressor = ImageCompressor()
compressed = compressor.compress(msgs, provider="openai")
print(f"Saved {compressor.last_savings:.1f}% tokens")
Inspecting the Router Decision
You can probe the trained router directly to see why a particular technique was chosen:
router = compressor._get_router() # lazy-loads the Mini-LM + SigLIP stack
decision = router.classify(
    image_data=compressor._extract_image_data(msgs),
    query="Describe the diagram"
)
print("Chosen technique:", decision.technique.value)
print("Confidence:", f"{decision.confidence:.2%}")
print("Reason:", decision.reason)
Using the Convenience Helper
For one-off compression without manually managing the compressor lifecycle, use compress_images:
from headroom.image import compress_images
compressed = compress_images(msgs, provider="anthropic")
# Creates a temporary ImageCompressor, runs compress, and closes models automatically.
Key Source Files
| File | Purpose |
|---|---|
| headroom/image/compressor.py | Main orchestration, token estimation (_estimate_tokens), OCR fallback, technique application |
| headroom/image/trained_router.py | Trained Mini-LM + SigLIP router that decides the compression technique |
| headroom/image/tile_optimizer.py | Zero-loss tile-boundary alignment preprocessor |
| headroom/models/ml_models.py | Centralized model registry for loading the Mini-LM and SigLIP checkpoints |
| headroom/models/config.py | Default model IDs (technique_router, siglip) consumed by the router |
Summary
- Tile-boundary alignment in headroom/image/tile_optimizer.py removes redundant pixels before any ML work begins.
- The trained ML router in headroom/image/trained_router.py fuses a fine-tuned Mini-LM query classifier with a SigLIP visual analyzer to pick the optimal technique.
- Supported techniques are TRANSCODE, CROP, FULL_LOW, and PRESERVE, with TRANSCODE delivering up to 99% token reduction via OCR.
- Token savings are measured precisely using _estimate_tokens and _count_result_tokens, then surfaced through the last_savings property.
- The architecture ensures aggressive compression only occurs when the image content and user query agree it is safe.
Frequently Asked Questions
What model does the trained ML router use for query classification?
The router uses a fine-tuned Mini-LM transformer loaded through headroom/models/ml_models.py. It tokenizes the user query with a 64-token maximum length and outputs a probability distribution across the four compression techniques.
How accurate is the trained router at selecting compression techniques?
According to the source code in headroom/image/trained_router.py, the fine-tuned Mini-LM achieves approximately 93.7% accuracy on the training set for predicting the correct technique from the query text alone.
Can I use ImageCompressor without the SigLIP image analysis?
Yes. The use_siglip parameter controls whether the visual signal branch runs. When disabled, the router relies solely on the Mini-LM query classifier to choose the technique, though it loses the guardrails that prevent OCR on images without text.
How does the tile optimizer reduce tokens without affecting image quality?
headroom/image/tile_optimizer.py aligns image dimensions to the provider's token grid boundaries. This removes redundant partial tiles—pixels that the provider would bill as full tiles—but does not alter the visible image content, making it a lossless preprocessing step.