# How Headroom's ImageCompressor Achieves Significant Token Reduction Using Its Trained ML Router

> Learn how Headroom's ImageCompressor achieves dramatic token reduction with a trained ML router and advanced algorithms, cutting LLM image tokens by up to 99% while preserving query relevance.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: deep-dive
- Published: 2026-06-07

---

**Headroom's `ImageCompressor` reduces LLM image tokens by up to 99% by first aligning images to the provider's token grid, then routing each image through a trained Mini-LM classifier and SigLIP visual analyzer to select the most aggressive compression technique that still answers the user's query.**

The `ImageCompressor` class in `chopratejas/headroom` is the primary orchestrator for vision-language token optimization. It combines a zero-loss preprocessor with a query-aware ML router that decides whether to OCR, crop, downscale, or preserve each image. That routing step is why the library can deliver near-complete token savings on document-heavy workloads.

## Three-Stage Token Reduction Pipeline

The compressor operates as a sequential pipeline. Each stage removes redundant information before the next one runs, ensuring the ML router works on the smallest possible input.

### Stage 1: Tile-Boundary Alignment (Zero Quality Loss)

Before any model inference, the compressor calls `optimize_images_in_messages` from [`headroom/image/tile_optimizer.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/tile_optimizer.py) to align image dimensions to the LLM provider's token grid:

```python
# headroom/image/compressor.py (lines 33-38)
from .tile_optimizer import optimize_images_in_messages

messages, tile_results = optimize_images_in_messages(messages, provider)
```

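This step removes partial tiles at image boundaries. Providers such as OpenAI bill images per 512×512 tile, and a tile that is only partly covered still counts as a full tile, so an image that overshoots a boundary by a few pixels pays for an extra row or column of tiles. Trimming that overshoot can save hundreds of tokens with no visible or semantic loss.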
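To make the tile arithmetic concrete, here is a minimal sketch that assumes a plain 512×512 grid and ignores any provider-side resizing. `count_tiles` and `snap_to_tile_grid` are hypothetical helper names, not functions from `tile_optimizer.py`, and the 32-pixel tolerance is an illustrative choice.

```python
import math

TILE = 512  # tile edge in pixels, per the 512x512 grid described above


def count_tiles(width: int, height: int, tile: int = TILE) -> int:
    """Tiles needed to cover the image; a partly covered tile counts as a full one."""
    return math.ceil(width / tile) * math.ceil(height / tile)


def snap_to_tile_grid(size: int, tile: int = TILE, tolerance: int = 32) -> int:
    """Hypothetical helper: drop a small overshoot past the previous tile boundary."""
    overshoot = size % tile
    if size > tile and 0 < overshoot <= tolerance:
        return size - overshoot
    return size  # already aligned, or too far past the boundary to trim safely


width, height = 1030, 520
aligned = (snap_to_tile_grid(width), snap_to_tile_grid(height))
print(count_tiles(width, height), "->", count_tiles(*aligned))  # prints "6 -> 2"
```

In this toy case a 6-pixel and an 8-pixel overshoot account for four of the six billed tiles. How the real optimizer reaches the boundary (resizing versus cropping) is not shown here.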

### Stage 2: ML-Based Technique Routing

After tile optimization, the compressor delegates to [`headroom/image/trained_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/trained_router.py). The router fuses two signals, a textual query embedding and a visual image embedding, into a single `RouteDecision`.

#### Query Classification with Mini-LM

The router loads a fine-tuned **Mini-LM** model (registered in [`headroom/models/ml_models.py`](https://github.com/chopratejas/headroom/blob/main/headroom/models/ml_models.py)) and classifies the user query into one of four techniques: `TRANSCODE`, `CROP`, `FULL_LOW`, or `PRESERVE`. In [`trained_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/trained_router.py) (lines 56-66), the inference looks like this:

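```python
# Tokenize the query; anything beyond 64 tokens is truncated.
inputs = self._tokenizer(query, return_tensors="pt", truncation=True,
                         padding=True, max_length=64)
outputs = self._classifier(**inputs)
# Convert logits to probabilities, then take the top class and its probability.
probs = torch.softmax(outputs.logits, dim=-1)
pred_id = int(torch.argmax(probs, dim=-1).item())
confidence = probs[0][pred_id].item()
```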

The model achieves approximately **93.7%** accuracy on the training set. A high-confidence `TRANSCODE` prediction, for example, tells the system that the user is asking for text or structured data rather than aesthetic or spatial information.
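The softmax-then-argmax step can be reproduced without PyTorch. The sketch below is illustrative only: the label order in `LABELS`, the `pick_technique` helper, and the low-confidence fallback to `PRESERVE` are assumptions, not behavior confirmed by `trained_router.py`.

```python
import math

# Hypothetical label order; the real mapping ships with the trained checkpoint.
LABELS = ["TRANSCODE", "CROP", "FULL_LOW", "PRESERVE"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_technique(logits, min_confidence=0.5):
    """Return (label, confidence); fall back to PRESERVE when the classifier is unsure.

    The fallback threshold is an assumption made for this sketch.
    """
    probs = softmax(logits)
    pred_id = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[pred_id]
    if confidence < min_confidence:
        return "PRESERVE", confidence
    return LABELS[pred_id], confidence


confident = pick_technique([4.0, 0.5, 0.2, 0.1])     # one logit dominates
uncertain = pick_technique([0.3, 0.2, 0.25, 0.2])    # near-uniform logits
print(confident[0], uncertain[0])
```

Defaulting to the least destructive technique when confidence is low is one reasonable guardrail. The library may handle uncertainty differently.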

#### Visual Signal Extraction with SigLIP

When `use_siglip=True`, the router forwards the image through a **SigLIP** model to extract visual signals. As implemented in [`trained_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/trained_router.py) (lines 88-106), the router computes an image embedding and compares it against pre-computed text embeddings for the concepts *has_text*, *is_document*, *is_complex*, and *has_small_details*:

```python
image_embedding = self._get_image_embedding(image_data)
image_signals = self._analyze_image(image_embedding)

```

Similarity scores are passed through a sigmoid, which maps each one to a value between 0 and 1. These scores indicate whether the image actually has the visual properties the query classifier assumes.
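The similarity-then-sigmoid idea can be sketched with toy vectors. Everything here is an assumption made for illustration: the three-dimensional embeddings, the `scale` and `bias` values, and the `image_signals` helper name. Real SigLIP embeddings are much larger, and the router's actual scaling is not shown in this article.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def image_signals(image_embedding, concept_embeddings, scale=10.0, bias=-5.0):
    """Score each concept in (0, 1); scale and bias are illustrative, not the library's."""
    return {
        name: sigmoid(scale * cosine_similarity(image_embedding, emb) + bias)
        for name, emb in concept_embeddings.items()
    }


# Toy stand-ins for the pre-computed concept text embeddings.
concepts = {
    "has_text": [1.0, 0.0, 0.0],
    "is_document": [0.8, 0.6, 0.0],
    "is_complex": [0.0, 0.0, 1.0],
    "has_small_details": [0.0, 1.0, 0.0],
}
signals = image_signals([0.9, 0.1, 0.0], concepts)
for name, score in signals.items():
    print(f"{name}: {score:.3f}")
```

With these toy vectors the image scores high for *has_text* and low for *is_complex*, which is the kind of evidence the fusion step can use to confirm or challenge the query prediction.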

#### Decision Fusion and Override Logic

The router merges the Mini-LM query prediction with the SigLIP image signals to adjust confidence and apply guardrails. If the query classifier predicts `TRANSCODE` but SigLIP reports no text is present, the router can demote or override the technique. This fusion logic appears in [`trained_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/trained_router.py) around lines 60-78.

The final `RouteDecision` object contains:

- `technique` — the chosen compression method.
- `confidence` — the combined confidence score.
- `reason` — a human-readable explanation.
- `image_signals` — SigLIP diagnostic values.
- `query_prediction` and `query_confidence` — raw Mini-LM outputs.
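As a rough illustration of the override described above, the sketch below models a decision object with the listed fields and one guardrail. The enum values, the `fuse` function, the 0.3 threshold, the confidence arithmetic, and the choice to fall back to `PRESERVE` are all assumptions; the real rules live in `trained_router.py`.

```python
from dataclasses import dataclass, field
from enum import Enum


class Technique(Enum):
    # Member names come from the article; the string values are hypothetical.
    TRANSCODE = "transcode"
    CROP = "crop"
    FULL_LOW = "full_low"
    PRESERVE = "preserve"


@dataclass
class RouteDecision:
    technique: Technique
    confidence: float
    reason: str
    query_prediction: Technique
    query_confidence: float
    image_signals: dict = field(default_factory=dict)


def fuse(query_prediction, query_confidence, image_signals, text_threshold=0.3):
    """Hypothetical guardrail: refuse to OCR an image that shows little sign of text."""
    has_text = image_signals.get("has_text", 1.0)  # no signal means no override
    if query_prediction is Technique.TRANSCODE and has_text < text_threshold:
        return RouteDecision(
            technique=Technique.PRESERVE,
            confidence=1.0 - has_text,
            reason=f"query suggests TRANSCODE but has_text={has_text:.2f}",
            query_prediction=query_prediction,
            query_confidence=query_confidence,
            image_signals=image_signals,
        )
    return RouteDecision(
        technique=query_prediction,
        confidence=query_confidence,
        reason="image signals do not contradict the query prediction",
        query_prediction=query_prediction,
        query_confidence=query_confidence,
        image_signals=image_signals,
    )


overridden = fuse(Technique.TRANSCODE, 0.95, {"has_text": 0.05})
accepted = fuse(Technique.TRANSCODE, 0.95, {"has_text": 0.90})
print(overridden.technique.value, "|", overridden.reason)
print(accepted.technique.value, "|", accepted.reason)
```

Keeping the raw query prediction alongside the final technique, as the listed fields do, makes overrides easy to audit after the fact.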

### Stage 3: Technique Application and Token Accounting

Once the router selects a technique, the compressor applies it and measures the delta. The techniques and their approximate savings are:

| Technique | Action | Approximate Token Savings |
|-----------|--------|---------------------------|
| **TRANSCODE** | Runs OCR (`_ocr_extract`) and replaces the image with extracted text | **~99%** |
| **CROP** | Detects a region of interest and crops before resizing | **50-90%** |
| **FULL_LOW** | Forces `detail="low"` or resizes to a small JPEG | **~87%** |
| **PRESERVE** | Keeps the image unchanged | **0%** |
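Of the four techniques, `FULL_LOW` is the simplest to illustrate because it only touches message metadata. The sketch below sets `detail` to `"low"` on every image part of an OpenAI-style message list. `force_low_detail` is a hypothetical helper written for this article, not the library's implementation, and it skips the JPEG-resizing variant mentioned in the table.

```python
import copy


def force_low_detail(messages):
    """Return a deep copy of OpenAI-style messages with each image part set to detail='low'."""
    updated = copy.deepcopy(messages)  # leave the caller's messages untouched
    for message in updated:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content carries no image parts
        for part in content:
            if isinstance(part, dict) and part.get("type") == "image_url":
                part.setdefault("image_url", {})["detail"] = "low"
    return updated


original = [
    {"role": "user", "content": [
        {"type": "text", "text": "What colour is the car?"},
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64,<...>", "detail": "high"}},
    ]}
]
lowered = force_low_detail(original)
print(lowered[0]["content"][1]["image_url"]["detail"])  # prints "low"
```

Working on a copy keeps the original messages available in case a caller wants to retry with `PRESERVE`.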

Token accounting happens in two phases. First, `_estimate_tokens` in [`headroom/image/compressor.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/compressor.py) (lines 94-104) projects the pre-compression cost using the standard 85-tokens-per-512×512-tile formula. After compression, `_count_result_tokens` (lines 123-136) measures the actual tokens in the resulting message list. The difference is exposed through the `last_savings` property:

```python
# headroom/image/compressor.py (lines 48-53)
@property
def last_savings(self) -> float:
    if self.last_result:
        return self.last_result.savings_percent
    return 0.0
```

Because the router selects the most aggressive technique compatible with both the query intent and the image content, the aggregate token reduction can be large, especially when a high-resolution document is replaced by a few lines of OCR text.
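The before-and-after accounting described in this section reduces to a single percentage. Here is a minimal sketch, assuming a result object with two token counts. The class name `CompressionResult` and its field names are hypothetical; only the `savings_percent` attribute name comes from the excerpt above.

```python
from dataclasses import dataclass


@dataclass
class CompressionResult:
    # Hypothetical field names; the library's result object may be shaped differently.
    original_tokens: int
    compressed_tokens: int

    @property
    def savings_percent(self) -> float:
        """Share of the original token cost removed, as a percentage."""
        if self.original_tokens <= 0:
            return 0.0  # nothing to save on an empty or unknown estimate
        saved = self.original_tokens - self.compressed_tokens
        return 100.0 * saved / self.original_tokens


result = CompressionResult(original_tokens=1000, compressed_tokens=130)
print(f"Saved {result.savings_percent:.1f}% tokens")  # prints "Saved 87.0% tokens"
```

Provider accounting differs in its details (OpenAI's documentation, for instance, describes a fixed base cost plus a per-tile cost for high-detail images), so the token counts fed into a calculation like this should come from the estimator for the provider in use.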

## Practical Code Examples

### Basic Image Compression

The following snippet compresses OpenAI-style messages and reports the savings:

```python
from headroom.image import ImageCompressor

msgs = [
    {"role": "user", "content": [
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64,<...>", "detail": "high"}}
    ]}
]

compressor = ImageCompressor()
compressed = compressor.compress(msgs, provider="openai")
print(f"Saved {compressor.last_savings:.1f}% tokens")

```

### Inspecting the Router Decision

You can probe the trained router directly to see why a particular technique was chosen:

```python
router = compressor._get_router()  # lazy-loads the Mini-LM + SigLIP stack

decision = router.classify(
    image_data=compressor._extract_image_data(msgs),
    query="Describe the diagram"
)

print("Chosen technique:", decision.technique.value)
print("Confidence:", f"{decision.confidence:.2%}")
print("Reason:", decision.reason)

```

### Using the Convenience Helper

For one-off compression without manually managing the compressor lifecycle, use `compress_images`:

```python
from headroom.image import compress_images

# Creates a temporary ImageCompressor, runs compress, and closes models automatically.
compressed = compress_images(msgs, provider="anthropic")
```

## Key Source Files

| File | Purpose |
|------|---------|
| [`headroom/image/compressor.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/compressor.py) | Main orchestration, token estimation (`_estimate_tokens`), OCR fallback, technique application |
| [`headroom/image/trained_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/trained_router.py) | Trained Mini-LM + SigLIP router that decides the compression technique |
| [`headroom/image/tile_optimizer.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/tile_optimizer.py) | Zero-loss tile-boundary alignment preprocessor |
| [`headroom/models/ml_models.py`](https://github.com/chopratejas/headroom/blob/main/headroom/models/ml_models.py) | Centralized model registry for loading the Mini-LM and SigLIP checkpoints |
| [`headroom/models/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/models/config.py) | Default model IDs (`technique_router`, `siglip`) consumed by the router |

## Summary

- **Tile-boundary alignment** in [`headroom/image/tile_optimizer.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/tile_optimizer.py) removes redundant pixels before any ML work begins.
- The **trained ML router** in [`headroom/image/trained_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/trained_router.py) fuses a fine-tuned **Mini-LM** query classifier with a **SigLIP** visual analyzer to pick the optimal technique.
- Supported techniques are **`TRANSCODE`**, **`CROP`**, **`FULL_LOW`**, and **`PRESERVE`**, with `TRANSCODE` delivering up to **99%** token reduction via OCR.
- Token savings are measured precisely using `_estimate_tokens` and `_count_result_tokens`, then surfaced through the **`last_savings`** property.
- The architecture ensures aggressive compression only occurs when the image content and user query agree it is safe.

## Frequently Asked Questions

### What model does the trained ML router use for query classification?

The router uses a fine-tuned **Mini-LM** transformer loaded through [`headroom/models/ml_models.py`](https://github.com/chopratejas/headroom/blob/main/headroom/models/ml_models.py). It tokenizes the user query with a 64-token maximum length and outputs a probability distribution across the four compression techniques.

### How accurate is the trained router at selecting compression techniques?

According to the source code in [`headroom/image/trained_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/trained_router.py), the fine-tuned Mini-LM achieves approximately **93.7%** accuracy on the training set for predicting the correct technique from the query text alone.

### Can I use ImageCompressor without the SigLIP image analysis?

Yes. The `use_siglip` parameter controls whether the visual signal branch runs. When disabled, the router relies solely on the Mini-LM query classifier to choose the technique, though it loses the guardrails that prevent OCR on images without text.

### How does the tile optimizer reduce tokens without affecting image quality?

[`headroom/image/tile_optimizer.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/tile_optimizer.py) aligns image dimensions to the provider's token grid boundaries. This removes partial tiles (slivers of pixels that the provider would bill as full tiles) without altering the visible image content, which makes it a lossless preprocessing step.