# How to Use ImageCompressor for 40-90% Token Reduction on Images

> Cut image token costs by 40-90% with Headroom's ImageCompressor. Discover its three-stage pipeline for efficient image processing without sacrificing LLM comprehension.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: how-to-guide
- Published: 2026-06-05

---

**Headroom's ImageCompressor reduces image token costs by 40-90% through a three-stage pipeline: it aligns images to provider tile boundaries, routes each image through an ML-based technique selector, and applies the selected technique (low-detail encoding, resizing, or OCR transcoding) while aiming to preserve LLM comprehension.**

The **ImageCompressor** class in the `chopratejas/headroom` repository automates the optimization of vision model payloads. By combining tile-aware resizing with per-image technique routing, it lowers vision-token costs while keeping the detail the model needs to answer the query accurately.

## The Three-Stage Compression Pipeline

The `ImageCompressor` implemented in [`headroom/image/compressor.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/compressor.py) operates through three distinct stages to maximize token efficiency.

### Stage 1: Tile-Boundary Optimization

The **tile optimizer** performs pure-mathematical resizing to align images with provider-specific vision-token tile boundaries. Located in [`headroom/image/tile_optimizer.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/tile_optimizer.py), this stage achieves zero-quality-loss reduction by adjusting dimensions to match tile constraints.

For OpenAI's vision model, this optimization can reduce a 770-pixel image from 4 tiles (approximately 765 tokens) to a single tile (approximately 255 tokens). The `optimize_images_in_messages` function handles this alignment automatically based on the specified provider.
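The figures above match OpenAI's published high-detail accounting: a flat 85 base tokens plus 170 tokens per 512×512 tile, counted after the image is scaled to fit within 2048×2048 and its shortest side is reduced to 768 pixels. The sketch below reproduces that arithmetic. It is not Headroom's own estimator; the function name is hypothetical, and it assumes the 770-pixel image is square and that small images are never upscaled.

```python
import math


def openai_high_detail_tokens(width: int, height: int) -> int:
    """Estimate high-detail vision tokens (hypothetical helper, not Headroom's API)."""
    # Step 1: shrink to fit inside a 2048x2048 square (never upscale).
    fit = min(1.0, 2048 / max(width, height))
    w, h = round(width * fit), round(height * fit)
    # Step 2: shrink so the shortest side is at most 768 px (assumed: no upscaling).
    short = min(1.0, 768 / min(w, h))
    w, h = round(w * short), round(h * short)
    # Step 3: count 512x512 tiles; 170 tokens per tile plus an 85-token base.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


print(openai_high_detail_tokens(770, 770))  # 765 (2 x 2 tiles)
print(openai_high_detail_tokens(512, 512))  # 255 (1 tile)
```

Under these assumptions, an image just over a tile boundary pays for a full extra row and column of tiles, which is the waste the tile optimizer targets.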

### Stage 2: ML-Based Technique Routing

The compressor uses a lightweight ONNX model (with a PyTorch fallback) to evaluate each image together with the user's query. The `_get_router` method lazily loads this router, which selects one of four compression techniques:

- **preserve**: Maintain the original image without modification
- **full_low**: Request provider-side low-detail encoding
- **crop**: Aggressively downscale the image via Pillow
- **transcode**: Execute OCR and replace the image with extracted text

This routing logic in `_apply_compression` is designed to pick the cheapest technique that is still safe for each specific image-query pair.
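To make the "cheapest safe technique" idea concrete, here is a deliberately simple rule-based stand-in. It is not the ONNX/PyTorch router: the keyword lists, the `image_is_mostly_text` flag, and the `toy_route` function are all invented for illustration, and the local `Technique` enum only mirrors the four names listed above (its string values are assumed).

```python
from enum import Enum


class Technique(Enum):
    """Local stand-in mirroring the four technique names; values are assumed."""
    PRESERVE = "preserve"
    FULL_LOW = "full_low"
    CROP = "crop"
    TRANSCODE = "transcode"


# Hypothetical keyword hints; the real router is a trained model, not rules.
FINE_DETAIL_HINTS = ("count", "exact", "small", "zoom", "pixel")
TEXT_HINTS = ("read", "text", "transcribe")


def toy_route(query: str, image_is_mostly_text: bool) -> Technique:
    """Pick a technique from the query wording and a coarse image flag."""
    q = query.lower()
    # Queries that need fine detail are not safe to compress.
    if any(hint in q for hint in FINE_DETAIL_HINTS):
        return Technique.PRESERVE
    # Text-heavy images with a reading task can be replaced by OCR output.
    if image_is_mostly_text and any(hint in q for hint in TEXT_HINTS):
        return Technique.TRANSCODE
    # Broad, gist-level questions tolerate low-detail encoding.
    if "describe" in q or "what is" in q:
        return Technique.FULL_LOW
    # Otherwise fall back to a moderate downscale.
    return Technique.CROP


print(toy_route("Describe the scene in the picture.", False))  # Technique.FULL_LOW
print(toy_route("Read the text on this receipt", True))        # Technique.TRANSCODE
```

A learned router replaces these brittle rules with a model that scores the image and query jointly, but the decision it outputs has the same shape: one technique per image-query pair.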

### Stage 3: Technique Application

Depending on the router's selection, the compressor either leaves the image untouched (`preserve`) or applies one of three transformations:

1. **Detail flag modification**: Sets OpenAI's `"detail"` field to `"low"`, reducing costs to approximately 85 tokens per image
2. **Image resizing**: Rescales using Pillow and re-encodes as JPEG (default quality 85)
3. **OCR transcoding**: Invokes RapidOCR to extract text and replaces the image block with the extracted text, marked with `[OCR from image]`
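Two of these transformations are plain edits to the message payload, which the following sketch illustrates on an OpenAI-style content block. The helper names are hypothetical, the code does not call Headroom or RapidOCR, and the exact layout of the replacement text after the `[OCR from image]` marker is an assumption.

```python
import copy


def apply_low_detail(block: dict) -> dict:
    """Return a copy of an OpenAI-style image block with detail set to "low"."""
    out = copy.deepcopy(block)  # leave the caller's message untouched
    out["image_url"]["detail"] = "low"
    return out


def ocr_text_block(ocr_text: str) -> dict:
    """Build the text block that stands in for a transcoded image."""
    # Assumed layout: marker line followed by the extracted text.
    return {"type": "text", "text": f"[OCR from image]\n{ocr_text}"}


image_block = {
    "type": "image_url",
    "image_url": {"url": "data:image/png;base64,<base64-data>"},
}

low = apply_low_detail(image_block)
print(low["image_url"]["detail"])               # low
print(ocr_text_block("Total: $42.00")["type"])  # text
```

Resizing differs from both because it rewrites the image bytes themselves rather than the surrounding message structure.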

## Basic Usage for 40-90% Reduction

To implement the compression pipeline in your application:

```python
from headroom.image import ImageCompressor

# Prepare your LLM messages in OpenAI, Anthropic, or Google format
messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Describe the scene in the picture."},
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64,<base64-data>"}}
    ]}
]

# Initialize compressor (router loads on-demand)
compressor = ImageCompressor()

# Execute compression with provider specification
compressed = compressor.compress(messages, provider="openai")

# Review savings metrics
print("Original token estimate:", compressor.last_result.original_tokens)
print("Compressed token count:", compressor.last_result.compressed_tokens)
print("Savings (%):", compressor.last_result.savings_percent)
```

The `compress` method returns modified messages, while `compressor.last_result` contains a `CompressionResult` object detailing the original token count, compressed token count, and calculated savings percentage.
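The `<base64-data>` placeholder in the example stands for the Base64-encoded bytes of the image. A minimal sketch of building such a message with the standard library is shown here; the helper names are hypothetical and not part of Headroom.

```python
import base64


def image_block_from_bytes(data: bytes, mime: str = "image/png") -> dict:
    """Wrap raw image bytes in an OpenAI-style image_url block (data URL)."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encoded}"},
    }


def user_message(prompt: str, image_bytes: bytes) -> dict:
    """Build a single user message containing a text part and an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            image_block_from_bytes(image_bytes),
        ],
    }


# Placeholder bytes; in practice, read them with open(path, "rb").read().
sample = user_message("Describe the scene in the picture.", b"\x89PNG\r\n")
print(sample["content"][1]["image_url"]["url"][:22])  # data:image/png;base64,
```

The resulting list `[sample]` has the same shape as the `messages` variable passed to `compress`.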

## Advanced Configuration Techniques

### Bypassing the ML Router

For deterministic behavior or testing specific strategies, disable the router and force a particular technique:

```python
from headroom.image.compressor import ImageCompressor, Technique

compressor = ImageCompressor()
# Disable ML routing (note: _router is a private attribute)
compressor._router = None

# Manually select the cropping technique
compressed = compressor._apply_compression(
    messages=messages,
    technique=Technique.CROP,
    provider="anthropic",
)
```

Available `Technique` enum values include `PRESERVE`, `FULL_LOW`, `CROP`, and `TRANSCODE`.

### Isolating Tile Optimization

To measure savings from tile-boundary alignment exclusively without ML routing:

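```python
from headroom.image.tile_optimizer import optimize_images_in_messages

optimized_msgs, results = optimize_images_in_messages(
    messages,
    provider="openai",
)

# Total tokens saved across every image in the conversation
total_saved = sum(r.tokens_saved for r in results)
print(f"Tile optimizer saved {total_saved} tokens")

# Per-image breakdown; the loop is a no-op if results is empty
for i, r in enumerate(results):
    print(f"  image {i}: {r.tokens_saved} tokens ({r.savings_pct:.1f}%)")
```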

## How Token Reduction Is Calculated

The `ImageCompressor` calculates savings through provider-specific estimation functions:

- **Original tokens**: Determined by `estimate_openai_tokens` or `estimate_anthropic_tokens` including tile-boundary alignment benefits
- **Compressed tokens**: Count after applying the selected technique (low-detail flag, resized JPEG, or OCR text replacement)
- **Savings percentage**: Derived from `(original - compressed) / original`, expressed as a percentage and reported in `CompressionResult.savings_percent`

Typical results range from **40% reduction** (conservative crop operations) to **90%+ reduction** (successful `transcode` or `full_low` operations on large images).
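As a worked example of that formula, the sketch below plugs in the OpenAI figures quoted earlier (765 tokens for four tiles, 255 for one tile, about 85 for low detail). `SavingsSketch` is a hypothetical stand-in for illustration, not Headroom's `CompressionResult`, and the zero-token guard is an added assumption.

```python
from dataclasses import dataclass


@dataclass
class SavingsSketch:
    """Hypothetical stand-in holding before/after token estimates."""
    original_tokens: int
    compressed_tokens: int

    @property
    def savings_percent(self) -> float:
        # Guard against division by zero when there was nothing to compress.
        if self.original_tokens <= 0:
            return 0.0
        saved = self.original_tokens - self.compressed_tokens
        return 100.0 * saved / self.original_tokens


# Four tiles reduced to one tile by tile-boundary alignment.
print(round(SavingsSketch(765, 255).savings_percent, 1))  # 66.7
# Four tiles replaced by a low-detail request.
print(round(SavingsSketch(765, 85).savings_percent, 1))   # 88.9
```

Both results fall inside the 40-90% range quoted for typical images.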

## Summary

- The **ImageCompressor** in [`headroom/image/compressor.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/compressor.py) implements a three-stage pipeline: tile-boundary optimization, ML-based routing, and technique application.
- **Tile optimization** in [`headroom/image/tile_optimizer.py`](https://github.com/chopratejas/headroom/blob/main/headroom/image/tile_optimizer.py) provides zero-loss savings by aligning images to provider tile boundaries (e.g., a 770-pixel image on OpenAI drops from roughly 765 tokens to roughly 255).
- The **ML router** selects from four techniques (`preserve`, `full_low`, `crop`, `transcode`) based on image content and query context via `_get_router` and `_apply_compression`.
- **Low-detail encoding** (`full_low`) typically yields ~85 tokens per image, while **OCR transcoding** (`transcode`) can exceed 95% token savings for text-heavy images.
- Access compression metrics through the `CompressionResult` object stored in `compressor.last_result`.

## Frequently Asked Questions

### How does ImageCompressor achieve 40-90% token reduction without losing visual information?

The pipeline first applies **tile-boundary optimization**, which mathematically aligns image dimensions to provider tile constraints without visual degradation. Subsequent stages use an ML router to select a context-appropriate technique, such as requesting low-detail encoding for complex photos or running OCR on text-heavy images, so that the LLM receives sufficient information at minimal token cost.

### What distinguishes the 'full_low' technique from 'transcode'?

The **full_low** technique asks the provider (particularly OpenAI) to process the image in low-detail mode, which typically costs approximately 85 tokens regardless of original size. The **transcode** technique runs RapidOCR locally to extract text content and replaces the entire image block with the extracted text, often achieving greater than 95% token reduction for documents or screenshots containing readable text.

### Can I disable the ML router for deterministic compression behavior?

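Yes. Set `compressor._router = None` after initialization to bypass the ONNX/PyTorch routing model. You can then manually invoke `_apply_compression` with a specific `Technique` enum value (such as `Technique.CROP` or `Technique.FULL_LOW`) to enforce consistent, predictable behavior across all images. Both `_router` and `_apply_compression` are underscore-prefixed, which by Python convention marks them as internal, so this approach may break between releases.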

### Which LLM providers support the tile optimizer?

The `optimize_images_in_messages` function and `compress` method support OpenAI, Anthropic, and Google provider specifications. The tile optimizer automatically adjusts boundary calculations based on the provider parameter, ensuring compatible dimension alignment for each platform's vision token pricing structure.