How to Use ImageCompressor for 40-90% Token Reduction on Images

Headroom's ImageCompressor reduces image token costs by 40-90% through a three-stage pipeline: it aligns images to provider tile boundaries, routes each image through an ML-based technique selector, and then applies a low-detail flag, a resize, or OCR transcoding, with the goal of preserving LLM comprehension.

The ImageCompressor class in the chopratejas/headroom repository provides an automated solution for optimizing vision model payloads. By processing images through mathematical optimization and intelligent routing, it significantly reduces API costs while maintaining the visual fidelity required for accurate LLM responses.

The Three-Stage Compression Pipeline

The ImageCompressor implemented in headroom/image/compressor.py operates through three distinct stages to maximize token efficiency.

Stage 1: Tile-Boundary Optimization

The tile optimizer performs pure-mathematical resizing to align images with provider-specific vision-token tile boundaries. Located in headroom/image/tile_optimizer.py, this stage achieves zero-quality-loss reduction by adjusting dimensions to match tile constraints.

For OpenAI's vision models, which charge 85 base tokens plus 170 tokens per 512-pixel tile in high-detail mode, this optimization can take a 770×770-pixel image from 4 tiles (765 tokens) down to a single tile (255 tokens). The optimize_images_in_messages function handles this alignment automatically based on the specified provider.
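
The arithmetic behind that example can be checked with a short sketch. It assumes OpenAI's published high-detail rule (fit the image within 2048×2048, shrink it so the shortest side is at most 768 pixels, then charge 85 base tokens plus 170 per 512-pixel tile). The function name is hypothetical, small images are assumed not to be upscaled, and this is not the code in tile_optimizer.py.

```python
import math


def estimate_openai_high_detail_tokens(width: int, height: int) -> int:
    """Estimate high-detail vision tokens for one image (illustrative only)."""
    # Step 1: fit within a 2048 x 2048 square, preserving aspect ratio.
    longest = max(width, height)
    if longest > 2048:
        width, height = width * 2048 // longest, height * 2048 // longest

    # Step 2: shrink so the shortest side is at most 768 pixels.
    # Assumption: images already below 768 are left as they are.
    shortest = min(width, height)
    if shortest > 768:
        width, height = width * 768 // shortest, height * 768 // shortest

    # Step 3: count 512-pixel tiles and apply the 85 + 170-per-tile formula.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles


print(estimate_openai_high_detail_tokens(770, 770))  # 765 (2 x 2 tiles)
print(estimate_openai_high_detail_tokens(512, 512))  # 255 (1 tile)
```

Under these assumptions, shrinking the 770-pixel image until it fits a single 512-pixel tile is what moves it from 765 to 255 tokens.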

Stage 2: ML-Based Technique Routing

The compressor uses a lightweight ONNX model (with a PyTorch fallback) to evaluate the combination of image content and user query. The _get_router method lazily loads this router, which selects the optimal compression technique from four options:

  • preserve: Maintain the original image without modification
  • full_low: Request provider-side low-detail encoding
  • crop: Aggressively downscale the image via Pillow
  • transcode: Execute OCR and replace the image with extracted text

This routing logic in _apply_compression ensures the cheapest safe technique is selected for each specific image-query pair.
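
The lazy-loading and fallback behavior described above follows a common pattern, sketched below. Everything here is hypothetical: LazyRouter, the loader callables, and the returned values are stand-ins for illustration and are not taken from headroom's _get_router.

```python
from typing import Callable, List, Optional, Sequence


class LazyRouter:
    """Load a router backend on first use, trying loaders in order."""

    def __init__(self, loaders: Sequence[Callable[[], object]]) -> None:
        self._loaders = loaders
        self._router: Optional[object] = None  # nothing loaded yet

    def get(self) -> object:
        if self._router is None:
            errors: List[Exception] = []
            for load in self._loaders:
                try:
                    self._router = load()
                    break  # first backend that loads wins
                except ImportError as exc:
                    errors.append(exc)  # backend missing, try the next one
            else:
                raise RuntimeError(f"no router backend available: {errors}")
        return self._router


def load_onnx_backend() -> object:
    # Simulates a machine where the ONNX runtime is not installed.
    raise ImportError("onnxruntime not installed")


def load_torch_backend() -> object:
    return "torch-router"  # placeholder for a real model object


router = LazyRouter([load_onnx_backend, load_torch_backend])
print(router.get())  # torch-router
```

The model is only loaded when a compression call actually needs it, so constructing the object stays cheap.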

Stage 3: Technique Application

Depending on the router's selection, the compressor either leaves the image untouched (preserve) or executes one of three transformation strategies:

  1. Detail flag modification: Sets OpenAI's "detail" field to "low", reducing costs to approximately 85 tokens per image
  2. Image resizing: Rescales using Pillow and re-encodes as JPEG (default quality 85)
  3. OCR transcoding: Invokes RapidOCR to extract text and replaces the image block with [OCR from image]
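
Two of these strategies are plain edits to the message structure and can be sketched without any imaging library. The helper below is a hypothetical illustration for OpenAI-format messages, not headroom's implementation: DemoTechnique and apply_technique are invented names, the OCR text is passed in rather than produced by RapidOCR, and the exact shape of the "[OCR from image]" replacement block is an assumption.

```python
import copy
from enum import Enum
from typing import Any, Dict, List


class DemoTechnique(Enum):
    # Local stand-in mirroring the four technique names; not headroom's enum.
    PRESERVE = "preserve"
    FULL_LOW = "full_low"
    CROP = "crop"
    TRANSCODE = "transcode"


def apply_technique(
    messages: List[Dict[str, Any]],
    technique: DemoTechnique,
    ocr_text: str = "",
) -> List[Dict[str, Any]]:
    """Return a modified copy of OpenAI-format messages (illustrative only)."""
    out = copy.deepcopy(messages)  # never mutate the caller's messages
    for message in out:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content carries no image blocks
        for i, block in enumerate(content):
            if block.get("type") != "image_url":
                continue
            if technique is DemoTechnique.FULL_LOW:
                # Ask the provider for low-detail processing.
                block["image_url"]["detail"] = "low"
            elif technique is DemoTechnique.TRANSCODE:
                # Swap the image block for a text block (assumed format).
                content[i] = {
                    "type": "text",
                    "text": f"[OCR from image]\n{ocr_text}",
                }
            # PRESERVE leaves the block as is; CROP would need Pillow
            # and is omitted from this sketch.
    return out
```

Working on a deep copy keeps the original messages available for comparison or for a retry with a different technique.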

Basic Usage for 40-90% Reduction

To implement the compression pipeline in your application:

from headroom.image import ImageCompressor

# Prepare your LLM messages in OpenAI, Anthropic, or Google format
messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Describe the scene in the picture."},
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64,<base64-data>"}}
    ]}
]

# Initialize compressor (router loads on-demand)
compressor = ImageCompressor()

# Execute compression with provider specification
compressed = compressor.compress(messages, provider="openai")

# Review savings metrics
print("Original token estimate:", compressor.last_result.original_tokens)
print("Compressed token count:", compressor.last_result.compressed_tokens)
print("Savings (%):", compressor.last_result.savings_percent)

The compress method returns modified messages, while compressor.last_result contains a CompressionResult object detailing the original token count, compressed token count, and calculated savings percentage.

Advanced Configuration Techniques

Bypassing the ML Router

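For deterministic behavior or for testing a specific strategy, disable the router and force a particular technique. Both _router and _apply_compression are underscore-prefixed private members, so this approach relies on internals that may change between releases: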

from headroom.image.compressor import ImageCompressor, Technique

compressor = ImageCompressor()
compressor._router = None  # Disable ML routing

# Manually select cropping technique
compressed = compressor._apply_compression(
    messages=messages,
    technique=Technique.CROP,
    provider="anthropic"
)

Available Technique enum values include PRESERVE, FULL_LOW, CROP, and TRANSCODE.

Isolating Tile Optimization

To measure savings from tile-boundary alignment exclusively without ML routing:

from headroom.image.tile_optimizer import optimize_images_in_messages

optimized_msgs, results = optimize_images_in_messages(
    messages,
    provider="openai"
)

total_saved = sum(r.tokens_saved for r in results)
print(f"Tile optimizer saved {total_saved} tokens")
if results:  # avoid an IndexError when no images were processed
    print(f"First image: {results[0].savings_pct:.1f}% saved")

How Token Reduction Is Calculated

The ImageCompressor calculates savings through provider-specific estimation functions:

  • Original tokens: Determined by estimate_openai_tokens or estimate_anthropic_tokens including tile-boundary alignment benefits
  • Compressed tokens: Count after applying the selected technique (low-detail flag, resized JPEG, or OCR text replacement)
  • Savings percentage: Derived from (original - compressed) / original as reported in CompressionResult.savings_percent

Typical results range from 40% reduction (conservative crop operations) to 90%+ reduction (successful transcode or full_low operations on large images).
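
The savings formula itself is simple enough to verify by hand. The sketch below is a standalone helper with a hypothetical name, not the CompressionResult implementation. It reuses the OpenAI figures quoted earlier (765 tokens for four tiles, 255 for one tile, about 85 for low detail).

```python
def savings_percent(original_tokens: int, compressed_tokens: int) -> float:
    """Percentage saved, computed as (original - compressed) / original."""
    if original_tokens <= 0:
        return 0.0  # nothing to save on an empty or invalid estimate
    return 100.0 * (original_tokens - compressed_tokens) / original_tokens


print(round(savings_percent(765, 255), 1))  # 66.7 (tile alignment: 4 tiles to 1)
print(round(savings_percent(765, 85), 1))   # 88.9 (low-detail flag)
```

Both figures fall inside the 40-90% range, which is consistent with the larger savings being attributed to full_low and transcode.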

Summary

  • The ImageCompressor in headroom/image/compressor.py implements a three-stage pipeline: tile-boundary optimization, ML-based routing, and technique application.
  • Tile optimization in headroom/image/tile_optimizer.py provides zero-loss savings by aligning images to provider tile boundaries (e.g., a 770-pixel image drops from roughly 765 tokens to roughly 255).
  • The ML router selects from four techniques (preserve, full_low, crop, transcode) based on image content and query context via _get_router and _apply_compression.
  • Low-detail encoding (full_low) typically yields ~85 tokens per image, while OCR transcoding (transcode) can exceed 95% token savings for text-heavy images.
  • Access compression metrics through the CompressionResult object stored in compressor.last_result.

Frequently Asked Questions

How does ImageCompressor achieve 40-90% token reduction without losing visual information?

The pipeline first applies tile-boundary optimization, which mathematically aligns image dimensions to provider tile constraints without visual degradation. Subsequent stages use an ML router to select context-appropriate techniques, such as requesting low-detail encoding for complex photos or OCR for text-heavy images, so that the LLM receives sufficient information at minimal token cost.

What distinguishes the 'full_low' technique from 'transcode'?

The full_low technique instructs the provider (particularly OpenAI) to process the image in low-detail mode, which typically costs approximately 85 tokens regardless of original size. The transcode technique runs RapidOCR locally to extract text content and replaces the entire image block with the extracted text, often achieving greater than 95% token reduction for documents or screenshots containing readable text.

Can I disable the ML router for deterministic compression behavior?

Yes. Set compressor._router = None after initialization to bypass the ONNX/PyTorch routing model. You can then manually invoke _apply_compression with a specific Technique enum value (such as Technique.CROP or Technique.FULL_LOW) to enforce consistent, predictable behavior across all images.

Which LLM providers support the tile optimizer?

The optimize_images_in_messages function and compress method support OpenAI, Anthropic, and Google provider specifications. The tile optimizer automatically adjusts boundary calculations based on the provider parameter, ensuring compatible dimension alignment for each platform's vision token pricing structure.
