How to Use ImageCompressor for 40-90% Token Reduction on Images
Headroom's ImageCompressor reduces image token costs by 40-90% through a three-stage pipeline: it aligns images to provider tile boundaries, routes each image through an ML-based technique selector, and then applies a low-detail flag, a resize, or OCR transcoding, with the aim of not degrading LLM comprehension.
The ImageCompressor class in the chopratejas/headroom repository provides an automated solution for optimizing vision model payloads. By processing images through mathematical optimization and intelligent routing, it significantly reduces API costs while maintaining the visual fidelity required for accurate LLM responses.
The Three-Stage Compression Pipeline
The ImageCompressor implemented in headroom/image/compressor.py operates through three distinct stages to maximize token efficiency.
Stage 1: Tile-Boundary Optimization
The tile optimizer performs pure-mathematical resizing to align images with provider-specific vision-token tile boundaries. Located in headroom/image/tile_optimizer.py, this stage achieves zero-quality-loss reduction by adjusting dimensions to match tile constraints.
For OpenAI's vision model, this optimization can reduce a 770-pixel image from 4 tiles (approximately 765 tokens) to a single tile (approximately 255 tokens). The optimize_images_in_messages function handles this alignment automatically based on the specified provider.
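The 765 and 255 figures follow from OpenAI's publicly documented high-detail pricing rule (85 base tokens plus 170 per 512-pixel tile, after the image is scaled to fit 2048x2048 and its shortest side is reduced to 768). The sketch below reproduces that arithmetic; it assumes a square 770x770 input and is not Headroom's own estimate_openai_tokens, whose rounding may differ. The function name is hypothetical.

```python
import math

def openai_high_detail_tokens(width: int, height: int) -> int:
    """Approximate OpenAI high-detail image tokens (illustrative helper)."""
    # Step 1: shrink to fit inside a 2048x2048 square (never upscale).
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Step 2: shrink so the shortest side is at most 768 px (never upscale).
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)
    # Step 3: count the 512x512 tiles needed to cover the image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    # 85 base tokens plus 170 per tile.
    return 85 + 170 * tiles

before = openai_high_detail_tokens(770, 770)  # scaled to 768x768 -> 4 tiles
after = openai_high_detail_tokens(512, 512)   # fits in a single tile
print(before, after)
```

Under these assumptions, shrinking the image just enough to fit one tile removes three tiles' worth of tokens.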
Stage 2: ML-Based Technique Routing
The compressor employs a lightweight ONNX (with PyTorch fallback) model to evaluate the combination of image content and user query. The _get_router method lazily loads this router, which selects the optimal compression technique from four options:
- preserve: Maintain the original image without modification
- full_low: Request provider-side low-detail encoding
- crop: Aggressively downscale the image via Pillow
- transcode: Execute OCR and replace the image with extracted text
This routing logic in _apply_compression ensures the cheapest safe technique is selected for each specific image-query pair.
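To make "cheapest safe technique" concrete, here is a minimal rule-based sketch. It is not the ML router: the Technique enum is a local stand-in that only mirrors the four names above, cheapest_safe is a hypothetical helper, and the token costs are illustrative (765, 255, and 85 come from this article; 40 is an assumed value for extracted text).

```python
from enum import Enum

class Technique(Enum):
    # Local stand-in, not imported from headroom.
    PRESERVE = "preserve"
    FULL_LOW = "full_low"
    CROP = "crop"
    TRANSCODE = "transcode"

def cheapest_safe(costs, safe):
    """Return the lowest-cost technique among those judged safe.

    costs: dict mapping Technique -> estimated tokens after compression
    safe:  set of techniques judged safe for this image-query pair
    """
    # PRESERVE never alters the image, so it is always an allowed fallback.
    candidates = set(safe) | {Technique.PRESERVE}
    return min(candidates, key=lambda t: costs[t])

# Illustrative costs for one large image.
costs = {
    Technique.PRESERVE: 765,
    Technique.FULL_LOW: 85,
    Technique.CROP: 255,
    Technique.TRANSCODE: 40,
}

# Suppose OCR is judged unsafe here (for example, a photo with no text).
choice = cheapest_safe(costs, safe={Technique.CROP, Technique.FULL_LOW})
print(choice.value)
```

In the real pipeline the "safe" judgment comes from the learned model evaluating the image and query; the sketch only shows the selection step that follows.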
Stage 3: Technique Application
Depending on the router's selection, the compressor executes one of three transformation strategies:
- Detail flag modification: Sets OpenAI's "detail" field to "low", reducing costs to approximately 85 tokens per image
- Image resizing: Rescales using Pillow and re-encodes as JPEG (default quality 85)
- OCR transcoding: Invokes RapidOCR to extract text and replaces the image block with [OCR from image]
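The detail-flag strategy is the simplest of the three to picture. The sketch below shows what setting "detail" to "low" looks like on an OpenAI-format message list; set_low_detail is a hypothetical helper written for illustration, not a function from the headroom package.

```python
import copy

def set_low_detail(messages):
    """Return a copy of OpenAI-style messages with detail="low" on every image."""
    out = copy.deepcopy(messages)  # leave the caller's messages untouched
    for message in out:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content carries no image blocks
        for block in content:
            if isinstance(block, dict) and block.get("type") == "image_url":
                image = block.get("image_url")
                if isinstance(image, dict):
                    image["detail"] = "low"
    return out

messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Describe the scene in the picture."},
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64,<base64-data>"}},
    ]}
]

low = set_low_detail(messages)
print(low[0]["content"][1]["image_url"]["detail"])
```

Because the image bytes are unchanged, this strategy costs nothing locally; the saving comes entirely from how the provider bills the request.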
Basic Usage for 40-90% Reduction
To implement the compression pipeline in your application:
from headroom.image import ImageCompressor
# Prepare your LLM messages in OpenAI, Anthropic, or Google format
messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Describe the scene in the picture."},
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64,<base64-data>"}}
    ]}
]
# Initialize compressor (router loads on-demand)
compressor = ImageCompressor()
# Execute compression with provider specification
compressed = compressor.compress(messages, provider="openai")
# Review savings metrics
print("Original token estimate:", compressor.last_result.original_tokens)
print("Compressed token count:", compressor.last_result.compressed_tokens)
print("Savings (%):", compressor.last_result.savings_percent)
The compress method returns modified messages, while compressor.last_result contains a CompressionResult object detailing the original token count, compressed token count, and calculated savings percentage.
Advanced Configuration Techniques
Bypassing the ML Router
For deterministic behavior or testing specific strategies, disable the router and force a particular technique:
from headroom.image.compressor import ImageCompressor, Technique
compressor = ImageCompressor()
compressor._router = None # Disable ML routing
# Manually select cropping technique
compressed = compressor._apply_compression(
    messages=messages,
    technique=Technique.CROP,
    provider="anthropic"
)
Available Technique enum values include PRESERVE, FULL_LOW, CROP, and TRANSCODE.
Isolating Tile Optimization
To measure savings from tile-boundary alignment exclusively without ML routing:
from headroom.image.tile_optimizer import optimize_images_in_messages
optimized_msgs, results = optimize_images_in_messages(
    messages,
    provider="openai"
)
# Sum savings across all images; results is empty when no image was found
total_saved = sum(r.tokens_saved for r in results)
print(f"Tile optimizer saved {total_saved} tokens")
for r in results:
    print(f"  image savings: {r.savings_pct:.1f}%")
How Token Reduction Is Calculated
The ImageCompressor calculates savings through provider-specific estimation functions:
- Original tokens: Determined by estimate_openai_tokens or estimate_anthropic_tokens, including tile-boundary alignment benefits
- Compressed tokens: Count after applying the selected technique (low-detail flag, resized JPEG, or OCR text replacement)
- Savings percentage: Derived from (original - compressed) / original, as reported in CompressionResult.savings_percent
Typical results range from 40% reduction (conservative crop operations) to 90%+ reduction (successful transcode or full_low operations on large images).
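As a worked example of the savings formula, the sketch below plugs in the OpenAI token figures quoted earlier in this article (765 for four tiles, 255 for one tile, 85 for low detail). The savings_percent function here is a hypothetical stand-alone helper, not the CompressionResult attribute itself.

```python
def savings_percent(original_tokens: int, compressed_tokens: int) -> float:
    """Percentage of tokens saved: (original - compressed) / original * 100."""
    if original_tokens <= 0:
        return 0.0  # avoid dividing by zero when there was nothing to compress
    return (original_tokens - compressed_tokens) / original_tokens * 100

tile_only = savings_percent(765, 255)   # four tiles reduced to one
low_detail = savings_percent(765, 85)   # four tiles replaced by low detail

print(f"Tile alignment only: {tile_only:.1f}%")   # 66.7%
print(f"Low-detail flag:     {low_detail:.1f}%")  # 88.9%
```

Both figures sit inside the 40-90% range quoted above for a single large image.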
Summary
- The ImageCompressor in headroom/image/compressor.py implements a three-stage pipeline: tile-boundary optimization, ML-based routing, and technique application.
- Tile optimization in headroom/image/tile_optimizer.py provides zero-loss savings by aligning images to provider tile boundaries (e.g., a 770px image drops from about 765 to about 255 tokens on OpenAI).
- The ML router selects from four techniques (preserve, full_low, crop, transcode) based on image content and query context via _get_router and _apply_compression.
- Low-detail encoding (full_low) typically yields ~85 tokens per image, while OCR transcoding (transcode) can exceed 95% token savings for text-heavy images.
- Access compression metrics through the CompressionResult object stored in compressor.last_result.
Frequently Asked Questions
How does ImageCompressor achieve 40-90% token reduction without losing visual information?
The pipeline first applies tile-boundary optimization, which mathematically aligns image dimensions to provider tile constraints without visual degradation. Subsequent stages use an ML router to select a context-appropriate technique, such as requesting low-detail encoding for complex photos or OCR for text-heavy images, so that the LLM receives sufficient information at minimal token cost.
What distinguishes the 'full_low' technique from 'transcode'?
The full_low technique instructs the provider (particularly OpenAI) to process the image in low-detail mode, which typically costs approximately 85 tokens regardless of original size. The transcode technique runs RapidOCR locally to extract text content and replaces the entire image block with the extracted text, often achieving greater than 95% token reduction for documents or screenshots containing readable text.
Can I disable the ML router for deterministic compression behavior?
Yes. Set compressor._router = None after initialization to bypass the ONNX/PyTorch routing model. You can then manually invoke _apply_compression with a specific Technique enum value (such as Technique.CROP or Technique.FULL_LOW) to enforce consistent, predictable behavior across all images.
Which LLM providers support the tile optimizer?
The optimize_images_in_messages function and compress method support OpenAI, Anthropic, and Google provider specifications. The tile optimizer automatically adjusts boundary calculations based on the provider parameter, ensuring compatible dimension alignment for each platform's vision token pricing structure.