How to Configure Headroom for Specific Compression Ratios Using target_ratio

June 5, 2026 chopratejas/headroom

Pass a target_ratio value between 0 and 1 to Headroom's compress() function to retain a specific fraction of tokens: 0.3 keeps roughly 30% of the original tokens, which corresponds to about 70% compression.

The open-source Headroom library (chopratejas/headroom) provides deterministic token reduction for LLM contexts through its configurable compression pipeline. By specifying a target_ratio parameter, you control what proportion of the original tokens survives compression, which helps keep conversational AI applications within their context window limits.

Understanding the target_ratio Parameter

The target_ratio parameter accepts a float between 0 and 1 that represents the fraction of tokens to preserve. For example, target_ratio=0.25 instructs the compressor to retain the highest-scoring 25% of tokens and remove the remaining 75%. When you omit the parameter, the underlying model falls back to its internal score_threshold and decides retention dynamically.
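The relationship between the ratio, the number of tokens kept, and the resulting compression can be worked through with plain arithmetic. The sketch below is not Headroom code: the helper names tokens_kept and compression_percent are hypothetical, and the truncating int() mirrors the expression this article attributes to KompressCompressor.

```python
def tokens_kept(num_tokens: int, target_ratio: float) -> int:
    """Tokens retained for a given ratio, truncating toward zero like int()."""
    if not 0.0 <= target_ratio <= 1.0:
        raise ValueError("target_ratio must be between 0 and 1")
    return int(num_tokens * target_ratio)


def compression_percent(num_tokens: int, target_ratio: float) -> float:
    """Percentage of tokens removed for a given ratio."""
    if num_tokens == 0:
        return 0.0  # nothing to compress
    kept = tokens_kept(num_tokens, target_ratio)
    return 100.0 * (1 - kept / num_tokens)


if __name__ == "__main__":
    print(tokens_kept(1000, 0.25))          # 250 tokens kept
    print(compression_percent(1000, 0.25))  # 75.0 percent removed
    print(tokens_kept(10, 0.25))            # 2, because int(2.5) truncates
```

Because of the truncation, short inputs can end up compressed slightly more than the nominal ratio suggests: 10 tokens at 0.25 keep 2 tokens, not 2.5.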

How target_ratio Propagates Through the Compression Pipeline

Headroom passes the requested compression ratio through three stages, from the public API surface down to the token selection algorithm.

Entry Point in headroom/compress.py

The public compress() function in headroom/compress.py accepts target_ratio as a direct argument and forwards it into the transform pipeline. This serves as the primary user-facing interface for ratio-based compression.

Routing via ContentRouter

The ContentRouter transform in headroom/transforms/content_router.py extracts the runtime target_ratio kwarg from the request context on line 1475 and injects it into each downstream compressor. This routing layer ensures the ratio propagates correctly regardless of which specific compression backend processes the text.
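The routing idea described above can be illustrated with a toy example. This is a minimal sketch of the kwarg-forwarding pattern only, not the ContentRouter implementation: ToyRouter, truncating_compressor, and the (content_type, text) item shape are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


def truncating_compressor(text: str, target_ratio: Optional[float] = None) -> str:
    """Stand-in backend: keeps the leading fraction of whitespace-split words."""
    if target_ratio is None:
        return text  # no ratio supplied, leave the text untouched
    words = text.split()
    return " ".join(words[: int(len(words) * target_ratio)])


class ToyRouter:
    """Hypothetical router that forwards a runtime kwarg to each backend."""

    def __init__(self, compressors: Dict[str, Callable[..., str]]) -> None:
        self.compressors = compressors

    def apply(self, items: List[Tuple[str, str]], **kwargs) -> List[str]:
        # Pull the ratio out of the runtime kwargs once...
        target_ratio = kwargs.get("target_ratio")
        results = []
        for content_type, text in items:
            compressor = self.compressors[content_type]
            # ...and inject it into whichever backend handles this item.
            results.append(compressor(text, target_ratio=target_ratio))
        return results


if __name__ == "__main__":
    router = ToyRouter({"text": truncating_compressor})
    print(router.apply([("text", "a b c d")], target_ratio=0.5))  # ['a b']
```

The point of the pattern is that the caller sets the ratio once, and every backend selected by the router sees the same value.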

Token Selection in KompressCompressor

The actual compression logic resides in headroom/transforms/kompress_compressor.py, where the KompressCompressor class implements deterministic top-k selection. On line 695, the code calculates num_keep = int(num_tokens × target_ratio) and retains exactly that many highest-scoring tokens. When target_ratio is provided, this overrides any model-specific thresholding behavior.
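To make the top-k step concrete, here is a small self-contained sketch. It is an illustration under stated assumptions, not the KompressCompressor source: the function name top_k_select is hypothetical, the scores are made up, and restoring the survivors to their original order is an assumption about how a readable result would be produced.

```python
from typing import List, Sequence


def top_k_select(tokens: Sequence[str], scores: Sequence[float], target_ratio: float) -> List[str]:
    """Keep the int(len(tokens) * target_ratio) highest-scoring tokens."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    num_keep = int(len(tokens) * target_ratio)
    # Rank positions by descending score; sorted() is stable, so ties
    # are resolved in favour of the earlier token.
    ranked = sorted(range(len(tokens)), key=lambda i: -scores[i])
    # Assumption: survivors are emitted in their original order.
    keep = sorted(ranked[:num_keep])
    return [tokens[i] for i in keep]


if __name__ == "__main__":
    tokens = ["the", "quick", "brown", "fox", "jumps", "over"]
    scores = [0.1, 0.9, 0.4, 0.8, 0.3, 0.2]
    print(top_k_select(tokens, scores, 0.5))  # ['quick', 'brown', 'fox']
```

Given the same scores and the same ratio, the selection always returns the same tokens, which is what makes a ratio-driven approach deterministic.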

Code Examples for Configuring Compression Ratios

Basic Usage with the compress() Helper

from headroom.compress import compress

messages = [
    {"role": "assistant", "content": "A very long answer …"},
    {"role": "tool", "content": "Results from a heavy computation …"},
]

# Keep only 25% of the original tokens
compressed = compress(messages, model="gpt-4o", target_ratio=0.25)

print(compressed)   # → list of messages with shortened content

Manual Pipeline Configuration

from headroom.transforms.kompress_compressor import KompressCompressor
from headroom.transforms.content_router import ContentRouter

# Messages to compress (same shape as in the basic example)
messages = [
    {"role": "assistant", "content": "A very long answer …"},
    {"role": "tool", "content": "Results from a heavy computation …"},
]

# Build a pipeline that includes the ContentRouter and Kompress
router = ContentRouter()
kompress = KompressCompressor()

# Supply the ratio through the router's kwargs
router_kwargs = {"target_ratio": 0.4}   # keep 40% of tokens

# The router forwards the kwarg to KompressCompressor internally
router.apply(messages, transformer=kompress, **router_kwargs)

Batch Processing with Variable Ratios

from headroom.transforms.kompress_compressor import KompressCompressor

compressor = KompressCompressor()
texts = [
    "First long paragraph …",
    "Second long paragraph …",
    "Third short one.",
]

# Provide a list: each entry corresponds to the respective text
ratios = [0.3, 0.5, None]   # third text uses the model's default decision

results = compressor.compress_batch(texts, target_ratio=ratios)

for r in results:
    print(r.compression_ratio, r.compressed)

Summary

Valid Range: target_ratio accepts float values from 0 to 1, interpreted as the fraction of tokens to retain.
Pipeline Flow: The parameter travels from headroom/compress.py → ContentRouter (line 1475) → KompressCompressor (line 695).
Deterministic Output: When specified, target_ratio triggers a top-k selection that keeps exactly int(num_tokens × target_ratio) tokens.
Optional Override: Omitting target_ratio allows the model to use its internal score_threshold instead.
Batch Support: Pass a list of ratios to compress_batch() for per-text granularity in batch operations.

Frequently Asked Questions

What happens if I set target_ratio to 0 or 1?

Setting target_ratio=0 removes all tokens, resulting in empty content, while target_ratio=1 preserves the entire text with no compression. Values outside the 0–1 range may raise validation errors depending on the specific version of the chopratejas/headroom repository.

Does target_ratio work with all Headroom transformers?

While target_ratio is primarily implemented in KompressCompressor, the ContentRouter in headroom/transforms/content_router.py routes this parameter to any downstream transform that respects the convention. Other compressors like SmartCrusher in headroom/transforms/smart_crusher.py also honor the target_ratio kwarg when provided.

Can I use different compression ratios for different message types?

Yes. Since target_ratio is passed as a runtime kwarg through the ContentRouter, you can invoke the compress() function multiple times with different ratios for assistant messages versus tool outputs, or use the batch API with a list of ratios to specify per-text retention rates.
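One way to organize per-role ratios is a small wrapper that looks up a ratio for each message and calls a compression function with it. The sketch below is an assumption-laden illustration: compress_by_role and word_truncate are hypothetical helpers, and word_truncate is only a stand-in so the example runs without Headroom installed. In practice you would pass in a callable that wraps Headroom's compress().

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def word_truncate(messages: List[Message], target_ratio: float) -> List[Message]:
    """Stand-in for a real compressor: keeps the leading fraction of words."""
    out = []
    for message in messages:
        words = message["content"].split()
        kept = words[: int(len(words) * target_ratio)]
        out.append({**message, "content": " ".join(kept)})
    return out


def compress_by_role(
    messages: List[Message],
    ratios_by_role: Dict[str, float],
    compress_fn: Callable[..., List[Message]],
) -> List[Message]:
    """Compress each message with the ratio configured for its role."""
    result: List[Message] = []
    for message in messages:
        ratio = ratios_by_role.get(message["role"])
        if ratio is None:
            result.append(message)  # no ratio configured: pass through unchanged
        else:
            result.extend(compress_fn([message], target_ratio=ratio))
    return result


if __name__ == "__main__":
    messages = [
        {"role": "assistant", "content": "a b c d"},
        {"role": "tool", "content": "w x y z"},
        {"role": "user", "content": "hi there"},
    ]
    for m in compress_by_role(messages, {"assistant": 0.5, "tool": 0.25}, word_truncate):
        print(m["role"], "->", m["content"])
```

Compressing one message per call keeps the example simple; grouping messages by role before calling the compressor would reduce the number of calls.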

How does target_ratio interact with the model's internal scoring?

When you provide target_ratio, it completely overrides the model's internal score_threshold logic. The KompressCompressor calculates the exact number of tokens to keep based on your ratio and performs a deterministic top-k selection by token scores, ignoring any default thresholding behavior.
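The difference between the two modes can be sketched in a few lines. This is a simplified model of the behavior described above, not the library's code: the function names, the example scores, and the 0.5 default for score_threshold are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def select_by_threshold(scores: Sequence[float], score_threshold: float) -> List[int]:
    """Keep every position whose score meets the threshold (count varies)."""
    return [i for i, s in enumerate(scores) if s >= score_threshold]


def select_by_ratio(scores: Sequence[float], target_ratio: float) -> List[int]:
    """Keep a fixed number of positions: the top int(n * ratio) by score."""
    num_keep = int(len(scores) * target_ratio)
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sorted(ranked[:num_keep])


def select(
    scores: Sequence[float],
    target_ratio: Optional[float] = None,
    score_threshold: float = 0.5,  # hypothetical default, for illustration only
) -> List[int]:
    """A supplied ratio takes precedence; otherwise fall back to the threshold."""
    if target_ratio is not None:
        return select_by_ratio(scores, target_ratio)
    return select_by_threshold(scores, score_threshold)


if __name__ == "__main__":
    scores = [0.9, 0.2, 0.6, 0.55]
    print(select(scores))                    # [0, 2, 3]: three scores pass 0.5
    print(select(scores, target_ratio=0.5))  # [0, 2]: exactly two kept
```

With a threshold, the number of surviving tokens depends on the score distribution; with a ratio, it depends only on the input length.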
