ContentRouter Compression Strategies in Headroom: When to Use Each Compressor

TLDR: The ContentRouter in chopratejas/headroom supports ten CompressionStrategy values—CODE_AWARE, SMART_CRUSHER, SEARCH, LOG, DIFF, HTML, KOMPRESS, TEXT, MIXED, and PASSTHROUGH—that route payloads to specialized compressors based on mixed-content heuristics, content-type detection, and ContentRouterConfig toggles.

The ContentRouter is the central dispatch component in the chopratejas/headroom project that decides which compression strategy to apply to a given payload. It inspects incoming text, classifies its structure, and delegates to the optimal compressor while respecting user-defined configuration flags. Understanding these ContentRouter compression strategies lets you predict which transformer will run and when to override the defaults.

How ContentRouter Chooses a Compression Strategy

The decision logic in headroom/transforms/content_router.py rests on three pillars: mixed-content detection, content-type classification, and configuration overrides.

Mixed-Content Detection

When a payload contains multiple distinct sections—such as code fences, JSON blocks, search results, and prose—the router selects the MIXED strategy. The helper is_mixed_content() (lines 124–142) counts content indicators; if it reports two or more, _determine_strategy() (lines 1010–1014) routes the input through split_into_sections() (lines 124–164), compresses each part with its best-fit strategy, and re-assembles the result.
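The "two or more indicators" rule can be illustrated with a small standalone sketch. It does not reproduce the project's is_mixed_content(); the regex patterns and the names INDICATORS, count_indicators, and looks_mixed are hypothetical, chosen only to show how counting distinct signals leads to a mixed/uniform decision.

```python
import re

# Hypothetical indicator patterns (assumptions, not the project's real signals).
INDICATORS = {
    "code_fence": re.compile(r"^```", re.MULTILINE),
    "json_block": re.compile(r"^\s*[\[{]", re.MULTILINE),
    "search_result": re.compile(r"^\S+:\d+:", re.MULTILINE),
    "prose": re.compile(r"^[A-Z][^\n]{40,}[.!?]$", re.MULTILINE),
}


def count_indicators(text: str) -> int:
    """Count how many distinct kinds of content appear in the text."""
    return sum(1 for pattern in INDICATORS.values() if pattern.search(text))


def looks_mixed(text: str) -> bool:
    """Treat the payload as mixed when two or more indicators are present."""
    return count_indicators(text) >= 2
```

With these assumed patterns, a payload holding both a code fence and a grep-style line counts two indicators and is treated as mixed, while a short plain note counts none.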

Content-Type Detection

For uniform payloads, _detect_content() (lines 110–123) uses a Rust-backed detector to classify the whole buffer into a ContentType enum. The router then maps that type to a compressor inside _strategy_from_detection() (lines 1034–1036). Supported classifications include source code, JSON arrays, search results, build output, git diffs, HTML, and plain text.
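The type-to-strategy mapping described above can be pictured as a plain lookup table. The sketch below is an assumption-laden stand-in: the member names come from this article, but the enum definitions, the STRATEGY_FOR_TYPE table, and the strategy_for helper are hypothetical and are not copied from headroom.

```python
from enum import Enum, auto


class ContentType(Enum):
    # Member names follow the article; the definitions are illustrative.
    SOURCE_CODE = auto()
    JSON_ARRAY = auto()
    SEARCH_RESULTS = auto()
    BUILD_OUTPUT = auto()
    GIT_DIFF = auto()
    HTML = auto()
    PLAIN_TEXT = auto()


class CompressionStrategy(Enum):
    CODE_AWARE = auto()
    SMART_CRUSHER = auto()
    SEARCH = auto()
    LOG = auto()
    DIFF = auto()
    HTML = auto()
    KOMPRESS = auto()


# Hypothetical lookup table mirroring the pairings listed in this article.
STRATEGY_FOR_TYPE = {
    ContentType.SOURCE_CODE: CompressionStrategy.CODE_AWARE,
    ContentType.JSON_ARRAY: CompressionStrategy.SMART_CRUSHER,
    ContentType.SEARCH_RESULTS: CompressionStrategy.SEARCH,
    ContentType.BUILD_OUTPUT: CompressionStrategy.LOG,
    ContentType.GIT_DIFF: CompressionStrategy.DIFF,
    ContentType.HTML: CompressionStrategy.HTML,
    ContentType.PLAIN_TEXT: CompressionStrategy.KOMPRESS,
}


def strategy_for(content_type, fallback=CompressionStrategy.KOMPRESS):
    """Return the mapped strategy, or the fallback for unknown types."""
    return STRATEGY_FOR_TYPE.get(content_type, fallback)
```

Keeping the mapping in one table makes the fallback explicit: anything the table does not know about resolves to KOMPRESS in this sketch.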

Configuration Flags

Individual strategies can be toggled through ContentRouterConfig (lines 380–410). For example, self.config.enable_code_aware at line 1004 inside _apply_strategy_to_content() determines whether source code receives CODE_AWARE processing or falls back to KOMPRESS. Similarly, flags like enable_smart_crusher, enable_search_compressor, enable_log_compressor, and enable_html_extractor gate their respective compressors, while prefer_code_aware_for_code can override the code-aware default.
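To make the gating behaviour concrete, here is a minimal sketch of how per-strategy flags might redirect a choice to a fallback. RouterFlags is a hypothetical stand-in for ContentRouterConfig: the flag names are taken from this article, but the default values, the GATES table, and the apply_flags helper are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class RouterFlags:
    """Hypothetical stand-in for ContentRouterConfig (defaults are assumed)."""
    enable_code_aware: bool = True
    enable_smart_crusher: bool = True
    enable_search_compressor: bool = True
    enable_log_compressor: bool = True
    enable_html_extractor: bool = True


# Strategy name -> flag that gates it. DIFF is absent because it has no flag.
GATES = {
    "CODE_AWARE": "enable_code_aware",
    "SMART_CRUSHER": "enable_smart_crusher",
    "SEARCH": "enable_search_compressor",
    "LOG": "enable_log_compressor",
    "HTML": "enable_html_extractor",
}


def apply_flags(strategy: str, flags: RouterFlags, fallback: str = "KOMPRESS") -> str:
    """Return the strategy if its flag allows it, otherwise the fallback."""
    flag_name = GATES.get(strategy)
    if flag_name is not None and not getattr(flags, flag_name):
        return fallback
    return strategy
```

In this sketch, apply_flags("CODE_AWARE", RouterFlags(enable_code_aware=False)) resolves to "KOMPRESS", while "DIFF" passes through untouched because no flag gates it.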

Complete List of ContentRouter Compression Strategies

Each enum value maps to a concrete compressor and specific selection criteria.

  • CODE_AWARE – Routes to CodeAwareCompressor for AST-preserving source code compression. The router chooses this when ContentType.SOURCE_CODE is detected and config.enable_code_aware is True. If disabled, it falls back to KOMPRESS per the override clause in _strategy_from_detection().

  • SMART_CRUSHER – Routes to SmartCrusher for high-throughput JSON-array compression. Selected when the detector returns ContentType.JSON_ARRAY and config.enable_smart_crusher is enabled. If token count does not improve, the router attempts a fallback to KOMPRESS.

  • SEARCH – Routes to SearchCompressor for grep or ripgrep result compression. Triggered by ContentType.SEARCH_RESULTS when config.enable_search_compressor is True.

  • LOG – Routes to LogCompressor for build and test output. Applied to ContentType.BUILD_OUTPUT when config.enable_log_compressor is active.

  • DIFF – Routes to DiffCompressor for git diff compression. Matched on ContentType.GIT_DIFF with no explicit flag; this strategy is always available.

  • HTML – Routes to HtmlExtractor to extract readable text from HTML. Selected on ContentType.HTML when config.enable_html_extractor is True.

  • KOMPRESS – Routes to KompressCompressor, the ML-based token compressor. This is the default for ContentType.PLAIN_TEXT and serves as the universal fallback when an explicit strategy is disabled or unavailable.

  • TEXT – An alias that ultimately invokes the same KompressCompressor as KOMPRESS. Used when the router explicitly labels plain text as TEXT rather than the generic KOMPRESS fallback.

  • MIXED – Executes internal split-route logic rather than a single compressor. Triggered when is_mixed_content() reports at least two content indicators, dispatching any of the above strategies on a per-section basis.

  • PASSTHROUGH – Returns the input unchanged with no compression. Used when the buffer is empty or whitespace-only, or when a strategy is disabled and no fallback is applicable.
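Two behaviours in the list above, the fallback when a compressor does not reduce the token count and the passthrough for empty input, can be sketched without the library. Everything here is hypothetical: count_tokens is a crude whitespace tokenizer, the two toy compressors stand in for real ones, and returning the original text when neither compressor helps is an assumption rather than documented router behaviour.

```python
def count_tokens(text: str) -> int:
    """Crude whitespace token count, a stand-in for a real tokenizer."""
    return len(text.split())


def compress_with_fallback(text, primary, fallback):
    """Try primary, then fallback; keep a result only if it saves tokens."""
    if not text.strip():
        # Empty or whitespace-only input is returned unchanged.
        return text, "PASSTHROUGH"
    original = count_tokens(text)
    candidate = primary(text)
    if count_tokens(candidate) < original:
        return candidate, "PRIMARY"
    candidate = fallback(text)
    if count_tokens(candidate) < original:
        return candidate, "FALLBACK"
    # Assumption: if nothing helps, hand back the original text.
    return text, "PASSTHROUGH"


def identity(text: str) -> str:
    """Toy compressor that never saves anything."""
    return text


def drop_duplicates(text: str) -> str:
    """Toy compressor that keeps the first occurrence of each word."""
    return " ".join(dict.fromkeys(text.split()))
```

Calling compress_with_fallback("a a b b", identity, drop_duplicates) falls through to the second compressor, because the first one leaves the token count unchanged.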

Simplified Routing Decision Flow

The core dispatch logic lives in _determine_strategy() and _strategy_from_detection(). The flow follows this pattern:

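if is_mixed_content(content):
    strategy = CompressionStrategy.MIXED
else:
    detection = _detect_content(content)
    strategy = mapping[detection.content_type]   # see enum → compressor map

    # Override: source code falls back to KOMPRESS unless code-aware is preferred
    if strategy == CompressionStrategy.CODE_AWARE and not config.prefer_code_aware_for_code:
        strategy = CompressionStrategy.KOMPRESS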

This logic is implemented in headroom/transforms/content_router.py at lines 1010–1014 and 1034–1036. The router also respects fallback_strategy (default KOMPRESS) when no explicit strategy matches.

Code Examples for ContentRouter Compression Strategies

All examples below execute through ContentRouter.compress in headroom/transforms/content_router.py (lines 808–980).

Route Source Code to CODE_AWARE

from headroom.transforms import ContentRouter, CompressionStrategy
from headroom.transforms.content_router import ContentRouterConfig

router = ContentRouter()
python_code = "def hello():\n    print('world')\n"
result = router.compress(python_code)
print(result.strategy_used)          # → CompressionStrategy.CODE_AWARE
print(result.compressed)             # AST-preserving compressed code

Route JSON Arrays to SMART_CRUSHER

json_array = "[\n" + ",\n".join([str(i) for i in range(1000)]) + "\n]"
result = router.compress(json_array)
print(result.strategy_used)          # → CompressionStrategy.SMART_CRUSHER

Route Mixed Documents to MIXED

mixed_doc = ("# Project README\n\n"
             "```python\ndef foo(): pass\n```\n\n"
             "Here is some description.\n\n"
             "```json\n[1,2,3]\n```")
result = router.compress(mixed_doc)
print(result.strategy_used)          # → CompressionStrategy.MIXED
print(result.routing_log)            # shows CODE_AWARE, TEXT, SMART_CRUSHER per section
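To see why a document like the one above yields several per-section entries, it can help to split it by hand. The sketch below is not the project's split_into_sections(); split_sections, the FENCE pattern, and the section labels are hypothetical, and it only handles fenced blocks and the prose between them.

```python
import re

TICKS = "`" * 3  # a Markdown code fence marker, built indirectly
FENCE = re.compile(TICKS + r"(\w*)\n(.*?)" + TICKS, re.DOTALL)


def split_sections(text: str):
    """Split text into (label, body) pairs for fenced blocks and prose."""
    sections = []
    pos = 0
    for match in FENCE.finditer(text):
        prose = text[pos:match.start()].strip()
        if prose:
            sections.append(("text", prose))
        # Use the fence's language tag as the label, or "code" if untagged.
        sections.append((match.group(1) or "code", match.group(2).strip()))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        sections.append(("text", tail))
    return sections
```

Each (label, body) pair could then be routed independently, which is the idea behind the MIXED strategy; a real splitter would also need to recognise unfenced JSON, search results, and other content kinds.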

Override Strategy via ContentRouterConfig

cfg = ContentRouterConfig(enable_code_aware=False, enable_kompress=True)
router = ContentRouter(config=cfg)
result = router.compress(python_code)
print(result.strategy_used)          # → CompressionStrategy.KOMPRESS

Key Implementation Files

The ContentRouter compression strategies and their routing logic are defined in headroom/transforms/content_router.py. Individual compressors are referenced via lazy loaders inside ContentRouter._apply_strategy_to_content.

Summary

  • The ContentRouter supports ten distinct CompressionStrategy values, each mapped to a specialized compressor.
  • Routing begins with is_mixed_content() in headroom/transforms/content_router.py; if multiple indicators exist, the payload is split and processed under the MIXED strategy.
  • Uniform payloads are classified by _detect_content() and mapped through _strategy_from_detection(), with ContentRouterConfig flags gate-keeping optional compressors.
  • KOMPRESS serves as the default fallback for plain text and for any disabled strategy, while PASSTHROUGH handles empty or whitespace-only inputs and cases where a strategy is disabled and no fallback applies.

Frequently Asked Questions

What is the default ContentRouter compression strategy when no content type matches?

When the router cannot match a specific content type, it falls back to KOMPRESS via the fallback_strategy default. This routes the payload to the KompressCompressor, an ML-based token compressor that handles generic plain text.

How do I disable the AST-preserving code compressor and force ML-based compression?

Set enable_code_aware=False inside ContentRouterConfig when instantiating ContentRouter. According to the override clause in _strategy_from_detection() at lines 1034–1036, source code will then route to KOMPRESS instead of CODE_AWARE.

What happens when a document contains both code fences and JSON blocks?

If is_mixed_content() detects two or more distinct content indicators, the router selects the MIXED strategy. It calls split_into_sections() in headroom/transforms/content_router.py and compresses each section with its own optimal strategy before re-assembling the final output.

Is there a way to bypass all compression and return the original text?

Yes. The PASSTHROUGH strategy returns the input unchanged. The router automatically selects it for empty or whitespace-only buffers, or when a targeted strategy is disabled and no fallback is applicable.
