How Headroom's ContentRouter Selects the Optimal Compression Strategy for Different Content Types

Headroom's ContentRouter determines the optimal compression strategy through a three-phase pipeline: detecting mixed-content documents via regex patterns, classifying pure content using a Rust detector with Python fallback, and mapping the resulting ContentType to a specific CompressionStrategy while respecting configuration overrides.

The ContentRouter in the chopratejas/headroom repository acts as an intelligent traffic controller for text compression, automatically routing source code, JSON arrays, search results, and mixed documents to specialized compressors. Understanding how it selects between strategies like CODE_AWARE, SMART_CRUSHER, and KOMPRESS is essential for optimizing token reduction across diverse content types.

The Three-Phase Strategy Selection Pipeline

The strategy selection logic is implemented in headroom/transforms/content_router.py and operates sequentially until a definitive strategy is assigned.

Phase 1: Mixed-Content Detection via Regex Analysis

Before invoking heavier classification logic, the router checks whether the input contains heterogeneous content types that would benefit from section-specific compression. The is_mixed_content() function (located at lines 27‑41 in content_router.py) tests the input against four precompiled detection patterns:

  • _CODE_FENCE_PATTERN – identifies fenced code blocks (e.g., ```python)
  • _JSON_BLOCK_START – detects JSON object/array beginnings
  • _SEARCH_RESULT_PATTERN – recognizes structured search result formats
  • _PROSE_PATTERN – flags natural language text

If at least two of these indicators return true, the content is classified as mixed. The router immediately returns CompressionStrategy.MIXED and delegates to _compress_mixed(), which splits the document via split_into_sections() and processes each section with its own optimal strategy. This prevents code-aware compressors from mangling surrounding prose and vice versa.
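The two-or-more rule can be sketched as follows. Only the pattern names and the threshold come from the description above; the regular expressions themselves are illustrative stand-ins, and the patterns actually compiled in content_router.py may look quite different.

```python
import re

# Hypothetical stand-ins for the four patterns named above; the real
# expressions in content_router.py may differ.
_CODE_FENCE_PATTERN = re.compile(r"^`{3}\w*\s*$", re.MULTILINE)
_JSON_BLOCK_START = re.compile(r"^\s*[\[{]\s*$", re.MULTILINE)
_SEARCH_RESULT_PATTERN = re.compile(r"^\S+:\d+:", re.MULTILINE)
_PROSE_PATTERN = re.compile(r"\b[A-Za-z]{3,}(?: [A-Za-z]{2,}){4,}[.!?]")


def is_mixed_content(text: str) -> bool:
    """Return True when at least two of the four indicators fire."""
    indicators = [
        bool(_CODE_FENCE_PATTERN.search(text)),
        bool(_JSON_BLOCK_START.search(text)),
        bool(_SEARCH_RESULT_PATTERN.search(text)),
        bool(_PROSE_PATTERN.search(text)),
    ]
    return sum(indicators) >= 2
```

Counting indicators rather than checking specific pairs keeps the check cheap and lets any combination (code plus prose, JSON plus search results, and so on) trigger section-level routing.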

Phase 2: Content-Type Classification via Rust Detector

For non-mixed (pure) content, _determine_strategy() calls _detect_content() (lines 10‑27 in content_router.py). This function attempts classification through two layers:

  1. Primary Rust Binding: Invokes headroom._core.detect_content_type, a Rust-based detector that returns a lowercase content tag (e.g., "source_code", "json_array"). The Python layer converts this string into the corresponding ContentType enum member.

  2. Regex Fallback: If the Rust layer returns plain_text, the system invokes _regex_detect_content_type() (defined in headroom/transforms/content_detector.py) to perform lightweight pattern matching for edge cases not covered by the native detector.

The resulting ContentType enum value (for example ContentType.SOURCE_CODE, ContentType.JSON_ARRAY, or ContentType.SEARCH_RESULTS) is then passed to the mapping layer.
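A minimal sketch of the two-layer lookup, under stated assumptions: the native detector is passed in as a plain callable so the example runs without the compiled extension, the enum lists only a few members, the regex fallback is a toy, and treating an unrecognized tag as plain text is a guess rather than confirmed behavior.

```python
from enum import Enum
from typing import Callable


class ContentType(Enum):
    # Subset of members, for illustration only.
    SOURCE_CODE = "source_code"
    JSON_ARRAY = "json_array"
    PLAIN_TEXT = "plain_text"


def regex_detect(text: str) -> ContentType:
    """Toy stand-in for _regex_detect_content_type()."""
    stripped = text.strip()
    if stripped.startswith("[") and stripped.endswith("]"):
        return ContentType.JSON_ARRAY
    return ContentType.PLAIN_TEXT


def detect_content(text: str, native_detect: Callable[[str], str]) -> ContentType:
    """Ask the native detector first; consult the regexes on a plain_text result."""
    tag = native_detect(text)
    try:
        detected = ContentType(tag)  # lowercase tag -> enum member, looked up by value
    except ValueError:
        # Assumption: an unrecognized tag is treated like plain text.
        detected = ContentType.PLAIN_TEXT
    if detected is ContentType.PLAIN_TEXT:
        return regex_detect(text)
    return detected
```

In the router itself, the callable would be headroom._core.detect_content_type; injecting it here simply makes the control flow easy to exercise in isolation.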

Phase 3: Strategy Mapping and Configuration Overrides

The _strategy_from_detection() method (lines 27‑36 in content_router.py) implements a static dictionary mapping ContentType to CompressionStrategy:

```python
mapping = {
    ContentType.SOURCE_CODE:    CompressionStrategy.CODE_AWARE,
    ContentType.JSON_ARRAY:     CompressionStrategy.SMART_CRUSHER,
    ContentType.SEARCH_RESULTS: CompressionStrategy.SEARCH,
    ContentType.BUILD_OUTPUT:   CompressionStrategy.LOG,
    ContentType.GIT_DIFF:       CompressionStrategy.DIFF,
    ContentType.HTML:           CompressionStrategy.HTML,
    ContentType.PLAIN_TEXT:     CompressionStrategy.TEXT,
}
```

If the detected type exists in this mapping, the router returns the associated strategy; otherwise it falls back to self.config.fallback_strategy (CompressionStrategy.KOMPRESS by default). Configuration flags can also override the mapping: when prefer_code_aware_for_code is set to False, source code is routed to the fallback strategy (KOMPRESS by default) instead of the AST-aware compressor.

How Configuration Influences Strategy Selection

The ContentRouterConfig dataclass provides granular control over the selection pipeline:

  • enable_code_aware: When set to False, disables the CODE_AWARE compressor entirely, so source code is handled by the next compressor in the fallback chain (CODE_AWARE → KOMPRESS).
  • fallback_strategy: Defines the compressor used when content type detection is ambiguous or when a specific compressor is disabled.
  • prefer_code_aware_for_code: If True (default), source code routes to CODE_AWARE; if False, it uses the global fallback.

These overrides are evaluated in _strategy_from_detection() after the initial mapping lookup, ensuring user preferences take precedence over automatic detection.
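One way the overrides could layer on top of the mapping lookup is sketched below. The three field names come from the list above; the trimmed-down enums, the UNKNOWN member, the default of enable_code_aware=True, and the exact order of checks are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ContentType(Enum):
    SOURCE_CODE = auto()
    JSON_ARRAY = auto()
    PLAIN_TEXT = auto()
    UNKNOWN = auto()  # hypothetical member standing in for any unmapped type


class CompressionStrategy(Enum):
    CODE_AWARE = auto()
    SMART_CRUSHER = auto()
    TEXT = auto()
    KOMPRESS = auto()


@dataclass
class ContentRouterConfig:
    enable_code_aware: bool = True  # assumed default
    prefer_code_aware_for_code: bool = True
    fallback_strategy: CompressionStrategy = CompressionStrategy.KOMPRESS


_MAPPING = {
    ContentType.SOURCE_CODE: CompressionStrategy.CODE_AWARE,
    ContentType.JSON_ARRAY: CompressionStrategy.SMART_CRUSHER,
    ContentType.PLAIN_TEXT: CompressionStrategy.TEXT,
}


def strategy_from_detection(
    detected: ContentType, config: ContentRouterConfig
) -> CompressionStrategy:
    # Step 1: static lookup, using the configured fallback for unmapped types.
    strategy = _MAPPING.get(detected, config.fallback_strategy)
    # Step 2: overrides. CODE_AWARE is only kept when both flags allow it.
    if strategy is CompressionStrategy.CODE_AWARE and not (
        config.enable_code_aware and config.prefer_code_aware_for_code
    ):
        return config.fallback_strategy
    return strategy
```

Because the override step runs after the lookup, disabling code-aware compression affects only source code; JSON arrays and plain text keep their mapped strategies.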

Practical Code Examples

Routing JSON Arrays to SmartCrusher

```python
from headroom.transforms import ContentRouter

router = ContentRouter()

json_payload = '[{"id":1,"msg":"hello"},{"id":2,"msg":"world"}]'
result = router.compress(json_payload)

print(result.strategy_used)  # ➜ CompressionStrategy.SMART_CRUSHER
print(result.compressed)     # Minified JSON output
```

When _detect_content() identifies ContentType.JSON_ARRAY, the router automatically selects SMART_CRUSHER, which applies structural token reduction optimized for JSON.

Handling Mixed Markdown Documents

````python
# Reuses the router created in the previous example.
readme = """
# API Documentation

```python
def authenticate(token):
    return verify(token)
```

Configure the endpoint using the settings above.
"""

result = router.compress(readme)
print(result.strategy_used)     # ➜ CompressionStrategy.MIXED
print(len(result.routing_log))  # Multiple RoutingDecision entries
````

The is_mixed_content() function detects both prose (# API Documentation) and code fences, triggering the MIXED strategy. The router splits the document and compresses the Python block with CODE_AWARE (or KOMPRESS if disabled) and the markdown with TEXT.
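The splitting step might look roughly like the sketch below. split_into_sections() is named in the source, but its behavior is not shown there, so this is a hypothetical version that only separates fenced code blocks from everything around them; the ("code" | "prose", text) return shape is likewise an assumption.

```python
import re
from typing import List, Tuple

# Matches a whole fenced block, from the opening fence line to the closing one.
_FENCED_BLOCK = re.compile(r"^`{3}[^\n]*\n.*?^`{3}[ \t]*$", re.MULTILINE | re.DOTALL)


def split_into_sections(text: str) -> List[Tuple[str, str]]:
    """Split text into ("code" | "prose", content) pairs, in document order."""
    sections: List[Tuple[str, str]] = []
    cursor = 0
    for match in _FENCED_BLOCK.finditer(text):
        prose = text[cursor:match.start()].strip()
        if prose:
            sections.append(("prose", prose))
        sections.append(("code", match.group(0)))
        cursor = match.end()
    tail = text[cursor:].strip()  # whatever follows the last fenced block
    if tail:
        sections.append(("prose", tail))
    return sections
```

Each pair can then be routed on its own, which is what keeps a code-oriented compressor away from the surrounding prose.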

Disabling Code-Aware Compression via Configuration

```python
from headroom.transforms import ContentRouter, ContentRouterConfig, CompressionStrategy

config = ContentRouterConfig(
    enable_code_aware=False,
    fallback_strategy=CompressionStrategy.KOMPRESS
)
router = ContentRouter(config=config)

code = "def add(a, b):\n    return a + b"
result = router.compress(code)

print(result.strategy_used)  # ➜ CompressionStrategy.KOMPRESS
```

Even though the Rust detector correctly identifies ContentType.SOURCE_CODE, the configuration override forces the router to use KOMPRESS instead of the AST-aware compressor.

Key Source Files and Architecture

| File | Responsibility |
| --- | --- |
| headroom/transforms/content_router.py | Contains ContentRouter class, is_mixed_content(), _detect_content(), _determine_strategy(), and _strategy_from_detection(). |
| headroom/transforms/content_detector.py | Defines the ContentType enum and _regex_detect_content_type() fallback logic. |
| headroom/transforms/base.py | Base Transform class inherited by ContentRouter for pipeline integration. |
| headroom/compression/strategies/ | Concrete implementations (e.g., code_aware.py, smart_crusher.py) referenced by the strategy enum. |

The _apply_strategy_to_content() method handles lazy loading of compressor instances and builds fallback chains (e.g., attempting CODE_AWARE before falling back to KOMPRESS if the former raises an exception).
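The combination of lazy loading and a fallback chain can be illustrated with a small, self-contained sketch. LazyCompressors, apply_with_fallback, the string strategy names, and the final "passthrough" result are all hypothetical; they show the pattern described above, not the code in content_router.py.

```python
from typing import Callable, Dict, List, Tuple

Compressor = Callable[[str], str]


class LazyCompressors:
    """Builds each compressor on first use and caches the instance."""

    def __init__(self, factories: Dict[str, Callable[[], Compressor]]):
        self._factories = factories
        self._instances: Dict[str, Compressor] = {}

    def get(self, name: str) -> Compressor:
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]


def apply_with_fallback(
    text: str, chain: List[str], registry: LazyCompressors
) -> Tuple[str, str]:
    """Try each strategy in order; return (name, output) for the first success."""
    for name in chain:
        try:
            return name, registry.get(name)(text)
        except Exception:
            continue  # compressor failed or could not be built; try the next one
    return "passthrough", text  # assumption: leave content unchanged if all fail


def build_code_aware() -> Compressor:
    def compress(text: str) -> str:
        raise RuntimeError("parser unavailable")  # simulate a failing compressor
    return compress


def build_kompress() -> Compressor:
    return lambda text: " ".join(text.split())  # toy whitespace squeezer


registry = LazyCompressors({"code_aware": build_code_aware, "kompress": build_kompress})
name, output = apply_with_fallback(
    "def  add(a, b):\n    return a + b", ["code_aware", "kompress"], registry
)
print(name, "->", output)
```

Deferring construction means an expensive compressor costs nothing until content of its type actually arrives, and the ordered chain keeps a single failure from aborting the whole request.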

Summary

  • Mixed-content detection uses regex heuristics to identify documents containing multiple content types, routing them to the MIXED strategy for section-aware processing.
  • Content classification relies on a Rust-based detector (headroom._core.detect_content_type) with a Python regex fallback to assign ContentType labels.
  • Strategy mapping translates content types to compressors via a static dictionary, with ContentRouterConfig options enabling user overrides for disabled features or preferred fallbacks.
  • Graceful degradation ensures that if a specific compressor is unavailable or fails, the system falls back to KOMPRESS or the user-defined fallback_strategy.

Frequently Asked Questions

What happens if the Rust content detector cannot identify the file type?

If the Rust detect_content_type function returns "plain_text" or an unrecognized tag, the router invokes _regex_detect_content_type() as a secondary check. If this also fails to identify a specific type, the router uses the fallback_strategy configured in ContentRouterConfig (defaulting to CompressionStrategy.KOMPRESS).

How does Headroom handle documents that contain both code and natural language?

Documents triggering multiple detection patterns (e.g., code fences alongside prose paragraphs) are flagged as mixed by is_mixed_content(). The router selects CompressionStrategy.MIXED, splits the document into isolated sections using split_into_sections(), and recursively applies the optimal strategy to each section independently before reassembling the output.

Can I force the ContentRouter to always use a specific compression strategy?

While the router is designed for automatic selection, you can effectively force a specific strategy by setting the fallback_strategy in your configuration and disabling all specialized compressors (e.g., enable_code_aware=False, enable_smart_crusher=False). However, for production use, it is recommended to let the router select strategies while tuning via prefer_code_aware_for_code and similar flags.

What compression strategy does Headroom use for unknown content types?

For content types not present in the static mapping (such as binary data or unrecognized markup), the router defaults to the fallback_strategy specified in the configuration. By default, this is CompressionStrategy.KOMPRESS, a general-purpose compressor designed to handle arbitrary text safely when specialized strategies are unavailable.
