# How Headroom's ContentRouter Selects the Optimal Compression Strategy for Different Content Types

> Explore how Headroom's ContentRouter selects optimal compression strategies for diverse content types using a three-phase detection, classification, and mapping pipeline.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: how-to-guide
- Published: 2026-06-06

---

**Headroom's ContentRouter determines the optimal compression strategy through a three-phase pipeline: detecting mixed-content documents via regex patterns, classifying pure content using a Rust detector with Python fallback, and mapping the resulting `ContentType` to a specific `CompressionStrategy` while respecting configuration overrides.**

The `ContentRouter` in the [chopratejas/headroom](https://github.com/chopratejas/headroom) repository acts as an intelligent traffic controller for text compression, automatically routing source code, JSON arrays, search results, and mixed documents to specialized compressors. Understanding how it selects between strategies like `CODE_AWARE`, `SMART_CRUSHER`, and `KOMPRESS` is essential for optimizing token reduction across diverse content types.

## The Three-Phase Strategy Selection Pipeline

The strategy selection logic is implemented in [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py) and operates sequentially until a definitive strategy is assigned.

### Phase 1: Mixed-Content Detection via Regex Analysis

Before invoking heavier classification logic, the router checks whether the input contains heterogeneous content types that would benefit from section-specific compression. The `is_mixed_content()` function in [`content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py) uses four detection patterns:

- `_CODE_FENCE_PATTERN` – identifies fenced code blocks (e.g., a line opening with three backticks followed by `python`)
- `_JSON_BLOCK_START` – detects JSON object/array beginnings
- `_SEARCH_RESULT_PATTERN` – recognizes structured search result formats
- `_PROSE_PATTERN` – flags natural language text

If at least two of these indicators return true, the content is classified as mixed. The router immediately returns `CompressionStrategy.MIXED` and delegates to `_compress_mixed()`, which splits the document via `split_into_sections()` and processes each section with its own optimal strategy. This prevents code-aware compressors from mangling surrounding prose and vice versa.
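The two-indicator rule can be illustrated with a minimal sketch. The regexes and the names `CODE_FENCE`, `JSON_BLOCK_START`, `SEARCH_RESULT`, `PROSE`, `count_indicators`, and `looks_mixed` are assumptions made for this example; the actual patterns in `content_router.py` are not reproduced here and may differ.

```python
import re

# Hypothetical stand-ins for the four detection patterns (assumptions, not Headroom's regexes).
CODE_FENCE = re.compile(r"^`{3}\w*\s*$", re.MULTILINE)            # a fence line, optional language tag
JSON_BLOCK_START = re.compile(r'^\s*[\[{]\s*(?:"|\{|\[|$)', re.MULTILINE)  # line opening an object/array
SEARCH_RESULT = re.compile(r"^\S+:\d+:", re.MULTILINE)            # grep-style "path:line:" prefix
PROSE = re.compile(r"[A-Za-z]{3,}(?: [A-Za-z]{2,}){4,}")          # five or more consecutive words


def count_indicators(text: str) -> int:
    """Count how many of the four indicators appear at least once."""
    patterns = (CODE_FENCE, JSON_BLOCK_START, SEARCH_RESULT, PROSE)
    return sum(1 for pattern in patterns if pattern.search(text))


def looks_mixed(text: str) -> bool:
    """Treat content as mixed when two or more indicators fire."""
    return count_indicators(text) >= 2
```

Counting distinct indicators rather than total matches keeps a long but homogeneous input, such as a large JSON array, from being misread as mixed.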

### Phase 2: Content-Type Classification via Rust Detector

For non-mixed (pure) content, `_determine_strategy()` calls `_detect_content()` in `content_router.py`. This function attempts classification through two layers:

1. **Primary Rust Binding**: Invokes `headroom._core.detect_content_type`, a Rust-based detector that returns a lowercase content tag (e.g., `"source_code"`, `"json_array"`). The Python layer converts this string into the corresponding `ContentType` enum member.

2. **Regex Fallback**: If the Rust layer returns `plain_text`, the system invokes `_regex_detect_content_type()` (defined in `headroom/transforms/content_detector.py`) to perform lightweight pattern matching for edge cases not covered by the native detector.

The resulting `ContentType` enum value—such as `ContentType.SOURCE_CODE`, `ContentType.JSON_ARRAY`, or `ContentType.SEARCH_RESULTS`—is then passed to the mapping layer.
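The two-layer lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the native detector is passed in as a plain callable (standing in for `headroom._core.detect_content_type`) so the example runs without the Rust extension, the `ContentType` enum is trimmed to three members, and `regex_detect` is a hypothetical fallback.

```python
import re
from enum import Enum
from typing import Callable, Optional


class ContentType(Enum):
    # Trimmed-down enum for illustration; values mirror the lowercase tags.
    SOURCE_CODE = "source_code"
    JSON_ARRAY = "json_array"
    PLAIN_TEXT = "plain_text"


def regex_detect(text: str) -> ContentType:
    """Hypothetical lightweight fallback based on simple patterns."""
    if re.search(r"^\s*(?:def|class)\s+\w+\s*[(:]", text, re.MULTILINE):
        return ContentType.SOURCE_CODE
    stripped = text.strip()
    if stripped.startswith("[") and stripped.endswith("]"):
        return ContentType.JSON_ARRAY
    return ContentType.PLAIN_TEXT


def detect_content(
    text: str, native_detect: Optional[Callable[[str], str]] = None
) -> ContentType:
    """Try the native detector first; fall back to regexes on plain_text."""
    if native_detect is not None:
        tag = native_detect(text)
        try:
            detected = ContentType(tag)  # convert the lowercase tag to an enum member
        except ValueError:
            detected = ContentType.PLAIN_TEXT  # assumption: unknown tags count as plain text
        if detected is not ContentType.PLAIN_TEXT:
            return detected
    return regex_detect(text)
```

Treating an unrecognized tag the same as `plain_text` is a design choice of this sketch; it keeps the fallback path the single place where ambiguous input is resolved.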

### Phase 3: Strategy Mapping and Configuration Overrides

The `_strategy_from_detection()` method in `content_router.py` uses a static dictionary that maps each `ContentType` to a `CompressionStrategy`:

```python
mapping = {
    ContentType.SOURCE_CODE:    CompressionStrategy.CODE_AWARE,
    ContentType.JSON_ARRAY:     CompressionStrategy.SMART_CRUSHER,
    ContentType.SEARCH_RESULTS: CompressionStrategy.SEARCH,
    ContentType.BUILD_OUTPUT:   CompressionStrategy.LOG,
    ContentType.GIT_DIFF:       CompressionStrategy.DIFF,
    ContentType.HTML:           CompressionStrategy.HTML,
    ContentType.PLAIN_TEXT:     CompressionStrategy.TEXT,
}
```

If the detected type exists in this mapping, the router returns the associated strategy; otherwise it falls back to `self.config.fallback_strategy` (`CompressionStrategy.KOMPRESS` by default). Configuration flags can also override the mapping: setting `prefer_code_aware_for_code` to `False`, for example, sends source code to the fallback strategy instead of the AST-aware compressor.

## How Configuration Influences Strategy Selection

The `ContentRouterConfig` dataclass provides granular control over the selection pipeline:

- **`enable_code_aware`**: When set to `False`, disables the `CODE_AWARE` compressor entirely, so content that would have used `CODE_AWARE` falls through to `KOMPRESS`.
- **`fallback_strategy`**: Defines the compressor used when content type detection is ambiguous or when a specific compressor is disabled.
- **`prefer_code_aware_for_code`**: If `True` (default), source code routes to `CODE_AWARE`; if `False`, it uses the global fallback.

These overrides are evaluated in `_strategy_from_detection()` after the initial mapping lookup, ensuring user preferences take precedence over automatic detection.
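A minimal sketch of that ordering, lookup first and overrides second, might look like the following. `RouterConfig`, `strategy_for`, the trimmed enums, and the `UNKNOWN` member are hypothetical names for illustration; the default of `enable_code_aware=True` and the way the two code-related flags combine are assumptions rather than confirmed behavior.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ContentType(Enum):
    SOURCE_CODE = auto()
    JSON_ARRAY = auto()
    PLAIN_TEXT = auto()
    UNKNOWN = auto()  # hypothetical member with no entry in the mapping


class CompressionStrategy(Enum):
    CODE_AWARE = auto()
    SMART_CRUSHER = auto()
    TEXT = auto()
    KOMPRESS = auto()


@dataclass
class RouterConfig:
    # Hypothetical stand-in for ContentRouterConfig with only three fields.
    enable_code_aware: bool = True  # assumed default
    prefer_code_aware_for_code: bool = True
    fallback_strategy: CompressionStrategy = CompressionStrategy.KOMPRESS


MAPPING = {
    ContentType.SOURCE_CODE: CompressionStrategy.CODE_AWARE,
    ContentType.JSON_ARRAY: CompressionStrategy.SMART_CRUSHER,
    ContentType.PLAIN_TEXT: CompressionStrategy.TEXT,
}


def strategy_for(content_type: ContentType, config: RouterConfig) -> CompressionStrategy:
    """Look up the static mapping, then apply configuration overrides."""
    strategy = MAPPING.get(content_type, config.fallback_strategy)
    code_aware_allowed = config.enable_code_aware and config.prefer_code_aware_for_code
    if strategy is CompressionStrategy.CODE_AWARE and not code_aware_allowed:
        return config.fallback_strategy
    return strategy
```

Because the overrides only touch the `CODE_AWARE` branch, other content types such as JSON arrays keep their mapped strategy regardless of the code-related flags.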

## Practical Code Examples

### Routing JSON Arrays to SmartCrusher

```python
from headroom.transforms import ContentRouter

router = ContentRouter()

json_payload = '[{"id":1,"msg":"hello"},{"id":2,"msg":"world"}]'
result = router.compress(json_payload)

print(result.strategy_used)  # ➜ CompressionStrategy.SMART_CRUSHER
print(result.compressed)     # Minified JSON output
```

When `_detect_content()` identifies `ContentType.JSON_ARRAY`, the router automatically selects `SMART_CRUSHER`, which applies structural token reduction optimized for JSON.

### Handling Mixed Markdown Documents

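```python
from headroom.transforms import ContentRouter

router = ContentRouter()

# Build the fence marker programmatically so the embedded code block
# does not terminate this example's own fence.
fence = "`" * 3

readme = f"""
# API Documentation

{fence}python
def authenticate(token):
    return verify(token)
{fence}

Configure the endpoint using the settings above.
"""

result = router.compress(readme)
print(result.strategy_used)     # ➜ CompressionStrategy.MIXED
print(len(result.routing_log))  # Multiple RoutingDecision entries
```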

The `is_mixed_content()` function detects both prose and a code fence in this document, triggering the `MIXED` strategy. The router splits the document, compresses the Python block with `CODE_AWARE` (or `KOMPRESS` if `CODE_AWARE` is disabled), and compresses the surrounding markdown with `TEXT`.

### Disabling Code-Aware Compression via Configuration

```python
from headroom.transforms import ContentRouter, ContentRouterConfig, CompressionStrategy

config = ContentRouterConfig(
    enable_code_aware=False,
    fallback_strategy=CompressionStrategy.KOMPRESS
)
router = ContentRouter(config=config)

code = "def add(a, b):\n    return a + b"
result = router.compress(code)

print(result.strategy_used)  # ➜ CompressionStrategy.KOMPRESS
```

Even though the Rust detector correctly identifies `ContentType.SOURCE_CODE`, the configuration override forces the router to use `KOMPRESS` instead of the AST-aware compressor.

## Key Source Files and Architecture

| File | Responsibility |
|------|--------------|
| [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py) | Contains `ContentRouter` class, `is_mixed_content()`, `_detect_content()`, `_determine_strategy()`, and `_strategy_from_detection()`. |
| [`headroom/transforms/content_detector.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_detector.py) | Defines the `ContentType` enum and `_regex_detect_content_type()` fallback logic. |
| [`headroom/transforms/base.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/base.py) | Base `Transform` class inherited by `ContentRouter` for pipeline integration. |
| `headroom/compression/strategies/` | Concrete implementations (e.g., [`code_aware.py`](https://github.com/chopratejas/headroom/blob/main/code_aware.py), [`smart_crusher.py`](https://github.com/chopratejas/headroom/blob/main/smart_crusher.py)) referenced by the strategy enum. |

The `_apply_strategy_to_content()` method handles lazy loading of compressor instances and builds fallback chains (e.g., attempting `CODE_AWARE` before falling back to `KOMPRESS` if the former raises an exception).
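The combination of lazy loading and a fallback chain can be sketched in a few lines. Everything here is hypothetical scaffolding for illustration (`LazyCompressors`, `compress_with_fallback`, and the string strategy names); how `_apply_strategy_to_content()` actually caches instances or reports a fully failed chain is not shown in this article.

```python
from typing import Callable, Dict, List, Optional, Tuple

Compressor = Callable[[str], str]


class LazyCompressors:
    """Create each compressor on first use and cache the instance."""

    def __init__(self, factories: Dict[str, Callable[[], Compressor]]) -> None:
        self._factories = factories
        self._instances: Dict[str, Compressor] = {}

    def get(self, name: str) -> Compressor:
        if name not in self._instances:
            self._instances[name] = self._factories[name]()  # built only when needed
        return self._instances[name]


def compress_with_fallback(
    text: str, chain: List[str], compressors: LazyCompressors
) -> Tuple[str, str]:
    """Try each strategy in order; return (strategy_name, output) for the first success."""
    last_error: Optional[Exception] = None
    for name in chain:
        try:
            return name, compressors.get(name)(text)
        except Exception as exc:  # any failure moves on to the next strategy
            last_error = exc
    # Assumption of this sketch: a fully failed chain is surfaced as an error.
    raise RuntimeError("all strategies in the chain failed") from last_error
```

Calling `compress_with_fallback(code, ["code_aware", "kompress"], compressors)` would then mirror the `CODE_AWARE`-then-`KOMPRESS` chain, and a compressor that is never reached is never constructed.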

## Summary

- **Mixed-content detection** uses regex heuristics to identify documents containing multiple content types, routing them to the `MIXED` strategy for section-aware processing.
- **Content classification** relies on a Rust-based detector (`headroom._core.detect_content_type`) with a Python regex fallback to assign `ContentType` labels.
- **Strategy mapping** translates content types to compressors via a static dictionary, with `ContentRouterConfig` options enabling user overrides for disabled features or preferred fallbacks.
- **Graceful degradation** ensures that if a specific compressor is unavailable or fails, the system falls back to `KOMPRESS` or the user-defined `fallback_strategy`.

## Frequently Asked Questions

### What happens if the Rust content detector cannot identify the content type?

If the Rust `detect_content_type` function returns `"plain_text"` or an unrecognized tag, the router invokes `_regex_detect_content_type()` as a secondary check. If this also fails to identify a specific type, the router uses the `fallback_strategy` configured in `ContentRouterConfig` (defaulting to `CompressionStrategy.KOMPRESS`).

### How does Headroom handle documents that contain both code and natural language?

Documents triggering multiple detection patterns (e.g., code fences alongside prose paragraphs) are flagged as mixed by `is_mixed_content()`. The router selects `CompressionStrategy.MIXED`, splits the document into isolated sections using `split_into_sections()`, and recursively applies the optimal strategy to each section independently before reassembling the output.
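As a rough illustration of the split, compress, and reassemble flow, the sketch below separates fenced code blocks from the surrounding prose. `split_sections` and `compress_mixed` are hypothetical helpers written for this example; the real `split_into_sections()` may recognize more section kinds (such as JSON or search results) and return a different structure.

```python
import re
from typing import Callable, List, Tuple

# A fenced block: opening fence with optional language tag, body, closing fence.
FENCED_BLOCK = re.compile(
    r"^`{3}(\w*)[ \t]*\n(.*?)^`{3}[ \t]*$", re.MULTILINE | re.DOTALL
)


def split_sections(text: str) -> List[Tuple[str, str]]:
    """Return (kind, body) pairs where kind is 'code' or 'prose'."""
    sections: List[Tuple[str, str]] = []
    cursor = 0
    for match in FENCED_BLOCK.finditer(text):
        prose = text[cursor:match.start()]
        if prose.strip():  # whitespace-only gaps are dropped
            sections.append(("prose", prose))
        sections.append(("code", match.group(0)))
        cursor = match.end()
    tail = text[cursor:]
    if tail.strip():
        sections.append(("prose", tail))
    return sections


def compress_mixed(text: str, compress_section: Callable[[str, str], str]) -> str:
    """Compress each section with a per-kind callback and rejoin in order."""
    return "".join(compress_section(kind, body) for kind, body in split_sections(text))
```

Passing a callback that dispatches on `kind` keeps the splitter independent of any particular compressor, which is the property that lets code and prose be handled by different strategies.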

### Can I force the ContentRouter to always use a specific compression strategy?

While the router is designed for automatic selection, you can effectively force a specific strategy by setting the `fallback_strategy` in your configuration and disabling all specialized compressors (e.g., `enable_code_aware=False`, `enable_smart_crusher=False`). However, for production use, it is recommended to let the router select strategies while tuning via `prefer_code_aware_for_code` and similar flags.

### What compression strategy does Headroom use for unknown content types?

For content types not present in the static mapping (such as binary data or unrecognized markup), the router defaults to the `fallback_strategy` specified in the configuration. By default, this is `CompressionStrategy.KOMPRESS`, a general-purpose compressor designed to handle arbitrary text safely when specialized strategies are unavailable.