ContentRouter Compression Strategies in Headroom: When to Use Each Compressor
TLDR: The ContentRouter in chopratejas/headroom supports ten CompressionStrategy values—CODE_AWARE, SMART_CRUSHER, SEARCH, LOG, DIFF, HTML, KOMPRESS, TEXT, MIXED, and PASSTHROUGH—that route payloads to specialized compressors based on mixed-content heuristics, content-type detection, and ContentRouterConfig toggles.
The ContentRouter is the central dispatch component in the chopratejas/headroom project that decides which compression strategy to apply to a given payload. It inspects incoming text, classifies its structure, and delegates to the optimal compressor while respecting user-defined configuration flags. Understanding these ContentRouter compression strategies lets you predict which transformer will run and when to override the defaults.
How ContentRouter Chooses a Compression Strategy
The decision logic in headroom/transforms/content_router.py rests on three pillars: mixed-content detection, content-type classification, and configuration overrides.
Mixed-Content Detection
When a payload contains several distinct kinds of content, such as code fences, JSON blocks, search results, and prose, the router selects the MIXED strategy. The helper is_mixed_content() (lines 124–142) counts content indicators; if it finds two or more, _determine_strategy() (lines 1010–1014) routes the input through split_into_sections(), compresses each section with its best-fit strategy, and reassembles the result.
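The indicator-counting idea can be sketched in a few lines. This is a minimal illustration, not headroom's implementation: the regex patterns and the names INDICATORS, count_indicators, and is_mixed are hypothetical, and the real is_mixed_content() may use different signals. Only the "two or more indicators means mixed" rule comes from the description above.

```python
import re

# Hypothetical indicator patterns; headroom's real heuristics may differ.
INDICATORS = {
    "code_fence": re.compile(r"^```", re.MULTILINE),
    "json_block": re.compile(r"^[ \t]*[\[{]", re.MULTILINE),
    "search_result": re.compile(r"^[\w./-]+:\d+:", re.MULTILINE),  # path:line: hits
    "prose": re.compile(r"[A-Za-z]+(?: [A-Za-z]+){4,}[.!?]"),  # 5+ words ending a sentence
}


def count_indicators(text: str) -> int:
    """Count how many distinct indicator kinds appear anywhere in the text."""
    return sum(1 for pattern in INDICATORS.values() if pattern.search(text))


def is_mixed(text: str) -> bool:
    """Treat the payload as mixed when two or more indicator kinds are present."""
    return count_indicators(text) >= 2
```

A README with a sentence of prose and one fenced code block trips two indicators and counts as mixed, while a plain list of ripgrep hits trips only one and stays uniform.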
Content-Type Detection
For uniform payloads, _detect_content() (lines 110–123) uses a Rust-backed detector to classify the whole buffer into a ContentType enum. The router then maps that type to a compressor inside _strategy_from_detection() (lines 1034–1036). Supported classifications include source code, JSON arrays, search results, build output, git diffs, HTML, and plain text.
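The type-to-strategy mapping described above amounts to a lookup table. The sketch below mirrors the enum member names used in this article but defines its own local enums; the string values and the names STRATEGY_FOR_TYPE and strategy_from_type are assumptions for illustration, not headroom's code.

```python
from enum import Enum


class ContentType(Enum):
    # Local stand-in mirroring the ContentType names in this article (values assumed).
    SOURCE_CODE = "source_code"
    JSON_ARRAY = "json_array"
    SEARCH_RESULTS = "search_results"
    BUILD_OUTPUT = "build_output"
    GIT_DIFF = "git_diff"
    HTML = "html"
    PLAIN_TEXT = "plain_text"


class CompressionStrategy(Enum):
    # Only the strategies reachable from a single detected type.
    CODE_AWARE = "code_aware"
    SMART_CRUSHER = "smart_crusher"
    SEARCH = "search"
    LOG = "log"
    DIFF = "diff"
    HTML = "html"
    KOMPRESS = "kompress"


# One entry per detected type, following the pairings listed in this article.
STRATEGY_FOR_TYPE = {
    ContentType.SOURCE_CODE: CompressionStrategy.CODE_AWARE,
    ContentType.JSON_ARRAY: CompressionStrategy.SMART_CRUSHER,
    ContentType.SEARCH_RESULTS: CompressionStrategy.SEARCH,
    ContentType.BUILD_OUTPUT: CompressionStrategy.LOG,
    ContentType.GIT_DIFF: CompressionStrategy.DIFF,
    ContentType.HTML: CompressionStrategy.HTML,
    ContentType.PLAIN_TEXT: CompressionStrategy.KOMPRESS,
}


def strategy_from_type(content_type: ContentType) -> CompressionStrategy:
    """Look up the default strategy for a detected content type."""
    return STRATEGY_FOR_TYPE[content_type]
```

Keeping the pairing in a single dictionary makes it easy to audit that every detected type has exactly one default compressor.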
Configuration Flags
Individual strategies can be toggled through ContentRouterConfig (lines 380–410). For example, self.config.enable_code_aware at line 1004 inside _apply_strategy_to_content() determines whether source code receives CODE_AWARE processing or falls back to KOMPRESS. Similarly, flags like enable_smart_crusher, enable_search_compressor, enable_log_compressor, and enable_html_extractor gate their respective compressors, while prefer_code_aware_for_code can override the code-aware default.
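The gating behavior can be sketched as a small function over a flags object. RouterFlags is a hypothetical stand-in for ContentRouterConfig: its field names follow the flags named above, but the all-True defaults and the gate() helper are assumptions. The sketch encodes two rules from this article: a disabled strategy falls back to KOMPRESS, and DIFF has no flag.

```python
from dataclasses import dataclass


@dataclass
class RouterFlags:
    # Hypothetical stand-in for ContentRouterConfig; defaults are assumed.
    enable_code_aware: bool = True
    enable_smart_crusher: bool = True
    enable_search_compressor: bool = True
    enable_log_compressor: bool = True
    enable_html_extractor: bool = True
    prefer_code_aware_for_code: bool = True


# Flag that gates each optional strategy; DIFF and KOMPRESS are ungated here.
GATES = {
    "CODE_AWARE": "enable_code_aware",
    "SMART_CRUSHER": "enable_smart_crusher",
    "SEARCH": "enable_search_compressor",
    "LOG": "enable_log_compressor",
    "HTML": "enable_html_extractor",
}


def gate(strategy: str, flags: RouterFlags) -> str:
    """Return the strategy if its flag allows it, otherwise fall back to KOMPRESS."""
    if strategy == "CODE_AWARE" and not flags.prefer_code_aware_for_code:
        return "KOMPRESS"  # preference override for source code
    flag = GATES.get(strategy)
    if flag is not None and not getattr(flags, flag):
        return "KOMPRESS"  # strategy disabled: use the universal fallback
    return strategy
```

For example, gate("SEARCH", RouterFlags(enable_search_compressor=False)) yields "KOMPRESS", while "DIFF" passes through unchanged whatever the flags say.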
Complete List of ContentRouter Compression Strategies
Each enum value maps to a concrete compressor and specific selection criteria.
- CODE_AWARE: Routes to CodeAwareCompressor for AST-preserving source code compression. The router chooses this when ContentType.SOURCE_CODE is detected and config.enable_code_aware is True. If disabled, it falls back to KOMPRESS per the override clause in _strategy_from_detection().
- SMART_CRUSHER: Routes to SmartCrusher for high-throughput JSON-array compression. Selected when the detector returns ContentType.JSON_ARRAY and config.enable_smart_crusher is enabled. If the token count does not improve, the router attempts a fallback to KOMPRESS.
- SEARCH: Routes to SearchCompressor for grep or ripgrep result compression. Triggered by ContentType.SEARCH_RESULTS when config.enable_search_compressor is True.
- LOG: Routes to LogCompressor for build and test output. Applied to ContentType.BUILD_OUTPUT when config.enable_log_compressor is active.
- DIFF: Routes to DiffCompressor for git diff compression. Matched on ContentType.GIT_DIFF with no explicit flag; this strategy is always available.
- HTML: Routes to HtmlExtractor to extract readable text from HTML. Selected on ContentType.HTML when config.enable_html_extractor is True.
- KOMPRESS: Routes to KompressCompressor, the ML-based token compressor. This is the default for ContentType.PLAIN_TEXT and serves as the universal fallback when an explicit strategy is disabled or unavailable.
- TEXT: An alias that ultimately invokes the same KompressCompressor as KOMPRESS. Used when the router explicitly labels plain text as TEXT rather than the generic KOMPRESS fallback.
- MIXED: Executes internal split-route logic rather than a single compressor. Triggered when is_mixed_content() reports at least two content indicators, dispatching any of the above strategies on a per-section basis.
- PASSTHROUGH: Returns the input unchanged with no compression. Used when the buffer is empty or whitespace-only, or when a strategy is disabled and no fallback is applicable.
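The "fall back if the token count does not improve" rule from the SMART_CRUSHER entry, together with the whitespace-only PASSTHROUGH case, can be sketched as below. This is a hypothetical illustration: compressors are modeled as plain string functions, count_tokens is a crude whitespace split rather than a real tokenizer, and returning the original text when neither compressor helps is an assumption of this sketch.

```python
from typing import Callable, Tuple

Compressor = Callable[[str], str]


def count_tokens(text: str) -> int:
    # Crude whitespace tokenizer standing in for a real tokenizer.
    return len(text.split())


def compress_with_fallback(
    text: str, primary: Compressor, fallback: Compressor
) -> Tuple[str, str]:
    """Return (output, label), where label names the path that was taken."""
    if not text.strip():
        return text, "PASSTHROUGH"  # empty or whitespace-only input is returned unchanged
    original = count_tokens(text)
    out = primary(text)  # e.g. SmartCrusher for a JSON array
    if count_tokens(out) < original:
        return out, "PRIMARY"
    out = fallback(text)  # e.g. KompressCompressor
    if count_tokens(out) < original:
        return out, "FALLBACK"
    return text, "PASSTHROUGH"  # assumption: keep the original if nothing helped
```

Comparing against the original token count, not against the primary's output, means the fallback only wins if it actually shrinks the input.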
Simplified Routing Decision Flow
The core dispatch logic lives in _determine_strategy() and _strategy_from_detection(). The flow follows this pattern:
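```python
# Simplified view of _determine_strategy() and _strategy_from_detection()
if is_mixed_content(content):
    strategy = CompressionStrategy.MIXED
else:
    detection = _detect_content(content)
    strategy = mapping[detection.content_type]  # see enum → compressor map
    if strategy == CompressionStrategy.CODE_AWARE and not config.prefer_code_aware_for_code:
        strategy = CompressionStrategy.KOMPRESS  # override
```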
This logic is implemented in headroom/transforms/content_router.py at lines 1010–1014 and 1034–1036. The router also respects fallback_strategy (default KOMPRESS) when no explicit strategy matches.
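A runnable version of the simplified flow makes the ordering and the fallback_strategy default concrete. Everything here is a hypothetical stand-in: the string-keyed MAPPING covers only three types, the detection result and the mixed check are passed in as arguments instead of being computed, and checking for whitespace-only input before the mixed check is an assumed ordering.

```python
from typing import Optional

# Hypothetical, partial stand-in for the ContentType → strategy mapping.
MAPPING = {
    "SOURCE_CODE": "CODE_AWARE",
    "JSON_ARRAY": "SMART_CRUSHER",
    "GIT_DIFF": "DIFF",
}


def determine_strategy(
    content: str,
    content_type: Optional[str],
    mixed: bool,
    prefer_code_aware_for_code: bool = True,
    fallback_strategy: str = "KOMPRESS",
) -> str:
    """Pick a strategy name: passthrough, mixed, mapped type, then fallback."""
    if not content.strip():
        return "PASSTHROUGH"  # assumed to be checked first
    if mixed:
        return "MIXED"
    # Unmapped or undetected types fall back to fallback_strategy.
    strategy = MAPPING.get(content_type, fallback_strategy)
    if strategy == "CODE_AWARE" and not prefer_code_aware_for_code:
        strategy = "KOMPRESS"  # override clause
    return strategy
```

Because MAPPING.get() takes the fallback as its default, an unknown or missing content type never raises; it simply lands on KOMPRESS unless a different fallback_strategy is supplied.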
Code Examples for ContentRouter Compression Strategies
All examples below execute through ContentRouter.compress in headroom/transforms/content_router.py (lines 808–980).
Route Source Code to CODE_AWARE
```python
from headroom.transforms import ContentRouter, CompressionStrategy
from headroom.transforms.content_router import ContentRouterConfig

router = ContentRouter()

python_code = "def hello():\n    print('world')\n"
result = router.compress(python_code)
print(result.strategy_used)  # → CompressionStrategy.CODE_AWARE
print(result.compressed)     # AST-preserving compressed code
```
Route JSON Arrays to SMART_CRUSHER
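```python
import json

# A JSON array of objects, the typical shape of tool or API output
json_array = json.dumps([{"id": i, "status": "ok"} for i in range(1000)], indent=2)
result = router.compress(json_array)
print(result.strategy_used)  # → CompressionStrategy.SMART_CRUSHER
```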
Route Mixed Documents to MIXED
```python
mixed_doc = (
    "# Project README\n\n"
    "```python\ndef foo(): pass\n```\n\n"
    "Here is some description.\n\n"
    "```json\n[1,2,3]\n```"
)
result = router.compress(mixed_doc)
print(result.strategy_used)  # → CompressionStrategy.MIXED
print(result.routing_log)    # shows CODE_AWARE, TEXT, SMART_CRUSHER per section
```
Override Strategy via ContentRouterConfig
```python
# Disable the AST-preserving compressor; source code falls back to KOMPRESS
cfg = ContentRouterConfig(enable_code_aware=False)
router = ContentRouter(config=cfg)
result = router.compress(python_code)
print(result.strategy_used)  # → CompressionStrategy.KOMPRESS
```
Key Implementation Files
The ContentRouter compression strategies are defined across the following modules:
- headroom/transforms/content_router.py: Core router implementation, CompressionStrategy enum, mixed-content detection, and routing logic.
- headroom/transforms/content_detector.py: ContentType enum and the Rust-backed detect_content_type wrapper consumed by the router.
- headroom/config.py: Global configuration and defaults for enabling or disabling each compressor.
- headroom/transforms/base.py: Base Transform class that ContentRouter inherits from.
Individual compressors are referenced via lazy loaders inside ContentRouter._apply_strategy_to_content.
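Lazy loading means a compressor, which may carry an ML model, is built only the first time a payload actually needs it and is then reused. The sketch below shows the general pattern with a hypothetical LazyCompressors class and factory dictionary; it is not how _apply_strategy_to_content is written, only an illustration of the technique.

```python
from typing import Callable, Dict, List


class LazyCompressors:
    """Build each compressor on first use and cache it (hypothetical sketch)."""

    def __init__(self, factories: Dict[str, Callable[[], object]]) -> None:
        self._factories = factories  # strategy name -> zero-argument constructor
        self._cache: Dict[str, object] = {}
        self.built: List[str] = []  # records which compressors were constructed

    def get(self, name: str) -> object:
        if name not in self._cache:
            # Construction (e.g. loading a model) happens only on the first request.
            self._cache[name] = self._factories[name]()
            self.built.append(name)
        return self._cache[name]
```

A router that only ever sees plain text would then never pay the startup cost of the code-aware or HTML compressors.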
Summary
- The ContentRouter supports ten distinct CompressionStrategy values, each mapped to a specialized compressor.
- Routing begins with is_mixed_content() in headroom/transforms/content_router.py; if multiple indicators exist, the payload is split and processed under the MIXED strategy.
- Uniform payloads are classified by _detect_content() and mapped through _strategy_from_detection(), with ContentRouterConfig flags gate-keeping optional compressors.
- KOMPRESS serves as the default fallback for plain text and for any disabled strategy, while PASSTHROUGH handles empty or whitespace-only inputs and disabled strategies with no applicable fallback.
Frequently Asked Questions
What is the default ContentRouter compression strategy when no content type matches?
When the router cannot match a specific content type, it falls back to KOMPRESS via the fallback_strategy default. This routes the payload to the KompressCompressor, an ML-based token compressor that handles generic plain text.
How do I disable the AST-preserving code compressor and force ML-based compression?
Set enable_code_aware=False inside ContentRouterConfig when instantiating ContentRouter. According to the override clause in _strategy_from_detection() at lines 1034–1036, source code will then route to KOMPRESS instead of CODE_AWARE.
What happens when a document contains both code fences and JSON blocks?
If is_mixed_content() detects two or more distinct content indicators, the router selects the MIXED strategy. It calls split_into_sections() in headroom/transforms/content_router.py and compresses each section with its own optimal strategy before re-assembling the final output.
Is there a way to bypass all compression and return the original text?
Yes. The PASSTHROUGH strategy returns the input unchanged. The router automatically selects it for empty or whitespace-only buffers, or when a targeted strategy is disabled and no fallback is applicable.