SmartCrusher vs Kompress-Base: JSON and Text Compression in Headroom

SmartCrusher uses deterministic statistical reduction written in Rust to compress JSON arrays at sub-millisecond speed, while Kompress-Base employs a ModernBERT token-classification model via ONNX to shrink arbitrary plain text, trading inference latency for higher compression on unstructured content.

The chopratejas/headroom repository provides two distinct compression strategies for minimizing token volume before LLM inference. Both integrate with Headroom’s CCR (Compress-Cache-Retrieve) and TOIN (Telemetry-Optimized-Information-Network) systems, but they target different data structures and take opposite architectural approaches: one relies on native Rust statistics, the other on learned neural compression.

Core Architecture and Algorithm

SmartCrusher for JSON Arrays

In headroom/transforms/smart_crusher.py, SmartCrusher wraps a Rust-native implementation (headroom._core.SmartCrusher) that performs statistically driven reduction on JSON-style arrays returned by tool calls. The algorithm preserves the first and last items, errors, outliers, and relevance-scored entries, then samples the remainder so the result stays representative. This deterministic approach achieves 70%–95% compression on typical tool outputs such as search results or logs.
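To make the preservation rules above concrete, here is a minimal Python sketch of the same idea. It is not the Rust implementation: the function name crush_array, the "error" and "status" keys, the "score" field, and the two-standard-deviation outlier rule are all assumptions chosen for illustration, and relevance scoring is left out entirely.

```python
import statistics


def crush_array(items, max_items=15, score_key="score"):
    """Toy reduction: keep first/last, errors, numeric outliers, then sample evenly.

    Hypothetical helper for illustration only; not part of Headroom.
    """
    if len(items) <= max_items:
        return list(items)

    keep = {0, len(items) - 1}  # always keep the first and last items

    # Keep anything that looks like an error record (assumed key names).
    for i, item in enumerate(items):
        if isinstance(item, dict) and ("error" in item or item.get("status") == "error"):
            keep.add(i)

    # Keep numeric outliers: values more than 2 standard deviations from the mean.
    values = [
        (i, item[score_key])
        for i, item in enumerate(items)
        if isinstance(item, dict) and isinstance(item.get(score_key), (int, float))
    ]
    if len(values) >= 2:
        numbers = [v for _, v in values]
        mean = statistics.fmean(numbers)
        stdev = statistics.pstdev(numbers)
        if stdev > 0:
            keep.update(i for i, v in values if abs(v - mean) > 2 * stdev)

    # Spend whatever budget is left on evenly spaced samples of the rest.
    remaining = max_items - len(keep)
    if remaining > 0:
        candidates = [i for i in range(len(items)) if i not in keep]
        step = len(candidates) / remaining
        keep.update(candidates[int(k * step)] for k in range(remaining))

    # Preserve the original ordering of the surviving items.
    return [items[i] for i in sorted(keep)]
```

Note that the sketch lets must-keep items exceed max_items rather than dropping an error or outlier; whether the real core makes the same trade-off is not stated in the source.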

Kompress-Base for Plain Text

Located in headroom/transforms/kompress_compressor.py, Kompress-base utilizes a ModernBERT token-classification model trained on token-level compression tasks. It runs inference through ONNX (or optional PyTorch/CoreML) to classify and remove redundant tokens from arbitrary strings, paragraphs, or markdown. This ML-based approach delivers 80%–95% reduction on general text but requires the [ml] extra and approximately 50 MB–200 MB of model weights.
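The core idea of token-classification compression can be sketched without a model: each token gets a keep-probability, and tokens below a threshold are dropped. In the sketch below, compress_tokens, the threshold of 0.5, and the hand-written probabilities are hypothetical stand-ins for what a trained classifier would produce; the real compressor works on model tokens, not whitespace-split words.

```python
def compress_tokens(tokens, keep_probs, threshold=0.5):
    """Keep tokens whose predicted keep-probability meets the threshold.

    Hypothetical helper for illustration; not the Kompress implementation.
    """
    if len(tokens) != len(keep_probs):
        raise ValueError("tokens and keep_probs must have the same length")
    return [tok for tok, p in zip(tokens, keep_probs) if p >= threshold]


tokens = "the server returned an error because the disk was full".split()
# Made-up per-token scores standing in for classifier output.
keep_probs = [0.1, 0.9, 0.6, 0.1, 0.95, 0.4, 0.1, 0.9, 0.3, 0.9]

compressed = " ".join(compress_tokens(tokens, keep_probs))
print(compressed)  # server returned error disk full
```

Raising the threshold drops more tokens and compresses harder, at the risk of losing meaning.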

Performance and Resource Requirements

SmartCrusher executes in roughly 1 ms or less per array because processing occurs entirely in compiled Rust with no ML library overhead. Kompress-base requires 50 ms–200 ms per block due to neural inference and ONNX session initialization, which makes it noticeably slower but still suitable for final-stage compression of remaining plain-text payloads.
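Latency figures like these depend on hardware and payload size, so it is worth measuring them locally. The helper below is a generic timing sketch using only the standard library; median_latency_ms is a hypothetical name, and the callable you pass in (for example a bound crush or compress method) is up to you.

```python
import statistics
import time


def median_latency_ms(fn, payload, runs=20):
    """Call fn(payload) several times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Using the median rather than the mean keeps a slow first call, such as one that loads a model or initializes a session, from dominating the result.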

Installation footprints diverge significantly. SmartCrusher adds roughly 5 MB as a compiled Python extension. Kompress-base requires pip install "headroom-ai[ml]" to pull onnxruntime and download the chopratejas/kompress-base model (referenced at line 39 via HF_MODEL_ID), resulting in a larger dependency tree.

Configuration API and Fallback Behavior

SmartCrusher exposes explicit control through the SmartCrusherConfig dataclass, allowing fine-grained tuning of settings such as min_items_to_analyze, max_items_after_crush, and preserve_change_points. If the Rust extension fails to load, the import fails hard and the transform is unavailable.

Kompress-base relies on environment variables such as HEADROOM_KOMPRESS_BACKEND and optional constructor arguments like batch_size rather than a dedicated configuration class. It implements graceful degradation: if the model cannot be loaded or an error occurs during inference, the compressor returns the original text unchanged in passthrough mode.
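The passthrough behavior described above follows a common pattern: wrap the risky call, and on any failure hand back the input untouched. The sketch below illustrates that pattern in isolation; CompressionOutcome and compress_with_passthrough are hypothetical names, not Headroom classes, and the real compressor's result type may differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CompressionOutcome:
    original: str
    compressed: str
    passthrough: bool  # True when compression was skipped because of an error


def compress_with_passthrough(
    text: str, compress_fn: Callable[[str], str]
) -> CompressionOutcome:
    """Run compress_fn, falling back to the unchanged text if it raises."""
    try:
        return CompressionOutcome(text, compress_fn(text), passthrough=False)
    except Exception:
        # Any failure (missing model, inference error) degrades to a no-op.
        return CompressionOutcome(text, text, passthrough=True)
```

Carrying an explicit passthrough flag makes skipped compression visible to callers, which matters because a silent fallback otherwise looks identical to a payload that simply could not be shortened.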

Pipeline Integration and Telemetry

Both compressors integrate with Headroom’s caching and telemetry layers but handle them differently. SmartCrusher emits sentinel objects ({"_ccr_dropped": "..."}) directly within the Rust core, which are later stripped by strip_ccr_sentinels. It also calls toin.record_compression() after real compression events (see lines 23–26 of smart_crusher.py).
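As a rough illustration of what stripping sentinels involves, the sketch below walks a parsed JSON value and removes marker objects from arrays. It assumes a sentinel is a dict whose only key is "_ccr_dropped"; the function name drop_sentinels is hypothetical, and the real strip_ccr_sentinels may behave differently.

```python
def drop_sentinels(value, sentinel_key="_ccr_dropped"):
    """Recursively remove sentinel marker objects from lists in a JSON-like value.

    Assumption: a sentinel is a dict containing exactly one key, sentinel_key.
    """
    if isinstance(value, list):
        return [
            drop_sentinels(v, sentinel_key)
            for v in value
            if not (isinstance(v, dict) and set(v) == {sentinel_key})
        ]
    if isinstance(value, dict):
        return {k: drop_sentinels(v, sentinel_key) for k, v in value.items()}
    return value


crushed = {"results": [{"id": 1}, {"_ccr_dropped": "998 items"}, {"id": 1000}]}
print(drop_sentinels(crushed))  # {'results': [{'id': 1}, {'id': 1000}]}
```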

Kompress-base generates TOIN signatures via _kompress_content_signature (implemented in lines 50–84 of kompress_compressor.py) and handles CCR markers at the pipeline level rather than injecting sentinels into the compressed text itself.

In the default Headroom pipeline architecture, SmartCrusher operates at Stage 3 (immediately before the Context Manager) to shrink structured tool outputs, while Kompress-base runs optionally at Stage 4 as a final ML-driven layer for unstructured text reduction.
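The split between structured and unstructured stages implies a routing decision per payload. A minimal way to sketch that decision is to try parsing the payload as JSON; route_payload and its "structured"/"text" labels are hypothetical and do not reflect how Headroom's pipeline actually dispatches content.

```python
import json


def route_payload(payload):
    """Return 'structured' for JSON arrays/objects and 'text' for everything else."""
    try:
        parsed = json.loads(payload)
    except (json.JSONDecodeError, TypeError):
        return "text"
    # Bare JSON scalars (numbers, strings) carry no array structure to crush.
    return "structured" if isinstance(parsed, (list, dict)) else "text"


print(route_payload('[{"title": "Result 0"}]'))    # structured
print(route_payload("Build failed after 3 retries"))  # text
```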

Practical Usage Examples

Compressing JSON Arrays with SmartCrusher

from headroom import SmartCrusher, SmartCrusherConfig

cfg = SmartCrusherConfig(
    min_items_to_analyze=5,
    max_items_after_crush=15,
    preserve_change_points=True,
)

crusher = SmartCrusher(config=cfg)

# Example: Compress a large array of 1,000 search results
tool_output = {"results": [{"title": f"Result {i}", "score": i} for i in range(1_000)]}
compressed = crusher.crush(tool_output, query="best restaurants in NYC")

print(compressed.was_modified)  # True
# The returned JSON retains only ~15 representative items

Compressing Plain Text with Kompress-Base

from headroom.transforms.kompress_compressor import KompressCompressor

# Auto-downloads chopratejas/kompress-base on first use
compressor = KompressCompressor()

long_text = "Lorem ipsum..." * 1000  # Large text block or log output
result = compressor.compress(long_text)

print(result.compressed)  # Shortened version, typically 80-95% fewer tokens
print(result.compressed == long_text)  # False when compression was applied; True in passthrough mode

Summary

  • SmartCrusher targets JSON arrays using deterministic Rust-based statistical trimming, offering sub-millisecond latency and minimal dependencies.
  • Kompress-Base targets arbitrary plain text using ModernBERT token classification via ONNX, providing higher compression ratios at the cost of inference latency and a larger installation footprint.
  • SmartCrusher uses SmartCrusherConfig for explicit control and fails hard on import errors; Kompress-base uses environment variables and fails gracefully with passthrough.
  • Both support Headroom’s CCR caching and TOIN telemetry, but SmartCrusher injects sentinel objects while Kompress-base handles CCR at the pipeline level.
  • In the default pipeline, SmartCrusher runs at Stage 3 (structured data), while Kompress-base operates optionally at Stage 4 (final text compression).

Frequently Asked Questions

Can I use SmartCrusher on non-JSON text?

No. SmartCrusher is specifically designed for JSON-style arrays and objects returned by tool calls, as implemented in the Rust core. For arbitrary plain text, markdown, or log files, you should use Kompress-base or the standard Context Manager transforms.

Why is Kompress-base slower than SmartCrusher?

Kompress-base performs neural inference using a ModernBERT model through ONNX Runtime, which requires loading model weights and running token-level classification for each block. SmartCrusher executes purely in compiled Rust code using statistical heuristics and needs no model loading or GPU acceleration. The result is approximately 1 ms per operation for SmartCrusher versus 50–200 ms for Kompress-base.

What happens if the Kompress-base model fails to download?

The compressor falls back to passthrough mode, returning the original text unchanged without raising an exception. This ensures pipeline stability even when ML dependencies are unavailable, though you should monitor TOIN telemetry signatures to detect when compression is skipped.

Do I need both compressors in my Headroom pipeline?

No. SmartCrusher is included by default and handles structured JSON reduction at Stage 3. Kompress-base is optional and only necessary if you require additional compression on unstructured text content after the Context Manager stage. Most pipelines benefit from SmartCrusher alone; add Kompress-base only when processing long-form text outputs that exceed token limits.
