How to Configure SmartCrusher for JSON Array Compression vs Traditional Compression in Headroom

SmartCrusher preserves structurally critical items in JSON arrays through configurable change-point detection, field matching, and key-order guarantees, unlike traditional compression algorithms, which treat every element uniformly and act on statistical redundancy alone.

SmartCrusher is the default JSON-array compressor in the Headroom framework, designed specifically for LLM contexts where aggressive token reduction must not sacrifice semantically relevant data. While traditional compression treats all array elements uniformly, SmartCrusher implements semantic-aware preservation rules that protect headers, footers, and domain-specific fields. This guide explains how to configure these preservation mechanisms through the Rust core and available language SDKs.

Understanding SmartCrusher Preservation Mechanisms

SmartCrusher operates through three complementary preservation strategies defined in crates/headroom-core/src/transforms/smart_crusher/config.rs. These mechanisms function as additive filters—any item satisfying at least one condition is exempt from removal.

Change-Point Preservation

The preserve_change_points option guarantees that the first and last N items of an array remain intact, protecting critical header and footer information such as timestamps, request IDs, and error messages. When enabled (default: true), the planner in planning.rs automatically marks these boundary items as "anchors" that the crusher cannot remove regardless of compression aggressiveness.
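The boundary rule described above can be sketched in a few lines of Python. This is an illustration only, not the logic in planning.rs: the function name boundary_anchors and the parameter n are hypothetical, and the sketch assumes anchors are plain array indices.

```python
def boundary_anchors(length: int, n: int) -> set[int]:
    """Return the indices of the first and last n items of an array.

    Hypothetical helper for illustration; not part of Headroom.
    """
    if n <= 0 or length <= 0:
        return set()
    head = range(0, min(n, length))
    tail = range(max(length - n, 0), length)
    return set(head) | set(tail)


items = [f"event-{i}" for i in range(10)]
anchors = boundary_anchors(len(items), n=2)
print(sorted(anchors))  # [0, 1, 8, 9]
```

When the array is shorter than 2 × n, the head and tail ranges overlap and every index becomes an anchor, so short arrays pass through untouched in this sketch.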

Field-Based Preservation

The preserve_fields configuration accepts a list of field names (or their SHA-256-derived hashes, truncated to 8 bytes). An array item is retained when it contains one of these fields and a query token matches the field's value. In planning.rs, the function item_has_preserve_field_match checks each array element against the hashed field names and marks matches as preservation anchors. This ensures that objects containing specific user IDs, request IDs, or other domain keys survive the compression process.
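A rough Python sketch of this check follows. It assumes the hash is the first 8 bytes of the SHA-256 digest of the UTF-8 field name and that "matching" means exact equality between a field value and a query token; the Rust implementation may tokenize or compare differently. The names field_hash and has_preserve_match are hypothetical.

```python
import hashlib


def field_hash(name: str) -> bytes:
    """SHA-256 of the field name, truncated to the first 8 bytes (assumed scheme)."""
    return hashlib.sha256(name.encode("utf-8")).digest()[:8]


def has_preserve_match(item: dict, preserve_hashes: set[bytes], query_tokens: set[str]) -> bool:
    """True if a preserved field's value equals any query token (assumed matching rule)."""
    for key, value in item.items():
        if field_hash(key) in preserve_hashes and str(value) in query_tokens:
            return True
    return False


preserve_hashes = {field_hash(name) for name in ["user_id", "request_id"]}
query_tokens = {"u-42"}

rows = [
    {"user_id": "u-42", "status": "ok"},
    {"user_id": "u-7", "status": "ok"},
    {"note": "u-42"},  # value matches, but "note" is not a preserved field
]
print([has_preserve_match(r, preserve_hashes, query_tokens) for r in rows])
# [True, False, False]
```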

Key-Order Preservation

When processing dictionary objects, the preserve_keys option maintains the insertion order of specified keys and prevents the removal of entire key-value pairs. The underlying implementation relies on serde_json's preserve_order feature, as used in crusher.rs, ensuring deterministic round-trips for downstream tooling that expects stable key ordering.
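The effect can be illustrated in Python, where dictionaries keep insertion order and json.loads preserves the order of keys in the document. The helper keep_keys_in_order is hypothetical, and the sketch assumes a crusher that drops every key not listed; the real crusher may remove fewer keys than this.

```python
import json

raw = '{"timestamp": "2024-01-01T00:00:00Z", "session_id": "s-1", "debug": "verbose", "level": "info"}'


def keep_keys_in_order(obj: dict, preserve_keys: list[str]) -> dict:
    """Keep only the preserved keys, in the order they appear in the input object."""
    wanted = set(preserve_keys)
    return {k: v for k, v in obj.items() if k in wanted}


obj = json.loads(raw)  # key order follows the source document
slim = keep_keys_in_order(obj, ["session_id", "timestamp"])
print(json.dumps(slim))
# {"timestamp": "2024-01-01T00:00:00Z", "session_id": "s-1"}
```

Note that the output order follows the original object, not the order of the preserve_keys list, which is what keeps the round-trip stable.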

Configuration Architecture and Source Files

The configuration structure is defined in crates/headroom-core/src/transforms/smart_crusher/config.rs as SmartCrusherConfig, which is injected into the SmartCrusherPlanner constructor and subsequently passed to the crusher implementation.

Key implementation files include:

  • config.rs: Defines SmartCrusherConfig with boolean and vector fields for the three preservation mechanisms.
  • planning.rs: Implements the anchor detection logic around line 17, determining which items are eligible for removal based on the preservation rules.
  • crusher.rs: Executes the actual removal while respecting preservation flags, with the array order guarantee documented at line 109.
  • sdk/typescript/src/types/config.ts: Exposes TypeScript type definitions that map directly to the Rust configuration struct.
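To make the planner/crusher split concrete, here is a minimal Python sketch of the execution stage. It assumes the planner hands over a set of anchor indices and that the crusher keeps only those items; the function crush is a hypothetical stand-in, not the code in crusher.rs.

```python
def crush(items: list, anchors: set[int]) -> list:
    """Drop non-anchor items; survivors keep their original array order."""
    return [item for i, item in enumerate(items) if i in anchors]


items = ["header", "debug-1", "debug-2", "error", "debug-3", "footer"]
anchors = {5, 3, 0}  # as a planner might produce; set ordering is irrelevant
print(crush(items, anchors))  # ['header', 'error', 'footer']
```

Iterating over the original array rather than over the anchor set is what gives the order guarantee in this sketch.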

Practical Configuration Examples

Python SDK Configuration

When using the Python SDK, instantiate HeadroomClient with preservation parameters that forward to the Rust core:

from headroom import HeadroomClient

client = HeadroomClient(
    preserve_change_points=True,  # Keep first/last 10 items
    preserve_fields=["user_id", "request_id"],  # Domain-specific retention
)

messages = [
    {"role": "system", "content": "Header context"},
    # ... hundreds of tool outputs ...
    {"role": "system", "content": "Footer context"},
]

result = client.compress(messages)
print(f"Reduced tokens: {result.tokens_before} -> {result.tokens_after}")

TypeScript SDK Configuration

The TypeScript SDK exposes identical options through the HeadroomConfig interface:

import { HeadroomClient } from "headroom-ai";

const client = new HeadroomClient({
  preserve_change_points: true,
  preserve_keys: ["session_id"],  // Maintain key order
  preserve_fields: ["order_id", "invoice_id"],
});

const messages = [
  { role: "assistant", content: "..." },
  // ... large array ...
];

const result = await client.compress(messages);
console.log(`Compression ratio: ${result.tokensAfter / result.tokensBefore}`);

Low-Level Direct Configuration

For scenarios requiring direct control without the client abstraction, instantiate SmartCrusherConfig explicitly:

from headroom import compress, SmartCrusherConfig

config = SmartCrusherConfig(
    preserve_change_points=True,
    preserve_fields=["trace_id"],
    preserve_keys=["timestamp"],
)

payload = [...]  # Your JSON array

compressed = compress(payload, config=config)
print(f"Achieved ratio: {compressed.compression_ratio}")

SmartCrusher vs Traditional Compression

Traditional compression tools (such as GZIP or standard JSON minifiers) work at the byte or syntax level: GZIP encodes repeated byte patterns and minifiers strip whitespace, and neither understands what the data means. SmartCrusher differs fundamentally by operating at the semantic level:

  • Structural awareness: Traditional methods cannot distinguish between a critical log header and redundant debug output; SmartCrusher preserves change-points and specified fields regardless of their content frequency.
  • Query-aware retention: Unlike traditional compression, SmartCrusher uses preserve_fields to maintain items relevant to the current conversation context, matching field values against active query tokens.
  • Deterministic ordering: Traditional tooling makes no promise about key order after a round-trip; SmartCrusher's preserve_keys keeps the specified keys in their original positions, relying on serde_json's preserve_order feature.
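The contrast can be seen with Python's standard library alone. The sketch below compares a gzip round-trip with a simple item-level reduction; the slicing step is only a stand-in for a structure-aware crusher and does not call Headroom.

```python
import gzip
import json

records = [{"id": i, "level": "debug", "msg": "heartbeat"} for i in range(200)]
text = json.dumps(records)

# Byte-level compression: fewer bytes, but decompression restores every item,
# so the text a model eventually reads is unchanged.
packed = gzip.compress(text.encode("utf-8"))
restored = json.loads(gzip.decompress(packed).decode("utf-8"))
print(len(packed) < len(text), len(restored))  # True 200

# Item-level reduction: keep the first and last two records, drop the rest.
kept = records[:2] + records[-2:]
print(len(kept), len(json.dumps(kept)) < len(text))  # 4 True
```

Only the second approach shrinks the text itself, which is the quantity that matters when the payload is placed in an LLM context.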

Summary

  • SmartCrusher is Headroom's default JSON-array compressor that uses semantic preservation rules rather than statistical compression.
  • Configure preservation through three mechanisms: preserve_change_points (boolean), preserve_fields (string list), and preserve_keys (string list).
  • The configuration struct resides in config.rs and is processed by planning.rs (anchor detection) and crusher.rs (execution).
  • Both Python and TypeScript SDKs expose these options directly, mapping to the underlying Rust implementation.
  • Enable preserve_change_points to protect array boundaries, preserve_fields to retain domain-specific objects, and preserve_keys to maintain dictionary key ordering.

Frequently Asked Questions

What is the difference between SmartCrusher and traditional JSON compression?

Traditional compression algorithms like GZIP or standard minifiers remove whitespace and reduce redundancy based on statistical patterns, treating all data uniformly. SmartCrusher operates at the semantic level, using preserve_change_points to protect structural boundaries and preserve_fields to retain items containing specific domain keys, ensuring critical data survives aggressive token reduction.

How does the preserve_fields option work internally?

The preserve_fields option hashes each specified field name using SHA-256 (truncated to 8 bytes) and stores these hashes in preserve_field_hashes. During the planning stage in planning.rs, the function item_has_preserve_field_match checks whether an array element has a field whose hashed name appears in preserve_field_hashes and whose value matches a query token. Matching elements are marked as preservation anchors that crusher.rs will not remove.

Can I use SmartCrusher without the Headroom client?

Yes, you can instantiate SmartCrusherConfig directly and pass it to the compress function. This low-level approach bypasses the HeadroomClient abstraction and is useful when integrating the compression logic into existing data pipelines or when processing JSON payloads that are not part of a standard message flow.

What happens if multiple preservation rules conflict?

The preservation mechanisms are additive rather than exclusive. If an array item satisfies any single preservation condition—whether through change-point detection, field matching, or key-order requirements—it is automatically exempt from removal. The crusher in crusher.rs processes the union of all anchor sets generated by the planner.
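As a small illustration of this additive behavior, the sketch below merges three anchor sets with a plain set union. The index values and the helper union_anchors are made up for the example.

```python
def union_anchors(*anchor_sets: set[int]) -> set[int]:
    """Merge anchor sets; an index flagged by any rule is kept exactly once."""
    return set().union(*anchor_sets)


boundary = {0, 1, 8, 9}   # from the change-point rule
field_matches = {1, 4}    # from field matching
key_order = {4, 9}        # from key-order requirements

print(sorted(union_anchors(boundary, field_matches, key_order)))
# [0, 1, 4, 8, 9]
```

Items 1, 4, and 9 are each flagged by two rules, yet appear once in the result, so overlapping rules never conflict.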
