How TOIN Improves Compression Decisions in Headroom: A Technical Deep Dive

TOIN (Tool Output Intelligence Network) improves Headroom's compression by observing outcomes, learning cross-user patterns offline, and biasing importance scoring to preserve high-value tool outputs while maintaining deterministic request-time behavior.

Headroom is an intelligent context management system for tool outputs developed in the chopratejas/headroom repository. The Tool Output Intelligence Network (TOIN) improves compression decisions by transforming static heuristics into a data-driven feedback loop that continuously refines which messages to keep and which to drop. By aggregating compression outcomes across tenants, models, and tool types, TOIN enables the system to make smarter decisions without sacrificing request-time determinism.

The Three-Phase TOIN Pipeline

TOIN operates alongside Headroom's compression pipeline (SmartCrusher, Kompress, Code-Aware compressors, and the ContentRouter) through a strict observation-only contract (PR-B5): it records outcomes but never alters an in-flight request, which keeps request-time behavior deterministic.

Observation Phase

Every time a compressor finishes processing, it calls record_compression on the global TOIN singleton. This happens in headroom/transforms/smart_crusher.py::_record_to_toin and headroom/transforms/content_router.py::_record_to_toin. The recorded datum contains the tool signature hash, original versus compressed token counts, and the chosen compression strategy.
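To make the shape of a recorded datum concrete, here is a minimal sketch. The names CompressionRecord and structure_hash_of are hypothetical and are not taken from the Headroom source, and hashing key names plus value types is only one plausible way to build a signature. The sketch mirrors the three pieces of information listed above and nothing more.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressionRecord:
    """Hypothetical record shape; field names are illustrative only."""
    structure_hash: str      # signature of the tool output's shape
    original_tokens: int
    compressed_tokens: int
    strategy: str

    @property
    def keep_ratio(self) -> float:
        # Fraction of tokens that survived compression.
        if self.original_tokens == 0:
            return 1.0
        return self.compressed_tokens / self.original_tokens


def structure_hash_of(payload: dict) -> str:
    """Hash sorted key names and value types, so no raw values are hashed."""
    shape = sorted((key, type(value).__name__) for key, value in payload.items())
    digest = hashlib.sha256(json.dumps(shape).encode("utf-8")).hexdigest()
    return digest[:8]


record = CompressionRecord(
    structure_hash=structure_hash_of({"title": "a", "score": 0.9}),
    original_tokens=2000,
    compressed_tokens=150,
    strategy="smart_crusher",
)
print(record.structure_hash, record.keep_ratio)
```

Because only key names and value types feed the hash in this sketch, two outputs with the same shape but different values map to the same signature, which is what lets statistics accumulate per tool pattern.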

Learning Phase

TOIN aggregates these records per-tenant (auth_mode), per-model (model_family), and per-tool (structure_hash). The aggregation logic lives in headroom/telemetry/toin.py, specifically within the ToolPattern dataclass and the record_compression method. It builds field-level statistics including retrieval rate, error-indicator frequency, and semantic type inference to represent the importance of different output parts.
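The aggregation described above can be pictured as a table of running totals keyed by the (auth_mode, model_family, structure_hash) triple. The following is a simplified sketch under that assumption; PatternStats, observe_compression, and observe_retrieval are hypothetical names and do not reproduce the real ToolPattern dataclass or its fields.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PatternStats:
    """Hypothetical running totals for one aggregation key."""
    total_compressions: int = 0
    total_retrievals: int = 0
    keep_ratio_sum: float = 0.0

    @property
    def mean_keep_ratio(self) -> float:
        if self.total_compressions == 0:
            return 0.0
        return self.keep_ratio_sum / self.total_compressions

    @property
    def retrieval_rate(self) -> float:
        if self.total_compressions == 0:
            return 0.0
        return self.total_retrievals / self.total_compressions


# One PatternStats per (auth_mode, model_family, structure_hash) triple.
patterns = defaultdict(PatternStats)


def observe_compression(key, original_tokens, compressed_tokens):
    """Fold one compression outcome into the totals for its key."""
    entry = patterns[key]
    entry.total_compressions += 1
    # An empty original is treated as fully kept.
    ratio = compressed_tokens / original_tokens if original_tokens > 0 else 1.0
    entry.keep_ratio_sum += ratio


def observe_retrieval(key):
    """Count one later retrieval of content that had been compressed away."""
    patterns[key].total_retrievals += 1


key = ("payg", "gpt-4o", "a1b2c3d4")
for _ in range(4):
    observe_compression(key, original_tokens=2000, compressed_tokens=500)
observe_retrieval(key)
print(patterns[key].mean_keep_ratio, patterns[key].retrieval_rate)
```

Keeping only sums and counts means each observation is a constant-time update, and the derived rates can be computed whenever the offline step needs them.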

Recommendation Phase

An offline CLI (headroom.cli.toin_publish) reads the stored TOIN JSON, computes aggregated statistics, and writes a recommendations.toml file. The Rust proxy loads this configuration at startup, and IntelligentContextManager reads the toin_importance and error_indicator weights to bias the multi-factor scoring used by ContentRouter when deciding which messages to drop.
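As an illustration of how static weights can bias a multi-factor score, consider the sketch below. It is written in Python for readability, although the text says the weights are loaded by the Rust proxy. Only the weight names toin_importance and error_indicator come from the text; the recency factor, the default values, and the functions importance_score and drop_order are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringWeights:
    """Weights fixed at startup; values here are illustrative."""
    recency: float = 0.5           # hypothetical additional factor
    toin_importance: float = 0.3
    error_indicator: float = 0.2


def importance_score(message: dict, weights: ScoringWeights) -> float:
    """Combine per-message factors into a single importance score."""
    score = weights.recency * message["recency"]
    score += weights.toin_importance * message["learned_importance"]
    if message["has_error"]:
        score += weights.error_indicator
    return score


def drop_order(messages: list, weights: ScoringWeights) -> list:
    """Return message ids from most droppable to least droppable."""
    # Ties are broken by id so the ordering never depends on input order.
    ranked = sorted(messages, key=lambda m: (importance_score(m, weights), m["id"]))
    return [m["id"] for m in ranked]


WEIGHTS = ScoringWeights()
messages = [
    {"id": "a", "recency": 0.2, "learned_importance": 0.1, "has_error": True},
    {"id": "b", "recency": 0.4, "learned_importance": 0.1, "has_error": False},
]
print(drop_order(messages, WEIGHTS))  # ['b', 'a']
```

Message "a" is older but carries an error indicator, so it outranks "b" and is dropped last. Since the weights are immutable and the sort has an explicit tie-break, the same messages and weights always give the same order.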

Key Benefits of Intelligence-Driven Compression

The observation-only contract ensures TOIN never mutates a running request, keeping the pipeline deterministic while delivering three concrete advantages:

  • Cross-user pattern learning: When many users retain the same type of tool output, TOIN flags that pattern as high-importance. The optimal_keep_ratio in the recommendations file reflects this collective behavior, causing future compressions to preserve similar outputs.

  • Error-indicator awareness: TOIN learns which fields are consistently marked as errors via field_semantics.inferred_type == "error_indicator". These fields receive elevated preservation scores in the recommendations file, ensuring critical error information survives aggressive compression.

  • Retrieval-feedback loop: If a dropped message is later retrieved through CCR (Compress-Cache-Retrieve), TOIN records a retrieval event. This feedback raises the importance score for similar patterns in subsequent compressions, so the system corrects its own over-aggressive drops.
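One way to picture the retrieval-feedback benefit is as an adjustment that moves a pattern's keep ratio toward 1.0 in proportion to how often its dropped content was retrieved. The formula, the gain parameter, and the function name adjusted_keep_ratio below are assumptions for illustration, not the rule Headroom uses.

```python
def adjusted_keep_ratio(base_keep_ratio: float, retrieval_rate: float, gain: float = 0.5) -> float:
    """Raise the keep ratio as the retrieval rate of dropped content grows."""
    if not 0.0 <= base_keep_ratio <= 1.0:
        raise ValueError("base_keep_ratio must be within [0, 1]")
    if not 0.0 <= retrieval_rate <= 1.0:
        raise ValueError("retrieval_rate must be within [0, 1]")
    # Close a fraction of the gap between the current ratio and "keep everything".
    boosted = base_keep_ratio + gain * retrieval_rate * (1.0 - base_keep_ratio)
    return min(1.0, max(0.0, boosted))


for rate in (0.0, 0.2, 0.8):
    print(rate, adjusted_keep_ratio(0.42, rate))
```

With a retrieval rate of zero the ratio is unchanged, and higher retrieval rates push it upward without ever exceeding 1.0.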

Enabling and Configuring TOIN

To activate TOIN in your Headroom instance, instantiate it with TOINConfig:

from headroom import Headroom, SmartCrusherConfig
from headroom.telemetry.toin import TOINConfig

headroom = Headroom(
    smart_crusher_config=SmartCrusherConfig(),
    toin=TOINConfig(enabled=True),
)

TOIN records compression events automatically. At DEBUG log level, SmartCrusher and TOIN emit lines such as:

# INFO:headroom.transforms.smart_crusher:SmartCrusher: kept 15 of 100 items
# DEBUG:headroom.telemetry.toin: TOIN recording succeeded:
#   tool=search, original=2000, kept=150, strategy=smart_crusher

To generate recommendations based on accumulated telemetry, run the offline CLI:

python -m headroom.cli.toin_publish \
    --output recommendations.toml \
    --min-observations 50

The resulting TOML contains aggregated statistics that the Rust proxy consumes:

[tool_patterns."payg"."gpt-4o"."a1b2c3d4"]
total_compressions = 1245
optimal_keep_ratio = 0.42
error_indicator = 0.15

Accessing TOIN Data Programmatically

For debugging or custom analysis, access the TOIN singleton directly:

from headroom.telemetry.toin import get_toin

toin = get_toin()
stats = toin.stats()
print(f"Recorded {stats.total_compressions} compressions across {len(stats.patterns)} patterns")

Summary

TOIN transforms Headroom from a static compression system into a continuously improving intelligence network:

  • Observation hooks in headroom/transforms/smart_crusher.py and headroom/transforms/content_router.py capture every compression outcome without impacting request latency.
  • Offline learning aggregates patterns across tenants and models to calculate optimal keep ratios and error-field importance.
  • Recommendation files provide the Rust proxy with bias weights that improve drop decisions while maintaining deterministic behavior.
  • Retrieval feedback from CCR closes the learning loop, so data that is dropped but later retrieved becomes harder to drop over time.

Frequently Asked Questions

Does enabling TOIN affect the determinism of compression results?

No. TOIN follows an observation-only contract (PR-B5) that prevents it from mutating running requests. The compression pipeline remains deterministic: for a given recommendations file, the same input always produces the same output, because learning happens offline and recommendations are loaded as static configuration at startup.

How does TOIN know which fields contain errors?

TOIN tracks field_semantics.inferred_type statistics across compression events. When it observes fields consistently marked as error_indicator, it assigns higher importance scores to those fields in the recommendations.toml file, ensuring the Rust proxy preserves error information during aggressive compression.

Where is TOIN data stored and how is it secured?

By default, TOIN uses a filesystem backend defined in headroom/telemetry/backends/file.py that writes to toin.json. Records are keyed by auth_mode, so statistics for different authentication modes are aggregated separately. You can verify storage location and permissions through the TOINConfig backend settings.

Can I use TOIN recommendations without running the offline CLI continuously?

Yes. The headroom.cli.toin_publish CLI is designed to run periodically (e.g., daily) to generate updated recommendations.toml files. The Rust proxy loads these files at startup and does not require the CLI to be running during request processing, making TOIN suitable for air-gapped or batch-learned deployments.
