# How TOIN Improves Compression Decisions in Headroom: A Technical Deep Dive

> Discover how TOIN improves Headroom's compression decisions by learning cross-user patterns from compression outcomes and preserving high-value tool outputs.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: deep-dive
- Published: 2026-06-08

---

**TOIN (Tool Output Intelligence Network) improves Headroom's compression by observing outcomes, learning cross-user patterns offline, and biasing importance scoring to preserve high-value tool outputs while maintaining deterministic request-time behavior.**

Headroom is an intelligent context management system for tool outputs, developed in the `chopratejas/headroom` repository. The Tool Output Intelligence Network (TOIN) improves compression decisions by turning static heuristics into a data-driven feedback loop that, with each offline publishing cycle, refines which messages to keep and which to drop. By aggregating compression outcomes across tenants, models, and tool types, TOIN lets the system make smarter decisions without sacrificing request-time determinism.

## The Three-Phase TOIN Pipeline

TOIN runs alongside Headroom's compression pipeline (**SmartCrusher**, **Kompress**, the **Code-Aware** compressors, and the **ContentRouter**) under a strict observation-only contract (PR-B5): it records what the compressors did but never changes a running request, which keeps request-time behavior deterministic.

### Observation Phase

Every time a compressor finishes processing, it calls `record_compression` on the global TOIN singleton. This happens in `headroom/transforms/smart_crusher.py::_record_to_toin` and `headroom/transforms/content_router.py::_record_to_toin`. Each record contains the tool signature hash, the original and compressed token counts, and the compression strategy that was chosen.
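The observation step can be pictured as a fire-and-forget hook: the compressor builds a record, hands it to a process-wide singleton, and swallows any error so the request result is never affected. The sketch below is a minimal illustration only; `CompressionRecord`, `_TOIN`, `record_to_toin`, the toy `get_toin`, and the rule of hashing sorted field names are hypothetical and do not reproduce the code in `toin.py`.

```python
import hashlib
import json
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("toin_sketch")


@dataclass(frozen=True)
class CompressionRecord:
    # Hypothetical record shape mirroring the fields described above.
    tool_signature_hash: str
    original_tokens: int
    compressed_tokens: int
    strategy: str


class _TOIN:
    """Toy stand-in for the global TOIN singleton (hypothetical)."""

    def __init__(self) -> None:
        self.records: list[CompressionRecord] = []

    def record_compression(self, record: CompressionRecord) -> None:
        self.records.append(record)


_toin: Optional[_TOIN] = None


def get_toin() -> _TOIN:
    # Lazily create the process-wide singleton.
    global _toin
    if _toin is None:
        _toin = _TOIN()
    return _toin


def signature_hash(sample: dict) -> str:
    # Assumption: hash the sorted field names (not values) so outputs with
    # the same structure map to the same pattern.
    keys = json.dumps(sorted(sample.keys()))
    return hashlib.sha256(keys.encode()).hexdigest()[:8]


def record_to_toin(sample: dict, original: int, compressed: int, strategy: str) -> None:
    # Observation only: a recording failure must never reach the compression path.
    try:
        get_toin().record_compression(
            CompressionRecord(signature_hash(sample), original, compressed, strategy)
        )
    except Exception:
        logger.debug("TOIN recording failed", exc_info=True)
```

Wrapping the call in `try`/`except` is what makes the hook safe to leave on in production: a broken telemetry backend degrades to a debug log line instead of a failed request.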

### Learning Phase

TOIN aggregates these records per tenant (`auth_mode`), per model (`model_family`), and per tool (`structure_hash`). The aggregation logic lives in [`headroom/telemetry/toin.py`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py), specifically within the `ToolPattern` dataclass and the `record_compression` method. From these records it builds field-level statistics (retrieval rate, error-indicator frequency, and inferred semantic type) that estimate how important each part of a tool output is.
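A minimal sketch of this per-key aggregation, assuming each observation arrives with its `(auth_mode, model_family, structure_hash)` key and a per-field "looks like an error" flag. `FieldStats`, `PatternStats`, and `Aggregator` are hypothetical names; the real `ToolPattern` dataclass tracks more than this.

```python
from collections import defaultdict
from dataclasses import dataclass, field

Key = tuple[str, str, str]  # (auth_mode, model_family, structure_hash)


@dataclass
class FieldStats:
    seen: int = 0
    error_hits: int = 0  # times the field was flagged as an error indicator

    @property
    def error_frequency(self) -> float:
        return self.error_hits / self.seen if self.seen else 0.0


@dataclass
class PatternStats:
    compressions: int = 0
    retrievals: int = 0  # dropped content later fetched back
    fields: dict[str, FieldStats] = field(default_factory=lambda: defaultdict(FieldStats))

    @property
    def retrieval_rate(self) -> float:
        return self.retrievals / self.compressions if self.compressions else 0.0


class Aggregator:
    """Hypothetical aggregator keyed by tenant, model, and tool structure."""

    def __init__(self) -> None:
        self.patterns: dict[Key, PatternStats] = defaultdict(PatternStats)

    def record_compression(self, key: Key, error_flags: dict[str, bool]) -> None:
        stats = self.patterns[key]
        stats.compressions += 1
        for name, is_error in error_flags.items():
            fs = stats.fields[name]
            fs.seen += 1
            fs.error_hits += int(is_error)

    def record_retrieval(self, key: Key) -> None:
        self.patterns[key].retrievals += 1
```

Keying on the full tuple keeps statistics for, say, two models using the same tool separate, which matches the nested tenant/model/hash layout of the recommendations file.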

### Recommendation Phase

An offline CLI (`headroom.cli.toin_publish`) reads the stored TOIN JSON, computes aggregated statistics, and writes a [`recommendations.toml`](https://github.com/chopratejas/headroom/blob/main/recommendations.toml) file. The Rust proxy loads this configuration at startup, and `IntelligentContextManager` reads the `toin_importance` and `error_indicator` weights to bias the multi-factor scoring used by `ContentRouter` when deciding which messages to drop.
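The biasing step can be pictured as per-pattern weights added on top of a baseline score, with the lowest-scoring messages dropped first. This sketch assumes a simple additive combination and a single hypothetical `recency` factor; it is an illustration of the idea, not the multi-factor scoring actually used by `IntelligentContextManager`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    # Per-pattern weights as they might be read from recommendations.toml (assumed shape).
    toin_importance: float = 0.0
    error_indicator: float = 0.0


@dataclass(frozen=True)
class Message:
    pattern: str           # structure hash of the tool output (assumed key)
    recency: float         # hypothetical baseline factor in [0, 1]
    has_error_field: bool  # output contains a field inferred as an error indicator


def score(msg: Message, recs: dict[str, Recommendation]) -> float:
    rec = recs.get(msg.pattern, Recommendation())  # unknown pattern: no learned bias
    s = msg.recency + rec.toin_importance
    if msg.has_error_field:
        s += rec.error_indicator
    return s


def choose_drops(msgs: list[Message], recs: dict[str, Recommendation], n_drop: int) -> list[int]:
    # Lowest scores are dropped first; the index breaks ties so the result
    # is fully deterministic for a given input and config.
    ranked = sorted(range(len(msgs)), key=lambda i: (score(msgs[i], recs), i))
    return sorted(ranked[:n_drop])
```

For example, an older message from a pattern with `toin_importance = 0.3` can outrank a newer message from an unlearned pattern, so the newer one is dropped first.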

## Key Benefits of Intelligence-Driven Compression

The observation-only contract ensures TOIN never mutates a running request, keeping the pipeline deterministic while delivering three concrete advantages:

- **Cross-user pattern learning**: When many users retain the same type of tool output, TOIN flags that pattern as high-importance. The `optimal_keep_ratio` in the recommendations file reflects this collective behavior, causing future compressions to preserve similar outputs.

- **Error-indicator awareness**: TOIN learns which fields are consistently marked as errors via `field_semantics.inferred_type == "error_indicator"`. These fields receive elevated preservation scores in the recommendations file, ensuring critical error information survives aggressive compression.

- **Retrieval-feedback loop**: If a dropped message is later retrieved through CCR (Content Compression Recovery), TOIN records a retrieval event. This feedback boosts the importance score for similar patterns in subsequent compressions, creating a self-correcting system.
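The retrieval-feedback loop can be sketched as a keep ratio that rises with the observed retrieval rate. The smoothing prior, the floor `base`, and the linear mapping below are assumptions chosen for illustration; they are not the formula used to compute `optimal_keep_ratio`.

```python
def keep_ratio(compressions: int, retrievals: int, base: float = 0.2, prior: int = 10) -> float:
    """Map retrieval feedback to a keep ratio in [base, 1.0] (hypothetical formula).

    The retrieval rate is smoothed with `prior` pseudo-observations so a
    handful of early events cannot swing the ratio on its own.
    """
    if compressions < 0 or retrievals < 0:
        raise ValueError("counts must be non-negative")
    rate = retrievals / (compressions + prior)
    return min(1.0, base + (1.0 - base) * rate)
```

A pattern that is compressed often but never retrieved stays at the floor, while one whose dropped content keeps being fetched back drifts toward keeping everything: that drift is the self-correction described above.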

## Enabling and Configuring TOIN

To activate TOIN, pass a `TOINConfig` when constructing your Headroom instance:

```python
from headroom import Headroom, SmartCrusherConfig
from headroom.telemetry.toin import TOINConfig

headroom = Headroom(
    smart_crusher_config=SmartCrusherConfig(),
    toin=TOINConfig(enabled=True),
)
```

TOIN then records compression events automatically. SmartCrusher and TOIN each log the event, and with DEBUG logging enabled the entries appear as:

```text
INFO:headroom.transforms.smart_crusher:SmartCrusher: kept 15 of 100 items
DEBUG:headroom.telemetry.toin: TOIN recording succeeded:
  tool=search, original=2000, kept=150, strategy=smart_crusher
```
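To track the effect over time, you can pull the numbers out of the TOIN debug line. This sketch assumes the `key=value, ...` format shown in the log excerpt; `parse_toin_line` is a hypothetical helper, and the pattern should be adjusted if your log format differs.

```python
import re

# Matches key=value pairs such as "original=2000" or "strategy=smart_crusher".
_FIELDS = re.compile(r"(\w+)=([\w.]+)")


def parse_toin_line(line: str) -> dict:
    fields = dict(_FIELDS.findall(line))
    original = int(fields["original"])
    kept = int(fields["kept"])
    return {
        "tool": fields["tool"],
        "strategy": fields["strategy"],
        "original": original,
        "kept": kept,
        "reduction": 1 - kept / original,  # fraction removed by compression
    }
```

For the line in the excerpt (`original=2000, kept=150`), the reduction is 1 - 150/2000 = 0.925, i.e. 92.5%.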

To generate recommendations based on accumulated telemetry, run the offline CLI:

```bash
python -m headroom.cli.toin_publish \
    --output recommendations.toml \
    --min-observations 50
```
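The `--min-observations` threshold implies a filter applied before anything is published. A minimal sketch of that step, assuming aggregated statistics arrive as a dict keyed by `(auth_mode, model_family, structure_hash)` and emitting tables in the same `[tool_patterns."<auth_mode>"."<model>"."<hash>"]` layout as the TOML example in this section. `render_recommendations` and its input shape are hypothetical; the real CLI may compute and serialize its statistics differently.

```python
def render_recommendations(stats: dict, min_observations: int = 50) -> str:
    """Render aggregated stats as TOML, skipping under-observed patterns (hypothetical)."""
    lines = []
    # Sorting by key makes the output stable across runs.
    for (auth_mode, model, shash), s in sorted(stats.items()):
        if s["total_compressions"] < min_observations:
            continue  # too few samples to trust
        # Assumes key parts contain no quote characters.
        lines.append(f'[tool_patterns."{auth_mode}"."{model}"."{shash}"]')
        lines.append(f'total_compressions = {s["total_compressions"]}')
        lines.append(f'optimal_keep_ratio = {s["optimal_keep_ratio"]:.2f}')
        lines.append(f'error_indicator = {s["error_indicator"]:.2f}')
        lines.append("")
    return "\n".join(lines)
```

Filtering before publishing keeps a pattern seen only a few times from pushing the proxy toward an extreme keep ratio.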

The resulting TOML contains aggregated statistics that the Rust proxy consumes:

```toml
[tool_patterns."payg"."gpt-4o"."a1b2c3d4"]
total_compressions = 1245
optimal_keep_ratio = 0.42
error_indicator = 0.15
```

## Accessing TOIN Data Programmatically

For debugging or custom analysis, access the TOIN singleton directly:

```python
from headroom.telemetry.toin import get_toin

toin = get_toin()
stats = toin.stats()
print(f"Recorded {stats.total_compressions} compressions across {len(stats.patterns)} patterns")
```

## Summary

TOIN transforms Headroom from a static compression system into a continuously improving intelligence network:

- **Observation hooks** in [`headroom/transforms/smart_crusher.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/smart_crusher.py) and [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py) capture every compression outcome without altering the requests they observe.
- **Offline learning** aggregates patterns across tenants and models to calculate optimal keep ratios and error-field importance.
- **Recommendation files** provide the Rust proxy with bias weights that improve drop decisions while maintaining deterministic behavior.
- **Retrieval feedback** from CCR closes the learning loop: data that users retrieve after it was dropped becomes harder to drop over time.

## Frequently Asked Questions

### Does enabling TOIN affect the determinism of compression results?

No. TOIN follows an observation-only contract (PR-B5) that prevents it from mutating running requests. The compression pipeline remains deterministic: for a given recommendations file, the same input always produces the same output, because learning happens offline and recommendations are loaded as static configuration at startup.

### How does TOIN know which fields contain errors?

TOIN tracks `field_semantics.inferred_type` statistics across compression events. When it observes fields consistently marked as `error_indicator`, it assigns higher importance scores to those fields in the [`recommendations.toml`](https://github.com/chopratejas/headroom/blob/main/recommendations.toml) file, ensuring the Rust proxy preserves error information during aggressive compression.
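One way to picture "consistently marked" is a per-field frequency compared against a threshold. The keyword heuristic (`ERROR_TOKENS`), the 0.8 threshold, and the minimum sample count are assumptions for this sketch; the actual inference in `toin.py` may use different signals.

```python
from collections import Counter

# Hypothetical keywords that suggest an error, checked in field names and values.
ERROR_TOKENS = {"error", "failed", "failure", "exception", "traceback"}


def looks_like_error(name: str, value: object) -> bool:
    text = f"{name} {value}".lower()
    return any(tok in text for tok in ERROR_TOKENS)


def infer_error_fields(samples: list[dict], threshold: float = 0.8, min_seen: int = 5) -> set[str]:
    """Return fields flagged as errors in at least `threshold` of their occurrences."""
    seen: Counter = Counter()
    hits: Counter = Counter()
    for sample in samples:
        for name, value in sample.items():
            seen[name] += 1
            if looks_like_error(name, value):
                hits[name] += 1
    return {
        name
        for name, n in seen.items()
        if n >= min_seen and hits[name] / n >= threshold
    }
```

The `min_seen` guard plays the same role as a minimum-observation threshold: a field that happened to contain "failed" once is not promoted to an error indicator.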

### Where is TOIN data stored and how is it secured?

By default, TOIN uses a filesystem backend defined in [`headroom/telemetry/backends/file.py`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/backends/file.py) that writes to [`toin.json`](https://github.com/chopratejas/headroom/blob/main/toin.json). Statistics are aggregated separately per tenant, keyed by `auth_mode`, so one tenant's patterns are not mixed into another's. To confirm where the file is written and with what permissions, check the backend settings in `TOINConfig`.
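A file-backed store mainly needs to avoid leaving a half-written JSON file if the process dies mid-write. A minimal sketch, assuming the whole state fits in one JSON document; `save_state` and `load_state` are hypothetical helpers, not the API of `headroom/telemetry/backends/file.py`. It writes to a temporary file in the same directory (`tempfile.mkstemp` creates it readable and writable only by the creating user) and then swaps it in with `os.replace`, which is atomic on POSIX when both paths are on the same filesystem.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".toin-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())  # make sure bytes hit disk before the swap
        os.replace(tmp, path)  # readers see either the old or the new file
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_state(path: Path) -> dict:
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```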

### Can I use TOIN recommendations without running the offline CLI continuously?

Yes. The `headroom.cli.toin_publish` CLI is designed to run periodically (e.g., daily) to generate updated [`recommendations.toml`](https://github.com/chopratejas/headroom/blob/main/recommendations.toml) files. The Rust proxy loads these files at startup and does not require the CLI to be running during request processing, making TOIN suitable for air-gapped or batch-learned deployments.