# TOIN (Tool Output Intelligent Network) in Headroom: Learning Compression Patterns Explained

> Discover TOIN (Tool Output Intelligent Network) in Headroom. Learn how this privacy-preserving system identifies compression patterns to optimize future strategies. Understand its role in telemetry and recommendation generation.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: deep-dive
- Published: 2026-06-07

---

**TOIN (Tool Output Intelligent Network) is a privacy-preserving, observation-only telemetry subsystem that records compression tool behavior across user requests to learn statistical patterns and generate offline recommendations for optimizing future compression strategies.**

Headroom is an open-source proxy that compresses LLM tool outputs. Its learning layer, **TOIN (Tool Output Intelligent Network)**, analyzes real-world usage without changing request-time behavior or storing user data values. It observes how compression tools behave across different contexts so that compression decisions can be refined over time.

## What is TOIN?

TOIN stands for **Tool Output Intelligent Network**, though the codebase occasionally refers to it as **Tool Output Intelligence Network**. It functions as the learning layer of the Headroom architecture, sitting alongside compression tools like SmartCrusher to build a feedback loop between compression actions and retrieval outcomes.

Unlike systems that modify behavior in real-time, TOIN operates under a strict **observation-only contract**. During request processing, calls such as `record_compression` only log metadata about what happened; they never alter the compression path or inject latency. The actual learning happens offline through the `headroom.cli.toin_publish` command, which emits a [`recommendations.toml`](https://github.com/chopratejas/headroom/blob/main/recommendations.toml) file that the Rust proxy reads at startup.

## How TOIN Learns Compression Patterns

The learning mechanism relies on aggregating events by a specific tuple key defined in the `_make_pattern_key` helper within [`headroom/telemetry/toin.py`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py). Patterns are keyed by **`(auth_mode, model_family, tool_signature_hash)`**, enabling each tenant-slice and model family to learn independently.
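To make the keying concrete, here is a minimal sketch of per-key aggregation. `PatternKey` and `make_pattern_key` are hypothetical stand-ins written for illustration, not the actual `_make_pattern_key` helper, and the `"oauth"` auth mode and hash values are made-up sample data.

```python
from collections import defaultdict
from typing import NamedTuple


class PatternKey(NamedTuple):
    """Hypothetical stand-in for the tuple the article describes."""
    auth_mode: str
    model_family: str
    tool_signature_hash: str


def make_pattern_key(auth_mode: str, model_family: str, tool_signature_hash: str) -> PatternKey:
    # Illustrative only: the real helper may normalize or validate its inputs.
    return PatternKey(auth_mode, model_family, tool_signature_hash)


# One counter per key, so each (auth_mode, model_family) slice learns independently.
compression_counts = defaultdict(int)

events = [
    ("payg", "claude-3-5", "abc123"),
    ("payg", "claude-3-5", "abc123"),
    ("oauth", "claude-3-5", "abc123"),  # same tool, different auth mode
]
for event in events:
    compression_counts[make_pattern_key(*event)] += 1
```

The same tool signature ends up in two separate buckets here because the auth mode differs, which is the isolation property the key is meant to provide.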

The learning loop consists of three primary operations:

- **`record_compression`** – Updates per-tool statistics including compression counts, token ratios, and strategy success rates
- **`record_retrieval`** – Signals that compression was too aggressive (requiring re-fetch), updates retrieval counters, and enriches field-level semantics
- **`_update_recommendations`** – Periodically derives optimal max-items, skip-compression flags, preserve-fields lists, and the best compression strategy based on accumulated statistics

Confidence in recommendations grows with sample size and the number of distinct users tracked, calculated via the `_calculate_confidence` method.
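The article does not give the formula behind `_calculate_confidence`, so the following is only a sketch of one way a score could grow with both sample size and user diversity. `PatternStats`, `sketch_confidence`, and the two saturation targets are assumptions made for illustration, not Headroom's implementation.

```python
from dataclasses import dataclass


@dataclass
class PatternStats:
    """Hypothetical per-pattern counters."""
    total_compressions: int = 0
    total_retrievals: int = 0
    distinct_users: int = 0

    @property
    def retrieval_rate(self) -> float:
        # Share of compressions that later needed a re-fetch.
        if self.total_compressions == 0:
            return 0.0
        return self.total_retrievals / self.total_compressions


def sketch_confidence(stats: PatternStats, sample_target: int = 100, user_target: int = 10) -> float:
    # Each component saturates at 1.0 once its (assumed) target is reached.
    sample_score = min(stats.total_compressions / sample_target, 1.0)
    user_score = min(stats.distinct_users / user_target, 1.0)
    return (sample_score + user_score) / 2


few = PatternStats(total_compressions=5, total_retrievals=1, distinct_users=1)
many = PatternStats(total_compressions=200, total_retrievals=20, distinct_users=15)
```

Under these assumptions a pattern seen a handful of times by one user scores well below a pattern seen hundreds of times by many users, which matches the qualitative behavior described above.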

## Core Architecture and Data Flow

The TOIN implementation spans multiple files in the `headroom/telemetry/` module:

- **[`headroom/telemetry/toin.py`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py)** – Core implementation containing recording methods, aggregation logic, and serialization
- **[`headroom/telemetry/models.py`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/models.py)** – Data structures including `ToolSignature` and `FieldSemantics`
- **[`headroom/telemetry/backends.py`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/backends.py)** – Storage backends; the default `FileSystemTOINBackend` writes to [`toin.json`](https://github.com/chopratejas/headroom/blob/main/toin.json)
- **[`headroom/cli/toin_publish.py`](https://github.com/chopratejas/headroom/blob/main/headroom/cli/toin_publish.py)** – Offline CLI that loads persisted JSON, aggregates patterns, and writes [`recommendations.toml`](https://github.com/chopratejas/headroom/blob/main/recommendations.toml)

The data flow follows an offline-first architecture. Compression events and retrieval feedback accumulate in the JSON store through the request-time API. Operators periodically run the publish command to generate TOML recommendations. The proxy loads these recommendations once at startup, so recommendation generation itself adds no per-request overhead.
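The offline step can be pictured as a pure function from persisted JSON to TOML text. The sketch below assumes a made-up store schema (`patterns`, `total_compressions`, `total_retrievals`), made-up output keys, and an arbitrary 0.5 threshold; the real `toin.json` and `recommendations.toml` layouts are not shown in this article and may differ.

```python
import json


def render_recommendations(store_json: str, max_retrieval_rate: float = 0.5) -> str:
    """Render TOML text from a persisted pattern store (hypothetical schema)."""
    patterns = json.loads(store_json)["patterns"]
    lines = []
    for key in sorted(patterns):
        stats = patterns[key]
        compressions = stats["total_compressions"]
        rate = stats["total_retrievals"] / compressions if compressions else 0.0
        # A high retrieval rate suggests compression was too aggressive.
        skip = "true" if rate > max_retrieval_rate else "false"
        lines.append(f'[patterns."{key}"]')
        lines.append(f"skip_compression = {skip}")
        lines.append(f"retrieval_rate = {rate:.2f}")
        lines.append("")
    return "\n".join(lines)


store = json.dumps({
    "patterns": {
        "payg:claude-3-5:abc123": {"total_compressions": 10, "total_retrievals": 8},
        "payg:claude-3-5:def456": {"total_compressions": 20, "total_retrievals": 1},
    }
})
toml_text = render_recommendations(store)
print(toml_text)
```

Because the function only reads a snapshot and returns text, it can run on any schedule without touching the request path.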

## Privacy and Scalability Safeguards

TOIN implements strict privacy controls by design. The system **never stores actual data values**; it keeps only structure hashes, field-name hashes, and anonymized query patterns. Sensitive user information therefore stays within the operator's infrastructure while pattern recognition remains possible.
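As an illustration of hash-only storage, the sketch below fingerprints a tool output item by its field names and discards the values. The choice of SHA-256, the 16-character truncation, and the function names are assumptions for this example; the article does not state which hash Headroom uses.

```python
import hashlib


def hash_name(name: str, length: int = 16) -> str:
    # Truncated SHA-256 hex digest; only hex characters are ever stored.
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:length]


def structure_fingerprint(item: dict) -> dict:
    """Describe an item by hashed field names only; values are never read."""
    field_hashes = sorted(hash_name(key) for key in item)
    structure_hash = hash_name("|".join(field_hashes))
    return {"structure_hash": structure_hash, "field_hashes": field_hashes}


a = structure_fingerprint({"user": "alice", "status": "error"})
b = structure_fingerprint({"status": "ok", "user": "bob"})
```

Two items with the same fields but different values produce identical fingerprints, so patterns can be matched without retaining the underlying data.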

Scalability safeguards prevent unbounded growth:

- **Instance hash caps** – Limited to 100 stored instance hashes per pattern
- **Duplicate detection bounds** – `MAX_SEEN_INSTANCES` (10,000) provides a hard limit for the deduplication set
- **Frequency-based pruning** – Query-pattern and field-retrieval dictionaries are pruned based on usage frequency
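The caps above can be sketched with a bounded set and a keep-the-most-frequent prune. The helper names and the "ignore new entries once the cap is reached" policy are assumptions for illustration; only the 10,000 limit comes from the article.

```python
from collections import Counter

MAX_SEEN_INSTANCES = 10_000  # limit quoted in the article


def remember_instance(seen: set, instance_hash: str, cap: int = MAX_SEEN_INSTANCES) -> bool:
    """Return True if the hash was newly recorded, False if duplicate or at cap."""
    if instance_hash in seen or len(seen) >= cap:
        return False
    seen.add(instance_hash)
    return True


def prune_by_frequency(counts: dict, keep: int) -> dict:
    # Keep only the `keep` most frequently seen entries.
    return dict(Counter(counts).most_common(keep))


seen = set()
results = [remember_instance(seen, h, cap=2) for h in ["h1", "h1", "h2", "h3"]]
pruned = prune_by_frequency({"status": 5, "debug": 1, "user": 3}, keep=2)
```

With a cap of 2, the duplicate `h1` and the over-cap `h3` are both rejected, and pruning keeps only the two most frequent fields.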

Operators can integrate external monitoring through a configurable `metrics_callback` that emits events like `toin.compression` and `toin.retrieval` to Prometheus or similar systems.
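The callback's exact signature is not given here, so the sketch below assumes it receives an event name and a payload dictionary, and it simulates the emissions by hand instead of wiring into Headroom. A real integration would forward the counts to a metrics client.

```python
from collections import Counter

event_counts = Counter()


def metrics_callback(event_name: str, payload: dict) -> None:
    # Assumed signature: (event name, payload dict). Count events by name;
    # a production callback might increment a Prometheus counter instead.
    event_counts[event_name] += 1


# Simulated emissions standing in for what TOIN would send.
metrics_callback("toin.compression", {"strategy": "smart_crusher"})
metrics_callback("toin.compression", {"strategy": "smart_crusher"})
metrics_callback("toin.retrieval", {"retrieval_type": "full"})
```

Keeping the callback this small matters because it would run alongside request-time recording.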

## Implementation Examples

### Recording Compression Events

The `record_compression` method in [`headroom/telemetry/toin.py`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py) (lines 521-602) captures request-time metadata without blocking the compression pipeline:

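```python
# ToolSignature lives in headroom/telemetry/models.py per the module listing above.
from headroom.telemetry.models import ToolSignature
from headroom.telemetry.toin import get_toin

# Placeholders supplied by the caller:
#   signature      - an existing ToolSignature instance for this tool output
#   items          - the original list of tool output items
#   kept           - number of items remaining after compression
#   before, after  - token counts before and after compression
get_toin().record_compression(
    tool_signature=signature,
    original_count=len(items),
    compressed_count=kept,
    original_tokens=before,
    compressed_tokens=after,
    strategy="smart_crusher",
    query_context="status:error AND user:alice",
    items=items,
    auth_mode="payg",
    model_family="claude-3-5",
)
```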

### Recording Retrieval Feedback

When compression proves too aggressive and requires re-fetching, `record_retrieval` (lines 788-877 in [`headroom/telemetry/toin.py`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py)) updates the statistical model:

```python
from headroom.telemetry.toin import get_toin

# `signature` is the ToolSignature used when the compression was recorded;
# `retrieved` is the list of items returned by the re-fetch.
get_toin().record_retrieval(
    tool_signature_hash=signature.structure_hash,
    retrieval_type="full",
    query="status:error",
    query_fields=["status"],
    strategy="smart_crusher",
    retrieved_items=retrieved,
    auth_mode="payg",
    model_family="claude-3-5",
)
```

### Accessing Aggregated Statistics

Monitor system-wide performance through the `get_stats` method (around line 1220 in [`headroom/telemetry/toin.py`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py)):

```python
from headroom.telemetry.toin import get_toin

stats = get_toin().get_stats()
print(stats["global_retrieval_rate"])
```

### Exporting Patterns for Offline Processing

Export accumulated patterns for the CLI publisher:

```python
from headroom.telemetry.toin import get_toin

patterns = get_toin().iter_patterns()  # Returns list of (key, ToolPattern) tuples
```

## Summary

- **TOIN (Tool Output Intelligent Network)** is Headroom's privacy-preserving telemetry layer that learns optimal compression strategies through observation rather than real-time mutation
- The system aggregates patterns by **`(auth_mode, model_family, tool_signature_hash)`**, enabling isolated learning across tenant and model boundaries
- **Observation-only contract** ensures zero latency impact on request-time compression paths; recommendations are generated offline via `headroom.cli.toin_publish`
- **Privacy safeguards** store only hashes and anonymized patterns, never actual data values
- **Scalability limits**, including the `MAX_SEEN_INSTANCES` cap of 10,000 and frequency-based pruning, prevent resource exhaustion

## Frequently Asked Questions

### How does TOIN protect user privacy?

TOIN never persists actual data values, field contents, or raw queries. Instead, it stores structure hashes, field-name hashes, and anonymized query patterns. This design ensures that sensitive information remains within your infrastructure while still allowing the system to learn which fields are frequently retrieved and which compression strategies work best for specific tool signatures.

### What is the observation-only contract?

The observation-only contract means that TOIN methods like `record_compression` and `record_retrieval` only record telemetry: they do not modify the compression decision path or add latency to the request. The actual learning and recommendation generation happens offline through the `toin_publish` CLI, which writes a [`recommendations.toml`](https://github.com/chopratejas/headroom/blob/main/recommendations.toml) file consumed by the proxy at startup.

### How are compression recommendations generated?

Recommendations are derived through the `_update_recommendations` method, which analyzes aggregated statistics from [`toin.json`](https://github.com/chopratejas/headroom/blob/main/toin.json) to calculate confidence scores based on sample size and user diversity. The system determines optimal max-items thresholds, identifies fields that should be preserved, flags situations where compression should be skipped entirely, and selects the best compression strategy for each tool signature pattern.

### What are the scalability limits of TOIN?

TOIN enforces hard caps to prevent unbounded memory growth: it stores a maximum of 100 instance hashes per pattern and limits its deduplication set to `MAX_SEEN_INSTANCES` (10,000) entries. Additionally, frequency-based pruning removes rarely used query patterns and field-retrieval entries, so the telemetry store remains bounded even under high-volume production loads.