# What is TOIN and How Does It Learn Compression Patterns?

> Discover TOIN, an intelligence network that learns LLM context compression patterns through observation and closed-loop feedback.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: deep-dive
- Published: 2026-06-06

---

**TOIN (Tool Output Intelligence Network) is an observation‑only learning layer that records every compression event and subsequent retrieval to automatically optimize LLM context compression through closed‑loop feedback, without storing any raw user values.**

TOIN powers the intelligent context compression system in the `chopratejas/headroom` open‑source repository. This Python‑based telemetry engine runs inside the application process to observe how tool outputs are compressed and whether the LLM later requests the dropped content, continuously refining which data patterns are safe to truncate and which must be preserved.

## Core Architecture Components

### The ToolIntelligenceNetwork Singleton

At the heart of TOIN lies the **`ToolIntelligenceNetwork`** class, implemented in [`headroom/telemetry/toin.py`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py#L414-L428). This thread‑safe singleton stores all learned data and is accessed via the `get_toin()` factory function. It maintains a registry of **`ToolPattern`** objects, each representing a unique compression signature for a specific tool and tenant slice.
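
To illustrate the thread‑safe singleton access pattern described above, here is a minimal sketch using double‑checked locking. The names `NetworkSketch` and `get_network_sketch` are hypothetical stand‑ins, not the repository's actual `ToolIntelligenceNetwork` or `get_toin()` implementation, which may be structured differently.

```python
import threading
from typing import Optional


class NetworkSketch:
    """Hypothetical stand-in for ToolIntelligenceNetwork (illustration only)."""

    def __init__(self) -> None:
        # Registry of learned patterns, keyed by an aggregation key.
        self.patterns: dict = {}


_instance: Optional[NetworkSketch] = None
_instance_lock = threading.Lock()


def get_network_sketch() -> NetworkSketch:
    """Return the process-wide instance, creating it on first use."""
    global _instance
    if _instance is None:  # fast path: no lock once initialized
        with _instance_lock:
            if _instance is None:  # re-check while holding the lock
                _instance = NetworkSketch()
    return _instance
```

Every caller in the process receives the same object, so counters updated from different threads accumulate in one place.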

### Aggregation Keys and Tenant Isolation

TOIN isolates learning per tenant and model family using a composite aggregation key generated by [`_make_pattern_key`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py#L13-L28). The key structure is `(auth_mode, model_family, tool_signature_hash)`, ensuring that compression strategies learned for one model family (e.g., GPT‑4o) do not leak into another (e.g., Claude), and that pay‑as‑you‑go tenants remain isolated from enterprise accounts.
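
The isolation property can be sketched with a tuple key and a dictionary registry. This is an illustrative assumption about how such a key might be used; `PatternKey` and `make_pattern_key` are hypothetical names, and the real `_make_pattern_key` may normalize or hash its inputs differently.

```python
from collections import defaultdict
from typing import NamedTuple


class PatternKey(NamedTuple):
    """Hypothetical composite key: (auth_mode, model_family, tool_signature_hash)."""
    auth_mode: str
    model_family: str
    tool_signature_hash: str


def make_pattern_key(auth_mode: str, model_family: str, tool_signature_hash: str) -> PatternKey:
    # Lowercasing is an assumption made here so "GPT-4o" and "gpt-4o" share a slice.
    return PatternKey(auth_mode.lower(), model_family.lower(), tool_signature_hash)


# Each distinct key gets its own counters, so slices never share state.
registry = defaultdict(lambda: {"compressions": 0})

registry[make_pattern_key("payg", "gpt-4o", "abc123")]["compressions"] += 1
registry[make_pattern_key("payg", "claude", "abc123")]["compressions"] += 1
registry[make_pattern_key("payg", "GPT-4o", "abc123")]["compressions"] += 1
```

The same tool signature observed under two model families produces two independent entries, which is the behavior the composite key is meant to guarantee.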

### Privacy‑First Design

The system protects privacy by storing **only cryptographic hashes** of field names, values, and query patterns. No raw user content, identifiers, or plaintext values are retained, as documented in the module docstring (lines 30‑35 of [`toin.py`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py)). This design allows TOIN to learn statistical patterns without exposing sensitive data.
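
A hash‑only store can still answer statistical questions such as "how many distinct values did this field take?". The sketch below assumes SHA‑256 truncated to 16 hex characters; the hash function, truncation length, and the helper name `hash_value` are illustrative choices, not confirmed details of TOIN.

```python
import hashlib
from collections import Counter


def hash_value(value: object, length: int = 16) -> str:
    """Return a truncated SHA-256 hex digest of the value's repr (assumed scheme)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).hexdigest()
    return digest[:length]


# Count occurrences per hashed value; the raw strings are never stored.
value_counts: Counter = Counter()
for item in [{"user": "john"}, {"user": "john"}, {"user": "mary"}]:
    value_counts[hash_value(item["user"])] += 1

unique_values = len(value_counts)           # distinct values observed
top_hash, top_count = value_counts.most_common(1)[0]
```

Only digests appear as keys, yet cardinality and most‑common‑value statistics remain available.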

## How TOIN Learns Compression Patterns

TOIN implements a six‑stage feedback loop that replaces hard‑coded heuristics with empirical observation.

### Stage 1: Recording Compression Events

When the **`SmartCrusher`** (located in [`headroom/transforms/smart_crusher.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/smart_crusher.py)) drops items from a tool response, it calls [`record_compression`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py#L521-L582). This method:

- Stores the tool’s `structure_hash`, original versus compressed item counts, and the **compression ratio**.
- Updates strategy success counters for the specific compression algorithm used.
- Invokes [`_update_field_statistics`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py#L704-L748) to hash field values and track unique value counts and most‑common values.
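
The bookkeeping in the list above can be sketched as follows. `PatternStats` and `record_compression_sketch` are hypothetical names, and the ratio is assumed here to be compressed items divided by original items; the repository may define it differently (for example, over tokens).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PatternStats:
    """Hypothetical per-pattern counters (illustration only)."""
    total_compressions: int = 0
    strategy_uses: Counter = field(default_factory=Counter)
    last_ratio: float = 1.0


def record_compression_sketch(stats: PatternStats, original_count: int,
                              compressed_count: int, strategy: str) -> float:
    """Update counters and return the item-level compression ratio."""
    stats.total_compressions += 1
    stats.strategy_uses[strategy] += 1
    # Guard against empty inputs; a ratio of 1.0 means nothing was dropped.
    stats.last_ratio = compressed_count / original_count if original_count else 1.0
    return stats.last_ratio


stats = PatternStats()
ratio = record_compression_sketch(stats, original_count=150,
                                  compressed_count=30, strategy="smart_crusher")
```

With 150 original items reduced to 30, the sketch records that one fifth of the items were kept.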

### Stage 2: Detecting Retrievals

If the LLM later asks for the content that was dropped, the content router calls [`record_retrieval`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py#L778-L842). This decreases the success score of the compression strategy that caused the drop and records which specific fields were retrieved, providing negative feedback for the learning algorithm.
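
One simple way to model this negative feedback is to define a strategy's success rate as the share of compressions that were not followed by a retrieval. That formula is an assumption made for illustration; the article does not state how TOIN actually computes the score, and `StrategyStats` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class StrategyStats:
    """Hypothetical counters for one compression strategy."""
    compressions: int = 0
    retrievals: int = 0

    @property
    def success_rate(self) -> float:
        # Assumed definition: fraction of compressions with no later retrieval.
        if self.compressions == 0:
            return 1.0
        return max(0.0, 1.0 - self.retrievals / self.compressions)


crusher = StrategyStats(compressions=10)
before = crusher.success_rate

# Two retrievals arrive for content this strategy dropped.
crusher.retrievals += 2
after = crusher.success_rate
```

Each retrieval lowers the score, so a strategy that repeatedly drops content the LLM needs gradually loses out to alternatives.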

### Stage 3: Field‑Level Semantic Inference

The **`FieldSemantics`** class (defined in [`headroom/models.py`](https://github.com/chopratejas/headroom/blob/main/headroom/models.py)) aggregates statistics about each field. After sufficient observations, [`infer_type()`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py#L904-L936) is triggered to classify fields into semantic categories such as `error_indicator`, `identifier`, or `timestamp`. These inferred types replace hard‑coded heuristics and determine which fields receive **preserve‑field** hints in future compressions.
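
As a rough sketch of how a semantic type could be inferred from hashed values alone, the example below uses cardinality heuristics. The thresholds, the minimum observation count, and the rules themselves are assumptions for illustration; the article does not specify the criteria the real `infer_type()` applies.

```python
from collections import Counter
from typing import Sequence


def infer_type_sketch(value_hashes: Sequence[str], min_observations: int = 10) -> str:
    """Classify a field from its hashed values (hypothetical heuristics)."""
    n = len(value_hashes)
    if n < min_observations:
        return "unknown"  # not enough evidence yet

    counts = Counter(value_hashes)
    unique_ratio = len(counts) / n
    if unique_ratio >= 0.95:
        # Nearly every value is distinct: behaves like an ID.
        return "identifier"

    dominant_share = counts.most_common(1)[0][1] / n
    if 2 <= len(counts) <= 5 and dominant_share >= 0.8:
        # A few categories with one dominant value and rare exceptions,
        # e.g. mostly "ok" with occasional "error".
        return "error_indicator"

    return "unknown"


id_like = [f"hash{i}" for i in range(20)]
status_like = ["h_ok"] * 18 + ["h_err"] * 2
```

Because the function sees only hashes, it relies on the distribution of values rather than their content.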

### Stage 4: Query Pattern Anonymization

TOIN learns which query patterns correlate with retrievals by stripping concrete values via [`_anonymize_query_pattern`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py#L444-L462). For example, `status:error AND user:john` is normalized to `status:* AND user:*`, allowing the system to track that queries matching this abstract pattern frequently require full context, regardless of the specific user.
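
The normalization in that example can be approximated with a regular expression that replaces the value in each `field:value` pair. This is a sketch under the assumption that queries use a simple `field:value` syntax; the actual `_anonymize_query_pattern` may handle more cases.

```python
import re

# Matches field:value where the value is either a quoted string or a bare token.
_FIELD_VALUE = re.compile(r'(\b[\w.]+):("[^"]*"|[^\s()]+)')


def anonymize_query_sketch(query: str) -> str:
    """Replace every concrete value with '*', keeping field names and operators."""
    return _FIELD_VALUE.sub(r"\1:*", query)


pattern = anonymize_query_sketch("status:error AND user:john")
quoted = anonymize_query_sketch('msg:"disk full" OR (status:error)')
```

Two queries that differ only in their values collapse to the same pattern, which can then be hashed and counted.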

### Stage 5: Aggregating Signals

Each **`ToolPattern`** maintains cumulative counters inside the singleton:
- Total compressions and retrievals per strategy.
- Field‑retrieval frequencies (which fields are most often recalled).
- Query‑pattern frequencies.

These aggregates form the empirical basis for confidence scoring.
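
A minimal sketch of such a counter container follows. `ToolPatternSketch` and its attribute names are hypothetical, and the keys stand in for hashed field names and anonymized query patterns; the real `ToolPattern` class likely holds more state.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToolPatternSketch:
    """Hypothetical cumulative counters for one aggregation key."""
    compressions: Counter = field(default_factory=Counter)      # per strategy
    retrievals: Counter = field(default_factory=Counter)        # per strategy
    field_retrievals: Counter = field(default_factory=Counter)  # per hashed field
    query_patterns: Counter = field(default_factory=Counter)    # per anonymized pattern

    def retrieval_rate(self, strategy: str) -> float:
        total = self.compressions[strategy]
        return self.retrievals[strategy] / total if total else 0.0


tp = ToolPatternSketch()
tp.compressions["smart_crusher"] += 4
tp.retrievals["smart_crusher"] += 1
tp.field_retrievals.update(["h_status", "h_user", "h_status"])
tp.query_patterns["status:* AND user:*"] += 1

most_recalled_field = tp.field_retrievals.most_common(1)[0][0]
```

Keeping only counts makes the aggregates cheap to update on every event and easy to scan later in a batch job.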

### Stage 6: Producing Offline Recommendations

A periodic offline job runs via the CLI command `python -m headroom.cli.toin_publish`. This invokes [`_update_recommendations`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py#L888-L926) for every pattern accessible via [`iter_patterns`](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py#L666-L677). The logic computes:
- **Retrieval‑rate thresholds** that determine when compression is too aggressive.
- **`optimal_max_items`** based on the point of diminishing returns (where dropping more items triggers excessive retrievals).
- **`preserve_fields`** lists derived from high‑retrieval frequency and semantic type inference.
- The **best compression strategy** (e.g., `smart_crusher` vs `truncate`) with the highest success rate.

The results are written to [`recommendations.toml`](https://github.com/chopratejas/headroom/blob/main/recommendations.toml), which the Rust proxy consumes at startup.
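
To make the offline computation concrete, here is a sketch that derives a best strategy and a `preserve_fields` list from aggregated counters. The function name, input shapes, and the 30% field threshold are assumptions chosen for illustration; they are not taken from `_update_recommendations`.

```python
from typing import Dict, List, Tuple


def build_recommendation_sketch(
    strategy_stats: Dict[str, Tuple[int, int]],  # strategy -> (compressions, retrievals)
    field_retrievals: Dict[str, int],            # hashed field -> times retrieved
    field_threshold: float = 0.3,                # assumed cutoff, not from the source
) -> Dict[str, object]:
    """Pick the strategy with the fewest retrievals per compression and
    list the fields that appear in a large share of retrievals."""

    def success_rate(counts: Tuple[int, int]) -> float:
        compressions, retrievals = counts
        return 1.0 - retrievals / compressions if compressions else 0.0

    best_strategy = max(strategy_stats, key=lambda s: success_rate(strategy_stats[s]))

    total = sum(field_retrievals.values())
    preserve_fields: List[str] = sorted(
        name for name, count in field_retrievals.items()
        if total and count / total >= field_threshold
    )
    return {"best_strategy": best_strategy, "preserve_fields": preserve_fields}


rec = build_recommendation_sketch(
    strategy_stats={"smart_crusher": (100, 10), "truncate": (100, 40)},
    field_retrievals={"h_status": 8, "h_user": 1, "h_msg": 1},
)
```

In this example `smart_crusher` wins because it triggers fewer retrievals per compression, and only the field involved in most retrievals is marked for preservation.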

## Code Examples

### Recording a Compression Event

```python
from headroom.telemetry.toin import get_toin
from headroom.tools import ToolSignature

toin = get_toin()
toin.record_compression(
    tool_signature=ToolSignature(structure_hash="abc123…"),
    original_count=150,
    compressed_count=30,
    original_tokens=1200,
    compressed_tokens=200,
    strategy="smart_crusher",
    query_context="status:error AND user:john",
    items=[{"status": "error", "user": "john", "msg": "…"}],
    auth_mode="payg",
    model_family="gpt-4o",
)
```

This call updates the internal `ToolPattern` and emits a metric event (`toin.compression`).

### Recording a Retrieval Event

```python
toin.record_retrieval(
    tool_signature_hash="abc123…",
    retrieval_type="full",  # or "search"
    query="status:error AND user:john",
    query_fields=["status", "user"],
    strategy="smart_crusher",
    retrieved_items=[{"status": "error", "user": "john", "msg": "…"}],
    auth_mode="payg",
    model_family="gpt-4o",
)
```

This decreases the success rate for the `smart_crusher` strategy when applied to this pattern and enriches the field‑level semantics.

### Generating Recommendations Offline

```bash
python -m headroom.cli.toin_publish --output recommendations.toml
```

The CLI iterates over every stored pattern and writes the TOML file that the Rust proxy reads on its next restart.

## Integration with the Rust Proxy

The Rust proxy does not query TOIN at request time. Instead, it loads **[`recommendations.toml`](https://github.com/chopratejas/headroom/blob/main/recommendations.toml)** at startup and applies the per‑tool hints (e.g., `optimal_max_items`, `preserve_fields`) when deciding which messages to drop. The legacy `get_recommendation` API is deprecated and now only emits a warning, ensuring that learning happens offline and inference happens via static configuration.
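
A deprecated entry point that only warns can be sketched with Python's standard `warnings` module. The function name `get_recommendation_sketch`, the message text, and the `None` return value are assumptions; the article states only that the legacy API emits a warning.

```python
import warnings


def get_recommendation_sketch(*args, **kwargs) -> None:
    """Hypothetical shim: warn callers and return no runtime hint."""
    warnings.warn(
        "Runtime recommendations are deprecated; hints are published offline "
        "to recommendations.toml instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return None


# Capture the warning to show what a legacy caller would observe.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    hint = get_recommendation_sketch("abc123")
```

Keeping the old name as a warning‑only shim lets existing callers keep running while making it clear that hints now come from static configuration.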

## Summary

- **TOIN** is an observation‑only learning layer that optimizes LLM context compression through empirical feedback rather than static rules.
- It learns by recording compressions via `record_compression` and penalizing strategies when `record_retrieval` is called for dropped content.
- Learning is isolated per tenant and model family using composite aggregation keys of the form `(auth_mode, model_family, tool_signature_hash)`.
- Field‑level semantics and anonymized query patterns allow TOIN to infer which data is critical (e.g., error indicators) without storing raw values.
- Recommendations are generated offline via `toin_publish` and consumed by the Rust proxy via [`recommendations.toml`](https://github.com/chopratejas/headroom/blob/main/recommendations.toml), ensuring zero per‑request latency overhead.

## Frequently Asked Questions

### What does TOIN stand for?

TOIN stands for **Tool Output Intelligence Network**. It is the specific subsystem within Headroom responsible for observing tool output compressions and retrievals to build optimized compression strategies.

### How does TOIN protect user privacy while learning patterns?

TOIN implements a hash‑only storage policy. It stores cryptographic hashes of field names, values, and query patterns rather than the raw data, and it never retains user identifiers. This allows the system to count unique values and detect frequent patterns statistically without exposing sensitive content.

### What triggers an update to a compression pattern?

A pattern updates when the feedback loop closes: first, when `record_compression` saves the initial drop statistics, and second, when `record_retrieval` indicates the LLM needed the dropped data. This retrieval event reduces the strategy’s success score and increments field‑retrieval counters, directly influencing the next generation of offline recommendations.

### How do TOIN recommendations reach the Rust proxy?

Recommendations flow through an offline batch process. The `headroom.cli.toin_publish` command iterates over all stored patterns using `iter_patterns`, runs `_update_recommendations` to compute optimal settings, and writes them to [`recommendations.toml`](https://github.com/chopratejas/headroom/blob/main/recommendations.toml). The Rust proxy reads this file at startup; there is no runtime API for fetching hints during request processing.