What is TOIN and How Does It Learn Compression Patterns?

TOIN (Tool Output Intelligence Network) is an observation-only learning layer that records every compression event and any later retrieval of the dropped content. This closed-loop feedback tunes LLM context compression automatically, and no raw user values are stored along the way.

TOIN powers the intelligent context compression system in the chopratejas/headroom open‑source repository. This Python‑based telemetry engine runs inside the application process to observe how tool outputs are compressed and whether the LLM later requests the dropped content, continuously refining which data patterns are safe to truncate and which must be preserved.

Core Architecture Components

The ToolIntelligenceNetwork Singleton

At the heart of TOIN lies the ToolIntelligenceNetwork class, implemented in [headroom/telemetry/toin.py](https://github.com/chopratejas/headroom/blob/main/headroom/telemetry/toin.py#L414-L428). This thread‑safe singleton stores all learned data and is accessed via the get_toin() factory function. It maintains a registry of ToolPattern objects, each representing a unique compression signature for a specific tool and tenant slice.
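The singleton-plus-factory arrangement can be illustrated with a minimal sketch. The names ToolIntelligenceNetworkSketch and get_toin_sketch are hypothetical stand-ins, and the double-checked locking shown here is one common way to build a thread-safe singleton; it is not taken from toin.py.

```python
import threading
from typing import Optional


class ToolIntelligenceNetworkSketch:
    """Hypothetical stand-in for the real class: holds a pattern registry."""

    def __init__(self) -> None:
        self.lock = threading.Lock()   # guards updates to the registry
        self.patterns: dict = {}       # aggregation key -> learned pattern


_instance: Optional[ToolIntelligenceNetworkSketch] = None
_instance_lock = threading.Lock()


def get_toin_sketch() -> ToolIntelligenceNetworkSketch:
    """Return the process-wide instance, creating it on first use."""
    global _instance
    if _instance is None:                  # fast path without locking
        with _instance_lock:
            if _instance is None:          # re-check once the lock is held
                _instance = ToolIntelligenceNetworkSketch()
    return _instance
```

Every caller in the process receives the same object, so counters recorded by one component are visible to all others.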

Aggregation Keys and Tenant Isolation

TOIN isolates learning per tenant slice and model family using a composite aggregation key generated by _make_pattern_key. The key structure is (auth_mode, model_family, tool_signature_hash). Because auth_mode and model_family are part of the key, compression strategies learned for one model family (e.g., GPT-4o) do not leak into another (e.g., Claude), and pay-as-you-go tenants remain isolated from enterprise accounts.
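To show why a composite key gives isolation, here is a small sketch. The function make_pattern_key_sketch and the registry layout are assumptions made for illustration; the real _make_pattern_key may build its key differently (for example as a string).

```python
def make_pattern_key_sketch(auth_mode: str, model_family: str,
                            tool_signature_hash: str) -> tuple:
    """Hypothetical composite key: all three parts must match to share a slot."""
    return (auth_mode, model_family, tool_signature_hash)


registry: dict = {}

# The same tool signature observed under two model families.
for family in ("gpt-4o", "claude"):
    key = make_pattern_key_sketch("payg", family, "abc123")
    slot = registry.setdefault(key, {"compressions": 0})
    slot["compressions"] += 1
```

The two observations land in separate registry entries even though the tool signature is identical, so statistics for one model family never mix with those of another.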

Privacy‑First Design

The system guarantees privacy by storing only cryptographic hashes of field names, values, and query patterns. No raw user content, identifiers, or plaintext values are retained, as documented in the module docstring (lines 30‑35 of toin.py). This design allows TOIN to learn statistical patterns without exposing sensitive data.
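A hash-only policy of this kind can be sketched as follows. The helper names hash_token and hash_item, the choice of SHA-256, and the 16-character truncation are all assumptions for illustration; the source does not state which hash function or digest length TOIN uses.

```python
import hashlib
from collections import Counter


def hash_token(text: str, length: int = 16) -> str:
    """Return a truncated SHA-256 hex digest; the raw text is not kept."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:length]


def hash_item(item: dict) -> dict:
    """Replace every field name and value with its digest."""
    return {hash_token(k): hash_token(str(v)) for k, v in item.items()}


items = [
    {"status": "error", "user": "john"},
    {"status": "error", "user": "mary"},
]
hashed = [hash_item(item) for item in items]

# Statistics still work on digests: count distinct values per hashed field.
values_per_field: dict = {}
for row in hashed:
    for name_hash, value_hash in row.items():
        values_per_field.setdefault(name_hash, Counter())[value_hash] += 1
```

Equal inputs produce equal digests, so unique-value counts and most-common values can be computed without the plaintext ever being stored.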

How TOIN Learns Compression Patterns

TOIN implements a six‑stage feedback loop that replaces hard‑coded heuristics with empirical observation.

Stage 1: Recording Compression Events

When the SmartCrusher (located in headroom/transforms/smart_crusher.py) drops items from a tool response, it calls record_compression. This method:

  • Stores the tool’s structure_hash, original versus compressed item counts, and the compression ratio.
  • Updates strategy success counters for the specific compression algorithm used.
  • Invokes _update_field_statistics to hash field values and track unique value counts and most‑common values.
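The bookkeeping in the first two bullets can be sketched roughly as below. CompressionStatsSketch is a hypothetical class, and the ratio is defined here as compressed_count / original_count, which is an assumption; the real record_compression may define the ratio differently.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CompressionStatsSketch:
    """Hypothetical running totals for one tool pattern."""
    total_compressions: int = 0
    total_original_items: int = 0
    total_compressed_items: int = 0
    strategy_uses: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, original_count: int, compressed_count: int,
               strategy: str) -> float:
        """Fold one event into the totals and return its ratio."""
        if original_count <= 0:
            raise ValueError("original_count must be positive")
        self.total_compressions += 1
        self.total_original_items += original_count
        self.total_compressed_items += compressed_count
        self.strategy_uses[strategy] += 1
        # Assumed definition: fraction of items kept after compression.
        return compressed_count / original_count


stats = CompressionStatsSketch()
ratio = stats.record(original_count=150, compressed_count=30,
                     strategy="smart_crusher")
```

With 150 items reduced to 30, the assumed ratio is 0.2, meaning one in five items survived.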

Stage 2: Detecting Retrievals

If the LLM later asks for the content that was dropped, the content router calls record_retrieval. This decreases the success score of the compression strategy that caused the drop and records which specific fields were retrieved, providing negative feedback for the learning algorithm.
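One simple way to model this negative feedback is a success rate derived from two counters. The class StrategyOutcomeSketch and the formula 1 - retrievals / compressions are assumptions for illustration; the source says only that a retrieval lowers the strategy's success score, not how the score is computed.

```python
from dataclasses import dataclass


@dataclass
class StrategyOutcomeSketch:
    """Hypothetical per-strategy counters for one tool pattern."""
    compressions: int = 0
    retrievals: int = 0

    @property
    def success_rate(self) -> float:
        """Share of compressions that were never followed by a retrieval."""
        if self.compressions == 0:
            return 1.0
        return max(0.0, 1.0 - self.retrievals / self.compressions)


outcome = StrategyOutcomeSketch(compressions=10)
before = outcome.success_rate      # no retrievals recorded yet
outcome.retrievals += 2            # the LLM asked for dropped content twice
after = outcome.success_rate
```

Each retrieval pushes the rate down, which is the signal the offline recommendation step can later compare across strategies.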

Stage 3: Field‑Level Semantic Inference

The FieldSemantics class (defined in headroom/models.py) aggregates statistics about each field. After sufficient observations, infer_type() is triggered to classify fields into semantic categories such as error_indicator, identifier, or timestamp. These inferred types replace hard‑coded heuristics and determine which fields receive preserve‑field hints in future compressions.
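As a rough illustration of inference from hashed statistics, the sketch below classifies a field from the distribution of its value digests. FieldSemanticsSketch, its thresholds, and both rules are hypothetical; the source does not describe the actual logic of infer_type(), and timestamp detection is left out because nothing in the source indicates how it is done.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FieldSemanticsSketch:
    """Hypothetical per-field statistics built from value digests only."""
    value_hash_counts: Counter = field(default_factory=Counter)
    min_observations: int = 20     # assumed threshold before inferring

    def observe(self, value_hash: str) -> None:
        self.value_hash_counts[value_hash] += 1

    def infer_type(self) -> str:
        total = sum(self.value_hash_counts.values())
        if total < self.min_observations:
            return "unknown"       # not enough evidence yet
        distinct = len(self.value_hash_counts)
        if distinct / total >= 0.95:
            return "identifier"    # nearly every value is unique
        top_share = self.value_hash_counts.most_common(1)[0][1] / total
        if distinct <= 5 and top_share >= 0.9:
            return "error_indicator"   # one dominant value, rare exceptions
        return "unknown"


ids = FieldSemanticsSketch()
for i in range(30):
    ids.observe(f"h{i}")           # 30 distinct digests

status = FieldSemanticsSketch()
for _ in range(28):
    status.observe("h_ok")
for _ in range(2):
    status.observe("h_err")
```

Under these assumed rules, the first field is classified as an identifier and the second as an error indicator, and a field with fewer than 20 observations stays "unknown".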

Stage 4: Query Pattern Anonymization

TOIN learns which query patterns correlate with retrievals by stripping concrete values via _anonymize_query_pattern. For example, status:error AND user:john is normalized to status:* AND user:*, allowing the system to track that queries matching this abstract pattern frequently require full context, regardless of the specific user.
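The normalization in that example can be reproduced with a single regular expression. The function anonymize_query_pattern_sketch is a hypothetical illustration that handles only simple field:value terms; the real _anonymize_query_pattern may cover more query syntax.

```python
import re

# A field name, a colon, then either a quoted string or a run of non-spaces.
_TERM = re.compile(r'(\w+):("[^"]*"|\S+)')


def anonymize_query_pattern_sketch(query: str) -> str:
    """Keep field names and operators, replace each concrete value with *."""
    return _TERM.sub(r"\1:*", query)


pattern = anonymize_query_pattern_sketch("status:error AND user:john")
# pattern == "status:* AND user:*"
```

Two queries that differ only in their values collapse to the same pattern, so their retrieval counts accumulate under one entry.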

Stage 5: Aggregating Signals

Each ToolPattern maintains cumulative counters inside the singleton:

  • Total compressions and retrievals per strategy.
  • Field‑retrieval frequencies (which fields are most often recalled).
  • Query‑pattern frequencies.

These aggregates form the empirical basis for confidence scoring.
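The counters listed above might be held in a structure along these lines. ToolPatternSketch and its attribute names are hypothetical and not the real ToolPattern definition; field names appear in plaintext here only for readability, whereas the source states that TOIN stores hashes.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToolPatternSketch:
    """Hypothetical cumulative counters for one aggregation key."""
    compressions: Counter = field(default_factory=Counter)      # per strategy
    retrievals: Counter = field(default_factory=Counter)        # per strategy
    field_retrievals: Counter = field(default_factory=Counter)  # per field
    query_patterns: Counter = field(default_factory=Counter)    # per pattern

    def retrieval_rate(self, strategy: str) -> float:
        used = self.compressions[strategy]
        return self.retrievals[strategy] / used if used else 0.0


pattern = ToolPatternSketch()
pattern.compressions["smart_crusher"] += 20
pattern.retrievals["smart_crusher"] += 5
pattern.field_retrievals.update(["status", "status", "user"])
pattern.query_patterns["status:* AND user:*"] += 3

most_recalled = pattern.field_retrievals.most_common(1)
```

Keeping the counters cumulative means any derived figure, such as a retrieval rate or the most recalled field, can be recomputed at any time from the stored totals.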

Stage 6: Producing Offline Recommendations

A periodic offline job runs via the CLI command python -m headroom.cli.toin_publish. This invokes _update_recommendations for every pattern accessible via iter_patterns. The logic computes:

  • Retrieval‑rate thresholds that determine when compression is too aggressive.
  • optimal_max_items based on the point of diminishing returns (where dropping more items triggers excessive retrievals).
  • preserve_fields lists derived from high‑retrieval frequency and semantic type inference.
  • The compression strategy with the highest success rate (e.g., smart_crusher versus truncate).

The results are written to recommendations.toml, which the Rust proxy consumes at startup.
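The selection and serialization steps could look roughly like the sketch below. The functions build_recommendation and to_toml_table, the min_field_retrievals threshold, and the TOML layout (one table per tool signature with strategy and preserve_fields keys) are all assumptions; the source does not show the actual schema of recommendations.toml or the internals of _update_recommendations.

```python
def build_recommendation(strategy_stats: dict, field_retrievals: dict,
                         min_field_retrievals: int = 3) -> dict:
    """strategy_stats maps strategy -> (compressions, retrievals)."""
    def success(entry) -> float:
        compressions, retrievals = entry[1]
        return 1.0 - retrievals / compressions if compressions else 0.0

    # Pick the strategy whose compressions were least often followed
    # by a retrieval.
    best_strategy = max(strategy_stats.items(), key=success)[0]
    # Preserve fields that were recalled at least the threshold number of times.
    preserve = sorted(name for name, count in field_retrievals.items()
                      if count >= min_field_retrievals)
    return {"strategy": best_strategy, "preserve_fields": preserve}


def to_toml_table(table: str, rec: dict) -> str:
    """Render one recommendation as a TOML table (assumed layout)."""
    fields = ", ".join(f'"{name}"' for name in rec["preserve_fields"])
    return (f"[{table}]\n"
            f'strategy = "{rec["strategy"]}"\n'
            f"preserve_fields = [{fields}]\n")


rec = build_recommendation(
    {"smart_crusher": (100, 10), "truncate": (100, 40)},
    {"status": 5, "msg": 1},
)
toml_text = to_toml_table("abc123", rec)
```

Because the output is a static file, the consumer only needs a TOML parser and never has to call back into the learning process.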

Code Examples

Recording a Compression Event

from headroom.telemetry.toin import get_toin
from headroom.tools import ToolSignature

toin = get_toin()
toin.record_compression(
    tool_signature=ToolSignature(structure_hash="abc123…"),
    original_count=150,
    compressed_count=30,
    original_tokens=1200,
    compressed_tokens=200,
    strategy="smart_crusher",
    query_context="status:error AND user:john",
    items=[{"status": "error", "user": "john", "msg": "…"}],
    auth_mode="payg",
    model_family="gpt-4o",
)

This call updates the internal ToolPattern and emits a metric event (toin.compression).

Recording a Retrieval Event

toin.record_retrieval(
    tool_signature_hash="abc123…",
    retrieval_type="full",  # or "search"
    query="status:error AND user:john",
    query_fields=["status", "user"],
    strategy="smart_crusher",
    retrieved_items=[{"status": "error", "user": "john", "msg": "…"}],
    auth_mode="payg",
    model_family="gpt-4o",
)

This decreases the success rate for the smart_crusher strategy when applied to this pattern and enriches the field‑level semantics.

Generating Recommendations Offline

python -m headroom.cli.toin_publish --output recommendations.toml

The CLI iterates over every stored pattern and writes the TOML file that the Rust proxy reads on its next restart.

Integration with the Rust Proxy

The Rust proxy does not query TOIN at request time. Instead, it loads recommendations.toml at startup and applies the per-tool hints (e.g., optimal_max_items, preserve_fields) when deciding what to drop. The legacy get_recommendation API is deprecated and now only emits a warning. As a result, learning happens offline and the proxy acts purely on static configuration.

Summary

  • TOIN is an observation‑only learning layer that optimizes LLM context compression through empirical feedback rather than static rules.
  • It learns by recording compressions via record_compression and penalizing strategies when record_retrieval is called for dropped content.
  • Learning is isolated per tenant slice and model family using composite aggregation keys of the form (auth_mode, model_family, tool_signature_hash).
  • Field‑level semantics and anonymized query patterns allow TOIN to infer which data is critical (e.g., error indicators) without storing raw values.
  • Recommendations are generated offline via toin_publish and consumed by the Rust proxy via recommendations.toml, so the proxy makes no calls to TOIN while handling a request.

Frequently Asked Questions

What does TOIN stand for?

TOIN stands for Tool Output Intelligence Network. It is the specific subsystem within Headroom responsible for observing tool output compressions and retrievals to build optimized compression strategies.

How does TOIN protect user privacy while learning patterns?

TOIN implements a hash‑only storage policy. It stores cryptographic hashes of field names, values, and query patterns rather than the raw data, and it never retains user identifiers. This allows the system to count unique values and detect frequent patterns statistically without exposing sensitive content.

What triggers an update to a compression pattern?

A pattern is updated at two points in the feedback loop: first when record_compression saves the initial drop statistics, and again when record_retrieval signals that the LLM needed the dropped data. The retrieval event reduces the strategy's success score and increments field-retrieval counters, which directly influences the next generation of offline recommendations.

How do TOIN recommendations reach the Rust proxy?

Recommendations flow through an offline batch process. The headroom.cli.toin_publish command iterates over all stored patterns using iter_patterns, runs _update_recommendations to compute optimal settings, and writes them to recommendations.toml. The Rust proxy reads this file at startup; there is no runtime API for fetching hints during request processing.
