TOIN (Tool Output Intelligent Network) in Headroom: Learning Compression Patterns Explained

June 7, 2026 chopratejas/headroom

TOIN (Tool Output Intelligent Network) is a privacy-preserving, observation-only telemetry subsystem that records compression tool behavior across user requests to learn statistical patterns and generate offline recommendations for optimizing future compression strategies.

Headroom is an open-source proxy that compresses LLM tool outputs. Its learning layer, TOIN (Tool Output Intelligent Network), analyzes real-world usage without adding request-time overhead or exposing user data. It observes how compression tools behave across different contexts so that later compression decisions can be tuned from recorded evidence.

What is TOIN?

TOIN stands for Tool Output Intelligent Network, though the codebase occasionally refers to it as Tool Output Intelligence Network. It functions as the learning layer of the Headroom architecture, sitting alongside compression tools like SmartCrusher to build a feedback loop between compression actions and retrieval outcomes.

Unlike systems that modify behavior in real time, TOIN operates under a strict observation-only contract. During request processing, calls such as record_compression only log metadata about what happened; they never alter the compression path or inject latency. Recommendations are produced offline by the headroom.cli.toin_publish command, which emits a recommendations.toml file that the Rust proxy reads at startup.

How TOIN Learns Compression Patterns

The learning mechanism relies on aggregating events by a specific tuple key defined in the _make_pattern_key helper within headroom/telemetry/toin.py. Patterns are keyed by (auth_mode, model_family, tool_signature_hash), enabling each tenant-slice and model family to learn independently.
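As a rough illustration of why a tuple key keeps slices independent, the sketch below aggregates events under the same three-part key. The helper name make_pattern_key, the sample hash abc123, and the "subscription" auth mode are placeholders invented for this example; the real _make_pattern_key may normalize or encode its inputs differently.

```python
from collections import defaultdict
from typing import Tuple


def make_pattern_key(auth_mode: str, model_family: str, tool_signature_hash: str) -> Tuple[str, str, str]:
    """Hypothetical stand-in for _make_pattern_key: build the aggregation key."""
    return (auth_mode, model_family, tool_signature_hash)


# One counter per key, so different slices never share statistics.
compressions = defaultdict(int)

events = [
    ("payg", "claude-3-5", "abc123"),
    ("payg", "claude-3-5", "abc123"),
    ("subscription", "claude-3-5", "abc123"),  # same tool, different auth mode
]
for auth_mode, model_family, sig_hash in events:
    compressions[make_pattern_key(auth_mode, model_family, sig_hash)] += 1
```

Both payg events land in one bucket, while the same tool signature under a different auth mode gets its own.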

The learning loop consists of three primary operations:

record_compression – Updates per-tool statistics including compression counts, token ratios, and strategy success rates
record_retrieval – Signals that compression was too aggressive (requiring re-fetch), updates retrieval counters, and enriches field-level semantics
_update_recommendations – Periodically derives optimal max-items, skip-compression flags, preserve-fields lists, and the best compression strategy based on accumulated statistics
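The three operations above can be pictured with a toy model. This is a minimal sketch, not Headroom's implementation: the ToyPattern class, its fields, and the 0.5 retrieval-rate threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyPattern:
    """Illustrative per-pattern statistics (not the real ToolPattern)."""
    compressions: int = 0
    retrievals: int = 0
    skip_compression: bool = False

    def record_compression(self) -> None:
        # Observation only: bump a counter, change nothing else.
        self.compressions += 1

    def record_retrieval(self) -> None:
        # A retrieval signals that an earlier compression dropped too much.
        self.retrievals += 1

    def update_recommendations(self, max_retrieval_rate: float = 0.5) -> None:
        # Assumed rule: recommend skipping compression when most
        # compressions end up being followed by a retrieval.
        if self.compressions == 0:
            return
        rate = self.retrievals / self.compressions
        self.skip_compression = rate > max_retrieval_rate


pattern = ToyPattern()
for _ in range(4):
    pattern.record_compression()
for _ in range(3):
    pattern.record_retrieval()
pattern.update_recommendations()
```

With three retrievals out of four compressions, the toy rule flags the pattern for skipping compression.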

Confidence in recommendations grows with sample size and the number of distinct users tracked, calculated via the _calculate_confidence method.
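The article does not give the formula behind _calculate_confidence, so the function below is only one plausible shape for a score that rises with both inputs and saturates at 1.0. The name toy_confidence, the equal weighting, and the targets of 100 samples and 5 users are assumptions.

```python
def toy_confidence(
    sample_size: int,
    distinct_users: int,
    sample_target: int = 100,  # assumed: samples needed for full credit
    user_target: int = 5,      # assumed: distinct users needed for full credit
) -> float:
    """Illustrative confidence in [0, 1]; not the real _calculate_confidence."""
    sample_score = min(sample_size / sample_target, 1.0)
    user_score = min(distinct_users / user_target, 1.0)
    # Equal weighting is an arbitrary choice for this sketch.
    return 0.5 * sample_score + 0.5 * user_score
```

Under these assumptions, a pattern seen 50 times across 5 users scores higher than one seen 50 times by a single user, which matches the intent that breadth of evidence matters as well as volume.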

Core Architecture and Data Flow

The TOIN implementation spans multiple files in the headroom/telemetry/ module:

headroom/telemetry/toin.py – Core implementation containing recording methods, aggregation logic, and serialization
headroom/telemetry/models.py – Data structures including ToolSignature and FieldSemantics
headroom/telemetry/backends.py – Storage backends; the default FileSystemTOINBackend writes to toin.json
headroom/cli/toin_publish.py – Offline CLI that loads persisted JSON, aggregates patterns, and writes recommendations.toml

The data flow follows an offline-first architecture. Compression events and retrieval feedback accumulate in the JSON store through the request-time API. Operators periodically run the publish command to generate TOML recommendations. The proxy loads these recommendations at startup, ensuring zero per-request overhead from the learning system.
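The publish step can be sketched as a pure function from persisted JSON to TOML text. The JSON layout, the colon-joined pattern key, and the TOML field names below are hypothetical; the real schemas of toin.json and recommendations.toml are not shown in this article. The TOML is written by hand because Python's standard library can read TOML but has no writer.

```python
import json


def render_recommendations(store_json: str) -> str:
    """Turn a (hypothetical) persisted store into TOML text."""
    store = json.loads(store_json)
    lines = []
    for key in sorted(store["patterns"]):
        rec = store["patterns"][key]
        # Quoted table key; assumes the key itself contains no double quotes.
        lines.append(f'[patterns."{key}"]')
        lines.append(f'skip_compression = {str(rec["skip_compression"]).lower()}')
        lines.append(f'max_items = {int(rec["max_items"])}')
        lines.append("")
    return "\n".join(lines)


sample_store = json.dumps({
    "patterns": {
        "payg:claude-3-5:abc123": {"skip_compression": False, "max_items": 20},
    }
})
toml_text = render_recommendations(sample_store)
```

Because the function only reads a file's contents and returns a string, it can run on any schedule without touching the request path.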

Privacy and Scalability Safeguards

TOIN implements strict privacy controls by design. The system never stores actual data values, only structure hashes, field-name hashes, and anonymized query patterns. Sensitive user information therefore stays within the operator's infrastructure while pattern recognition remains possible.
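To make the idea of storing hashes instead of values concrete, here is a minimal sketch. The use of SHA-256, the 8-character truncation, and the function names are assumptions; the article does not say which hash function or length TOIN uses.

```python
import hashlib
from typing import Dict, List


def hash_field_name(name: str, length: int = 8) -> str:
    """Hash a field name so the stored record never contains it verbatim."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:length]


def anonymize_item(item: Dict[str, object]) -> List[str]:
    """Keep only hashed field names; values are dropped entirely."""
    return sorted(hash_field_name(key) for key in item)


record = anonymize_item({"email": "alice@example.com", "status": "error"})
```

The resulting record still lets two items with the same field set be recognized as the same shape, but neither the field names nor the values can be read back from it directly.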

Scalability safeguards prevent unbounded growth:

Instance hash caps – Limited to 100 stored instance hashes per pattern
Duplicate detection bounds – MAX_SEEN_INSTANCES = 10,000 provides a hard limit for the deduplication set
Frequency-based pruning – Query-pattern and field-retrieval dictionaries are pruned based on usage frequency
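The two bounding techniques in this list can be sketched as follows. Only the MAX_SEEN_INSTANCES value comes from the article; the function names are hypothetical, and the behavior at the cap (this sketch simply stops recording new hashes) is an assumption, since the article does not describe what TOIN does once the limit is reached.

```python
from collections import Counter
from typing import Set

MAX_SEEN_INSTANCES = 10_000  # value quoted in the article


def remember_instance(seen: Set[str], instance_hash: str, limit: int = MAX_SEEN_INSTANCES) -> bool:
    """Record a new instance hash; return False for duplicates or when full."""
    if instance_hash in seen:
        return False
    if len(seen) >= limit:
        return False  # assumed policy: refuse to grow past the cap
    seen.add(instance_hash)
    return True


def prune_by_frequency(counts: Counter, keep: int) -> Counter:
    """Keep only the `keep` most frequent entries."""
    return Counter(dict(counts.most_common(keep)))


seen: Set[str] = set()
results = [remember_instance(seen, h, limit=2) for h in ["a", "a", "b", "c"]]

field_counts = Counter({"status": 5, "user": 1, "id": 3})
pruned = prune_by_frequency(field_counts, keep=2)
```

With a limit of 2, the duplicate "a" and the overflow "c" are both rejected, and pruning keeps the two most frequently retrieved fields.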

Operators can integrate external monitoring through a configurable metrics_callback that emits events like toin.compression and toin.retrieval to Prometheus or similar systems.
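A callback of this kind might look like the sketch below. The two-argument signature (event name plus a payload dict) is an assumption, as is the payload content; the article names the events but does not show how the callback is invoked or registered, so the calls at the bottom only simulate TOIN emitting events.

```python
from collections import Counter
from typing import Dict

event_counts: Counter = Counter()


def metrics_callback(event_name: str, payload: Dict[str, object]) -> None:
    """Count events by name; a real callback could forward to a metrics client."""
    event_counts[event_name] += 1


# Simulated emissions (in practice TOIN would call the callback itself).
metrics_callback("toin.compression", {"strategy": "smart_crusher"})
metrics_callback("toin.compression", {"strategy": "smart_crusher"})
metrics_callback("toin.retrieval", {"retrieval_type": "full"})
```

Keeping the callback to a cheap counter update fits the observation-only contract, since any slow work inside it would run on the request path.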

Implementation Examples

Recording Compression Events

The record_compression method in headroom/telemetry/toin.py (lines 521-602) captures request-time metadata without blocking the compression pipeline:

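from headroom.telemetry.toin import get_toin
from headroom.telemetry.models import ToolSignature  # models.py is where ToolSignature is listed above

# Assumes `signature` is a ToolSignature instance, `items` is the original list,
# `kept` is the number of items left after compression, and `before`/`after` are token counts.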

get_toin().record_compression(
    tool_signature=signature,
    original_count=len(items),
    compressed_count=kept,
    original_tokens=before,
    compressed_tokens=after,
    strategy="smart_crusher",
    query_context="status:error AND user:alice",
    items=items,
    auth_mode="payg",
    model_family="claude-3-5",
)

Recording Retrieval Feedback

When compression proves too aggressive and requires re-fetching, record_retrieval (lines 788-877 in headroom/telemetry/toin.py) updates the statistical model:

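from headroom.telemetry.toin import get_toin

# Assumes `signature` is the ToolSignature used at compression time
# and `retrieved` is the list of items that had to be re-fetched.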
get_toin().record_retrieval(
    tool_signature_hash=signature.structure_hash,
    retrieval_type="full",
    query="status:error",
    query_fields=["status"],
    strategy="smart_crusher",
    retrieved_items=retrieved,
    auth_mode="payg",
    model_family="claude-3-5",
)

Accessing Aggregated Statistics

Monitor system-wide performance through the get_stats method (around line 1220 in headroom/telemetry/toin.py):

from headroom.telemetry.toin import get_toin

stats = get_toin().get_stats()
print(stats["global_retrieval_rate"])

Exporting Patterns for Offline Processing

Export accumulated patterns for the CLI publisher:

from headroom.telemetry.toin import get_toin

patterns = get_toin().iter_patterns()  # Returns list of (key, ToolPattern) tuples

Summary

TOIN (Tool Output Intelligent Network) is Headroom's privacy-preserving telemetry layer that learns optimal compression strategies through observation rather than real-time mutation
The system aggregates patterns by (auth_mode, model_family, tool_signature_hash), enabling isolated learning across tenant and model boundaries
The observation-only contract ensures zero latency impact on request-time compression paths; recommendations are generated offline via headroom.cli.toin_publish
Privacy safeguards store only hashes and anonymized patterns, never actual data values
Scalability limits, including MAX_SEEN_INSTANCES = 10,000 and frequency-based pruning, prevent resource exhaustion

Frequently Asked Questions

How does TOIN protect user privacy?

TOIN never persists actual data values, field contents, or raw queries. Instead, it stores structure hashes, field-name hashes, and anonymized query patterns. This design ensures that sensitive information remains within your infrastructure while still allowing the system to learn which fields are frequently retrieved and which compression strategies work best for specific tool signatures.

What is the observation-only contract?

The observation-only contract means that TOIN methods like record_compression and record_retrieval act purely as side effects: they log telemetry data without modifying the compression decision path or adding latency to the request. Recommendation generation happens offline through the toin_publish CLI, which writes a recommendations.toml file consumed by the proxy at startup.

How are compression recommendations generated?

Recommendations are derived through the _update_recommendations method, which works from the aggregated statistics stored in toin.json. Each recommendation carries a confidence score, computed by _calculate_confidence from sample size and the number of distinct users. From these statistics the system determines optimal max-items thresholds, identifies fields that should be preserved, flags situations where compression should be skipped entirely, and selects the best compression strategy for each tool signature pattern.

What are the scalability limits of TOIN?

TOIN enforces hard caps to prevent unbounded memory growth: it stores a maximum of 100 instance hashes per pattern and maintains a deduplication set limited to MAX_SEEN_INSTANCES = 10,000. Additionally, frequency-based pruning removes rarely used query patterns and field-retrieval entries, so the telemetry store remains bounded even under high-volume production loads.
