TOIN (Tool Output Intelligent Network) in Headroom: Learning Compression Patterns Explained
TOIN (Tool Output Intelligent Network) is a privacy-preserving, observation-only telemetry subsystem that records compression tool behavior across user requests to learn statistical patterns and generate offline recommendations for optimizing future compression strategies.
Headroom is an open-source proxy that compresses LLM tool outputs. Its learning layer, TOIN (Tool Output Intelligent Network), analyzes real-world usage without affecting request-time performance or exposing user data. It observes how compression tools behave across different contexts so that compression decisions can be refined over time.
What is TOIN?
TOIN stands for Tool Output Intelligent Network, though the codebase occasionally refers to it as Tool Output Intelligence Network. It functions as the learning layer of the Headroom architecture, sitting alongside compression tools like SmartCrusher to build a feedback loop between compression actions and retrieval outcomes.
Unlike systems that modify behavior in real time, TOIN operates under a strict observation-only contract. During request processing, calls such as record_compression only log metadata about what happened; they never alter the compression path or inject latency. The learning itself happens offline: the headroom.cli.toin_publish command emits a recommendations.toml file, which the Rust proxy reads at startup.
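To make the observation-only idea concrete, here is a minimal sketch. The names `compress_with_observation` and `observe` are hypothetical and not part of Headroom; the sketch only illustrates a recorder that receives metadata, cannot change the compressed result, and cannot break the request if it fails. For simplicity it runs the observer synchronously.

```python
from typing import Callable


def compress_with_observation(
    items: list,
    compress: Callable[[list], list],
    observe: Callable[[dict], None],
) -> list:
    """Run `compress`, then report metadata to `observe` without changing the result."""
    result = compress(items)
    try:
        # Only counts are passed along, never the items themselves.
        observe({"original_count": len(items), "compressed_count": len(result)})
    except Exception:
        # A telemetry failure must not break or alter the request path.
        pass
    return result


# Hypothetical usage: keep the first three items and collect events in a list.
events: list[dict] = []
kept = compress_with_observation(list(range(10)), lambda xs: xs[:3], events.append)
print(kept, events)
```

The compressed output is computed before the observer runs and is returned unchanged, which is the property the contract describes.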
How TOIN Learns Compression Patterns
The learning mechanism relies on aggregating events by a specific tuple key defined in the _make_pattern_key helper within headroom/telemetry/toin.py. Patterns are keyed by (auth_mode, model_family, tool_signature_hash), enabling each tenant-slice and model family to learn independently.
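As an illustration of keyed aggregation, the sketch below groups events by the same three-part tuple. `make_pattern_key` is a hypothetical stand-in for `_make_pattern_key` (the real helper may normalize or encode the values differently), and the `"oauth"` auth mode and hash values are made up for the example.

```python
from collections import defaultdict


def make_pattern_key(
    auth_mode: str, model_family: str, tool_signature_hash: str
) -> tuple[str, str, str]:
    # Hypothetical stand-in for _make_pattern_key: a plain tuple of the three parts.
    return (auth_mode, model_family, tool_signature_hash)


# Count compression events per pattern key.
counts: dict[tuple[str, str, str], int] = defaultdict(int)
sample_events = [
    ("payg", "claude-3-5", "abc123"),
    ("payg", "claude-3-5", "abc123"),
    ("oauth", "claude-3-5", "abc123"),  # same tool, different auth mode
]
for auth_mode, model_family, sig_hash in sample_events:
    counts[make_pattern_key(auth_mode, model_family, sig_hash)] += 1

for key, n in counts.items():
    print(key, n)
```

Because the auth mode is part of the key, the same tool signature produces two separate patterns here, so statistics from one slice do not leak into the other.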
The learning loop consists of three primary operations:
- record_compression – Updates per-tool statistics including compression counts, token ratios, and strategy success rates
- record_retrieval – Signals that compression was too aggressive (requiring re-fetch), updates retrieval counters, and enriches field-level semantics
- _update_recommendations – Periodically derives optimal max-items, skip-compression flags, preserve-fields lists, and the best compression strategy based on accumulated statistics
Confidence in recommendations grows with sample size and the number of distinct users tracked, calculated via the _calculate_confidence method.
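The loop can be sketched with a small statistics object. Everything here is an assumption for illustration: the `PatternStats` class, its fields, and especially the confidence formula are hypothetical, and the actual `_calculate_confidence` logic is not shown in this article. The sketch only demonstrates a score that rises with both sample size and the number of distinct users.

```python
from dataclasses import dataclass, field


@dataclass
class PatternStats:
    """Hypothetical per-pattern counters; not Headroom's actual data model."""

    compressions: int = 0
    retrievals: int = 0
    user_hashes: set = field(default_factory=set)

    def record_compression(self, user_hash: str) -> None:
        self.compressions += 1
        self.user_hashes.add(user_hash)

    def record_retrieval(self) -> None:
        # A retrieval signals that an earlier compression dropped too much.
        self.retrievals += 1

    @property
    def retrieval_rate(self) -> float:
        return self.retrievals / self.compressions if self.compressions else 0.0

    def confidence(self, full_samples: int = 100, full_users: int = 5) -> float:
        # Illustrative formula only: each factor saturates at 1.0.
        sample_factor = min(1.0, self.compressions / full_samples)
        user_factor = min(1.0, len(self.user_hashes) / full_users)
        return sample_factor * user_factor


stats = PatternStats()
for i in range(50):
    stats.record_compression(f"user-{i % 5}")  # five distinct hashed users
for _ in range(10):
    stats.record_retrieval()

print(stats.retrieval_rate, stats.confidence())
```

With 50 samples out of an assumed 100 needed for full confidence, the score stays at one half even though the user-diversity factor is already saturated.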
Core Architecture and Data Flow
The TOIN implementation spans multiple files in the headroom/telemetry/ module:
- headroom/telemetry/toin.py – Core implementation containing recording methods, aggregation logic, and serialization
- headroom/telemetry/models.py – Data structures including ToolSignature and FieldSemantics
- headroom/telemetry/backends.py – Storage backends; the default FileSystemTOINBackend writes to toin.json
- headroom/cli/toin_publish.py – Offline CLI that loads persisted JSON, aggregates patterns, and writes recommendations.toml
The data flow follows an offline-first architecture. Compression events and retrieval feedback accumulate in the JSON store through the request-time API. Operators periodically run the publish command to generate TOML recommendations. The proxy loads these recommendations at startup, ensuring zero per-request overhead from the learning system.
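The offline step can be pictured as a pure function from the persisted JSON to TOML text. The sketch below is not the `toin_publish` implementation: the JSON layout, the TOML keys, and the rule "skip compression when the retrieval rate exceeds a threshold" are all assumptions chosen to keep the example short.

```python
import json


def render_recommendations(store_json: str, max_retrieval_rate: float = 0.3) -> str:
    """Turn a persisted JSON store into TOML text (hypothetical schema on both sides)."""
    patterns = json.loads(store_json)["patterns"]
    lines = []
    for key in sorted(patterns):
        stats = patterns[key]
        rate = stats["retrievals"] / max(stats["compressions"], 1)
        # Quoted TOML key, assuming the pattern key has no quotes or backslashes.
        lines.append(f'[patterns."{key}"]')
        lines.append(f"skip_compression = {str(rate > max_retrieval_rate).lower()}")
        lines.append(f"retrieval_rate = {rate:.2f}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical store contents: one pattern retrieved often, one rarely.
store = json.dumps({
    "patterns": {
        "payg:claude-3-5:abc123": {"compressions": 40, "retrievals": 20},
        "payg:claude-3-5:def456": {"compressions": 40, "retrievals": 2},
    }
})
toml_text = render_recommendations(store)
print(toml_text)
```

Since the function runs outside the request path, its cost does not matter to request latency; the proxy only pays for parsing the resulting file once at startup.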
Privacy and Scalability Safeguards
TOIN implements strict privacy controls by design. The system never stores actual data values—only structure hashes, field-name hashes, and anonymized query patterns. This ensures that sensitive user information remains within the operator's infrastructure while still allowing pattern recognition.
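One way to store structure without values is to hash field names only. The sketch below assumes SHA-256 truncated to 16 hex characters and a `|` separator; the article does not say which hash function or encoding TOIN uses, so `hash_name` and `structure_hash` are hypothetical helpers.

```python
import hashlib


def hash_name(name: str, length: int = 16) -> str:
    """Return a truncated SHA-256 hex digest of a field name (assumed scheme)."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:length]


def structure_hash(item: dict) -> str:
    """Hash the sorted field names of an item; the values are never read."""
    names = sorted(item)
    return hash_name("|".join(names))


a = {"status": "error", "user": "alice"}
b = {"user": "bob", "status": "ok"}

# Same field names give the same structure hash, regardless of values or key order.
print(structure_hash(a) == structure_hash(b))
print(hash_name("status"))
```

Two items with identical fields but different values are indistinguishable in this representation, which is what allows pattern learning without retaining the data itself.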
Scalability safeguards prevent unbounded growth:
- Instance hash caps – Limited to 100 stored instance hashes per pattern
- Duplicate detection bounds – MAX_SEEN_INSTANCES = 10,000 provides a hard limit for the deduplication set
- Frequency-based pruning – Query-pattern and field-retrieval dictionaries are pruned based on usage frequency
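The bounds above can be sketched as follows. Only the `MAX_SEEN_INSTANCES` value comes from this article; the `BoundedSeenSet` class, the choice to refuse new hashes once the set is full, and the `prune_by_frequency` helper are assumptions about one plausible way to enforce such limits.

```python
from collections import Counter

MAX_SEEN_INSTANCES = 10_000  # value quoted in the article


class BoundedSeenSet:
    """Deduplication set that stops growing at a fixed limit (hypothetical policy)."""

    def __init__(self, limit: int = MAX_SEEN_INSTANCES) -> None:
        self.limit = limit
        self._seen: set[str] = set()

    def add(self, instance_hash: str) -> bool:
        """Return True only if the hash was new and there was room to store it."""
        if instance_hash in self._seen or len(self._seen) >= self.limit:
            return False
        self._seen.add(instance_hash)
        return True

    def __len__(self) -> int:
        return len(self._seen)


def prune_by_frequency(counts: Counter, keep: int) -> Counter:
    """Keep only the `keep` most frequently used entries."""
    return Counter(dict(counts.most_common(keep)))


seen = BoundedSeenSet(limit=3)
results = [seen.add(h) for h in ["a", "b", "a", "c", "d"]]
pruned = prune_by_frequency(Counter({"status": 5, "user": 2, "id": 1}), keep=2)
print(results, len(seen), pruned)
```

Both structures trade some accuracy for a fixed memory ceiling: a full deduplication set stops tracking new hashes, and pruning discards the least-used entries first.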
Operators can integrate external monitoring through a configurable metrics_callback that emits events like toin.compression and toin.retrieval to Prometheus or similar systems.
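A callback of this kind might look like the sketch below. The signature (an event name plus a metadata dictionary) is an assumption, since the article does not document what `metrics_callback` receives, and the payload keys are illustrative. An in-memory counter stands in for a real metrics client.

```python
from collections import Counter
from typing import Any

# In-memory stand-in for a metrics backend such as a Prometheus counter.
event_counts: Counter = Counter()


def metrics_callback(event_name: str, payload: dict[str, Any]) -> None:
    # Assumed signature: event name plus metadata; the payload is ignored here.
    event_counts[event_name] += 1


# Simulate the events named in the article being emitted.
metrics_callback("toin.compression", {"strategy": "smart_crusher"})
metrics_callback("toin.retrieval", {"retrieval_type": "full"})
metrics_callback("toin.compression", {"strategy": "smart_crusher"})

print(dict(event_counts))
```

In a real deployment the body of the callback would forward the count to the monitoring system instead of a local `Counter`.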
Implementation Examples
Recording Compression Events
The record_compression method in headroom/telemetry/toin.py (lines 521-602) captures request-time metadata without blocking the compression pipeline:
from headroom.telemetry.toin import get_toin
from headroom.telemetry.models import ToolSignature

# Assuming `signature` is a ToolSignature instance
get_toin().record_compression(
    tool_signature=signature,
    original_count=len(items),
    compressed_count=kept,
    original_tokens=before,
    compressed_tokens=after,
    strategy="smart_crusher",
    query_context="status:error AND user:alice",
    items=items,
    auth_mode="payg",
    model_family="claude-3-5",
)
Recording Retrieval Feedback
When compression proves too aggressive and requires re-fetching, record_retrieval (lines 788-877 in headroom/telemetry/toin.py) updates the statistical model:
from headroom.telemetry.toin import get_toin
get_toin().record_retrieval(
    tool_signature_hash=signature.structure_hash,
    retrieval_type="full",
    query="status:error",
    query_fields=["status"],
    strategy="smart_crusher",
    retrieved_items=retrieved,
    auth_mode="payg",
    model_family="claude-3-5",
)
Accessing Aggregated Statistics
Monitor system-wide performance through the get_stats method (around line 1220 in headroom/telemetry/toin.py):
from headroom.telemetry.toin import get_toin

stats = get_toin().get_stats()
print(stats["global_retrieval_rate"])
Exporting Patterns for Offline Processing
Export accumulated patterns for the CLI publisher:
from headroom.telemetry.toin import get_toin
patterns = get_toin().iter_patterns() # Returns list of (key, ToolPattern) tuples
Summary
- TOIN (Tool Output Intelligent Network) is Headroom's privacy-preserving telemetry layer that learns optimal compression strategies through observation rather than real-time mutation
- The system aggregates patterns by (auth_mode, model_family, tool_signature_hash), enabling isolated learning across tenant and model boundaries
- Observation-only contract ensures zero latency impact on request-time compression paths; recommendations are generated offline via headroom.cli.toin_publish
- Privacy safeguards store only hashes and anonymized patterns, never actual data values
- Scalability limits including MAX_SEEN_INSTANCES = 10,000 and frequency-based pruning prevent resource exhaustion
Frequently Asked Questions
How does TOIN protect user privacy?
TOIN never persists actual data values, field contents, or raw queries. Instead, it stores structure hashes, field-name hashes, and anonymized query patterns. This design ensures that sensitive information remains within your infrastructure while still allowing the system to learn which fields are frequently retrieved and which compression strategies work best for specific tool signatures.
What is the observation-only contract?
The observation-only contract means that TOIN methods like record_compression and record_retrieval operate as pure side-effects—they log telemetry data without modifying the compression decision path or adding latency to the request. The actual learning and recommendation generation happens offline through the toin_publish CLI, which writes a recommendations.toml file consumed by the proxy at startup.
How are compression recommendations generated?
Recommendations are derived through the _update_recommendations method, which analyzes aggregated statistics from toin.json to calculate confidence scores based on sample size and user diversity. The system determines optimal max-items thresholds, identifies fields that should be preserved, flags situations where compression should be skipped entirely, and selects the best compression strategy for each tool signature pattern.
What are the scalability limits of TOIN?
TOIN enforces hard caps to prevent unbounded memory growth: it stores a maximum of 100 instance hashes per pattern and maintains a deduplication set limited to MAX_SEEN_INSTANCES = 10,000. Additionally, frequency-based pruning removes rarely-used query patterns and field-retrieval entries, ensuring the telemetry store remains bounded even under high-volume production loads.