How IntelligentContext Scores and Fits Content Based on Learned Importance

IntelligentContext evaluates every conversation turn using a weighted, multi-factor importance score—combining recency, semantic similarity, TOIN-learned patterns, error signals, forward references, and token density—then removes the lowest-scoring messages until the token budget is satisfied.

Headroom’s IntelligentContext is a message-level compression stage designed to preserve the most valuable context rather than naively truncating old turns. According to the chopratejas/headroom source code, this component sits at the end of the transformation pipeline, applying learned importance weighting to ensure critical information survives the compression process. The system references patterns from the Tool-Output Intelligence Network (TOIN) alongside real-time semantic analysis to make granular keep-or-drop decisions.
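The keep-or-drop step can be pictured as a greedy fit against a token budget. The sketch below is an illustration under stated assumptions, not Headroom's actual implementation: ScoredMessage and fit_to_budget are hypothetical names, and the scores are supplied by hand rather than computed.

```python
from dataclasses import dataclass


@dataclass
class ScoredMessage:
    """Hypothetical record; not a Headroom class."""
    index: int    # position in the original conversation
    tokens: int   # token count of the message
    score: float  # composite importance score (higher = more important)


def fit_to_budget(messages: list[ScoredMessage], budget: int):
    """Drop the lowest-scoring messages until the total fits the budget."""
    kept = list(messages)
    dropped = []
    total = sum(m.tokens for m in kept)
    # Visit candidates from least to most important.
    for candidate in sorted(messages, key=lambda m: m.score):
        if total <= budget:
            break
        kept.remove(candidate)
        dropped.append(candidate)
        total -= candidate.tokens
    # Survivors keep their original conversational order.
    kept.sort(key=lambda m: m.index)
    return kept, dropped


history = [
    ScoredMessage(index=0, tokens=400, score=0.2),
    ScoredMessage(index=1, tokens=300, score=0.9),
    ScoredMessage(index=2, tokens=500, score=0.1),
    ScoredMessage(index=3, tokens=200, score=0.7),
]
kept, dropped = fit_to_budget(history, budget=600)
print([m.index for m in kept])     # [1, 3]
print([m.index for m in dropped])  # [2, 0]
```

Note that the survivors are returned in conversation order, so the model still sees a coherent dialogue even though the selection was made by score.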

The Six-Factor Scoring Algorithm

The core logic resides in headroom/transforms/intelligent_context.py, where the IntelligentContextManager class calculates a composite score for each message. The algorithm evaluates six distinct factors defined in the ScoringWeights configuration structure found in headroom/config.py.

Scoring Factors

Each message receives a weighted score based on the following criteria:

  • Recency: The position of the message in the conversation history. Newer turns receive higher baseline scores under the assumption that recent context is typically more relevant.
  • Semantic similarity: Embedding-based cosine similarity to the upcoming user request. Messages semantically close to the current query are prioritized.
  • TOIN-learned importance: Statistics gathered by the Tool-Output Intelligence Network (TOIN) indicating which fields or message types are repeatedly retrieved or compressed across sessions. This factor allows the system to learn from historical usage patterns.
  • Error indicators: Presence of error-related tokens such as Exception or Traceback. Debugging context containing error details receives elevated importance to preserve diagnostic information.
  • Forward references: Citations or mentions of earlier messages in later dialogue turns. If a subsequent turn references historical information, the cited earlier message gains a higher score.
  • Token density: The ratio of informative tokens to filler content. Dense, information-rich messages are favored over verbose noise.
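To make a few of these factors concrete, here is a minimal sketch of how four of them might be computed as values between 0 and 1. Every function below is a hypothetical simplification written for this article; the real scorers in Headroom may use different formulas, and the TOIN and forward-reference factors are omitted because they depend on state not shown here.

```python
import math

# Illustrative markers taken from the description above.
ERROR_MARKERS = ("Exception", "Traceback")


def recency_score(position: int, total: int) -> float:
    """Linear ramp from 0.0 (oldest message) to 1.0 (newest message)."""
    return position / (total - 1) if total > 1 else 1.0


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def error_score(text: str) -> float:
    """1.0 if the text contains an error marker, else 0.0."""
    return 1.0 if any(marker in text for marker in ERROR_MARKERS) else 0.0


def density_score(text: str) -> float:
    """Share of distinct words: a crude proxy for information density."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0


print(recency_score(3, 4))                                # 1.0
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))          # 1.0
print(error_score("Traceback (most recent call last):"))  # 1.0
print(density_score("ok ok ok ok"))                       # 0.25
```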

Weighted Calculation

The final score is a simple weighted sum of these factors. Users can tune the relative importance via the ScoringWeights dataclass, historically exposed through IntelligentContextConfig in headroom/config.py. For example, setting toin=3.0 and recency=1.0 emphasizes learned historical patterns over simple message age, as documented in wiki/transforms.md.
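The effect of tuning the weights is easiest to see with numbers. The following sketch assumes a hypothetical Weights dataclass that mirrors the six field names used in this article (it is a stand-in, not the real ScoringWeights class) and compares an old message that TOIN rates highly against a recent message that it rates poorly.

```python
from dataclasses import asdict, dataclass


@dataclass
class Weights:
    """Hypothetical stand-in for ScoringWeights; defaults are illustrative."""
    recency: float = 1.0
    similarity: float = 1.0
    toin: float = 1.0
    error: float = 1.0
    forward_ref: float = 1.0
    density: float = 1.0


def composite_score(factors: dict[str, float], weights: Weights) -> float:
    """Weighted sum of factor values; missing factors count as 0.0."""
    return sum(w * factors.get(name, 0.0) for name, w in asdict(weights).items())


old_but_learned = {"recency": 0.25, "toin": 0.75}
recent_but_unlearned = {"recency": 1.0, "toin": 0.125}

default = Weights()
toin_heavy = Weights(toin=3.0, recency=1.0)

print(composite_score(old_but_learned, default))          # 1.0
print(composite_score(recent_but_unlearned, default))     # 1.125
print(composite_score(old_but_learned, toin_heavy))       # 2.5
print(composite_score(recent_but_unlearned, toin_heavy))  # 1.375
```

With equal weights the recent message wins narrowly; with toin=3.0 the older message ranks first, which is the behavior the example weights above are meant to produce.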

Implementation Architecture

Understanding where the logic lives helps when customizing or debugging the compression behavior.

Core Components

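The primary implementation files referenced in this article are:

  • headroom/transforms/intelligent_context.py: the IntelligentContextManager class and the composite scoring logic.
  • headroom/config.py: the ScoringWeights and IntelligentContextConfig configuration structures.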

Pipeline Integration

As depicted in the README pipeline diagram, IntelligentContext operates as the final stage: CacheAligner → ContentRouter → SmartCrusher → IntelligentContext. The compress() entry point in the modern API automatically constructs this pipeline, applying score-based fitting without manual configuration.
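Conceptually, a staged pipeline like this is a list of transforms applied in order, each receiving the previous stage's output. The sketch below only illustrates that ordering: the stage names come from the README diagram, but make_stage and run_pipeline are hypothetical helpers, and the placeholder stages pass messages through unchanged rather than doing any real work.

```python
from typing import Callable

Messages = list[dict[str, str]]
Stage = Callable[[Messages], Messages]


def make_stage(name: str, log: list[str]) -> Stage:
    """Build a placeholder stage that records its name and passes messages through."""
    def stage(messages: Messages) -> Messages:
        log.append(name)
        return messages
    return stage


def run_pipeline(messages: Messages, stages: list[Stage]) -> Messages:
    """Feed each stage's output into the next stage."""
    for stage in stages:
        messages = stage(messages)
    return messages


order: list[str] = []
stage_names = ("CacheAligner", "ContentRouter", "SmartCrusher", "IntelligentContext")
stages = [make_stage(name, order) for name in stage_names]
out = run_pipeline([{"role": "user", "content": "hi"}], stages)
print(" -> ".join(order))
```

Running the score-based stage last means it fits whatever remains after the earlier stages have already reduced the content.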

Practical Usage Examples

Basic Compression (Current API)

The modern public API abstracts the scoring complexity behind the compress function:

from headroom import compress
from headroom.config import ModelConfig

messages = [
    {"role": "user", "content": "Explain recursion."},
    {"role": "assistant", "content": "Recursion is a process where a function calls itself..."},
    # ... additional turns ...
]

model_cfg = ModelConfig(model="gpt-4o")
compressed = compress(messages, model=model_cfg)

print(f"Original messages: {len(messages)}")
print(f"Compressed token count: {compressed.token_count}")

This invocation automatically builds the full pipeline and applies the multi-factor scoring algorithm described in wiki/transforms.md.

Advanced Explicit Configuration (Pre-0.9.x)

For versions prior to 0.9.x or when explicit control is required, instantiate the manager directly:

from headroom.config import IntelligentContextConfig, ScoringWeights
from headroom.transforms import IntelligentContextManager
from headroom.toin import TOIN

weights = ScoringWeights(
    recency=1.0,
    similarity=2.0,
    toin=3.0,
    error=1.5,
    forward_ref=1.0,
    density=0.5,
)

ic_cfg = IntelligentContextConfig(
    max_tokens=8192,
    scoring_weights=weights,
)

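toin = TOIN()
manager = IntelligentContextManager(config=ic_cfg, toin=toin)
# `messages` is the conversation list from the previous example; `tokenizer`
# must be a tokenizer instance created elsewhere (its construction is not shown here).
result = manager.apply(messages, tokenizer)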

The result object contains the compressed context plus metadata about which messages were dropped to the CCR (Compressed Context Repository).

Retrieving Dropped Content

Messages removed by IntelligentContext are stored reversibly. If the LLM requires a dropped message, it can request retrieval via the headroom_retrieve tool:

from headroom import retrieve

# In response to a tool call: headroom_retrieve(hash="abc123")
original_content = retrieve(hash="abc123")
print(original_content["content"])

This mechanism, documented in docs/content/docs/architecture.mdx, ensures no information is permanently lost during compression.
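The idea behind reversible dropping can be sketched with a hash-keyed store: the dropped message is cached under a digest, and a short placeholder carrying that digest stays in the context. ReversibleStore below is a hypothetical in-memory stand-in written for illustration; the digest scheme, key length, and placeholder wording are assumptions, not the CCR's actual format.

```python
import hashlib
from typing import Optional


class ReversibleStore:
    """Hypothetical in-memory stand-in for a hash-keyed cache of dropped messages."""

    def __init__(self) -> None:
        self._items: dict[str, dict] = {}

    def put(self, message: dict) -> str:
        """Cache a message and return a short content digest as its key."""
        digest = hashlib.sha256(message["content"].encode("utf-8")).hexdigest()[:12]
        self._items[digest] = message
        return digest

    def get(self, digest: str) -> Optional[dict]:
        """Return the original message, or None if the key is unknown."""
        return self._items.get(digest)


store = ReversibleStore()
dropped_message = {
    "role": "assistant",
    "content": "Recursion is a process where a function calls itself...",
}
key = store.put(dropped_message)

# The context keeps only a small placeholder that names the key.
placeholder = {"role": "assistant", "content": f"[dropped; retrieve with hash={key}]"}
print(placeholder["content"])
print(store.get(key)["content"])
```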

Automatic Fallback Behavior

When scoring data is unavailable—such as missing embeddings or uninitialized TOIN statistics—Headroom gracefully degrades to the RollingWindow strategy. This fallback simply retains the most recent tokens up to the budget limit, ensuring the request completes successfully even when the learned-importance path cannot be exercised. As noted in docs/content/docs/context-management.mdx, this fallback is automatic and requires no user intervention.
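A rolling-window fallback of this kind can be sketched in a few lines. The code below is an assumption-laden illustration rather than Headroom's RollingWindow implementation: count_tokens is a crude word counter standing in for a real tokenizer, and fit is a hypothetical dispatcher that uses a scored strategy only when one is supplied.

```python
from typing import Callable, Optional

Message = dict[str, str]
FitFn = Callable[[list[Message], int], list[Message]]


def count_tokens(message: Message) -> int:
    """Crude stand-in for a real tokenizer: count whitespace-separated words."""
    return len(message["content"].split())


def rolling_window(messages: list[Message], budget: int) -> list[Message]:
    """Keep the newest messages that fit in the budget, in original order."""
    kept: list[Message] = []
    used = 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    return kept


def fit(messages: list[Message], budget: int, scored_fit: Optional[FitFn] = None) -> list[Message]:
    """Use score-based fitting when available, otherwise fall back."""
    if scored_fit is None:
        return rolling_window(messages, budget)
    return scored_fit(messages, budget)


history = [
    {"role": "user", "content": "one two three"},
    {"role": "assistant", "content": "four five"},
    {"role": "user", "content": "six"},
]
print([m["content"] for m in fit(history, budget=3)])  # ['four five', 'six']
```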

Summary

  • IntelligentContext uses a six-factor weighted scoring system (recency, semantic similarity, TOIN patterns, errors, forward references, and token density) to determine message importance.
  • The scoring logic is implemented in headroom/transforms/intelligent_context.py and configured via ScoringWeights in headroom/config.py.
  • Messages are sorted by composite score and the lowest-scoring items are dropped until the token budget is satisfied.
  • Dropped content is preserved in the CCR and can be retrieved later via the headroom_retrieve tool.
  • The system automatically falls back to RollingWindow truncation when intelligent scoring data is unavailable.

Frequently Asked Questions

How does TOIN influence the importance scoring?

The Tool-Output Intelligence Network (TOIN) collects statistics across sessions about which message fields are frequently retrieved or compressed. It assigns higher importance scores to message types that historically prove valuable, allowing IntelligentContext to learn and prioritize context that matters based on actual usage patterns rather than static heuristics.
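One simple way to picture this kind of learning is a retrieval-rate counter: if content of a given kind is frequently asked for again after being compressed, it was probably worth keeping. The RetrievalStats class below is a hypothetical illustration of that idea, with an invented "kind" label per message; it is not TOIN's data model or scoring formula.

```python
from collections import defaultdict


class RetrievalStats:
    """Hypothetical tracker of how often compressed content is retrieved again."""

    def __init__(self) -> None:
        self._compressed: dict[str, int] = defaultdict(int)
        self._retrieved: dict[str, int] = defaultdict(int)

    def record_compression(self, kind: str) -> None:
        self._compressed[kind] += 1

    def record_retrieval(self, kind: str) -> None:
        self._retrieved[kind] += 1

    def importance(self, kind: str) -> float:
        """Retrieval rate in [0, 1]; unseen kinds score 0.0."""
        seen = self._compressed.get(kind, 0)
        if seen == 0:
            return 0.0
        return min(1.0, self._retrieved.get(kind, 0) / seen)


stats = RetrievalStats()
for _ in range(4):
    stats.record_compression("stack_trace")
    stats.record_compression("greeting")
for _ in range(3):
    stats.record_retrieval("stack_trace")

print(stats.importance("stack_trace"))  # 0.75
print(stats.importance("greeting"))     # 0.0
```

A value like this could then feed into the composite score as the learned-importance factor, weighted by the toin setting.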

What happens to messages that are removed during compression?

Removed messages are not deleted but stored in the CCR (Compressed Context Repository), a reversible compression cache. If the LLM later determines it needs a dropped message, it can emit a headroom_retrieve tool call with the message hash to fetch the original content from the CCR, as implemented in the retrieval pipeline.

Can I disable the learned-importance scoring and use simple truncation?

Yes. If TOIN data or embeddings are unavailable, or if IntelligentContext is explicitly disabled, Headroom automatically falls back to the RollingWindow strategy. This mode preserves only the most recent tokens up to the model limit without applying importance scoring, ensuring compatibility when advanced features are not configured.

Where are the scoring weights defined and how can they be customized?

Scoring weights are defined in the ScoringWeights dataclass located in headroom/config.py. In pre-0.9.x versions, you could pass custom weights via IntelligentContextConfig. In current versions, the system uses optimized defaults, though the underlying weight constants remain configurable through the configuration layer for advanced deployments.
