
How Headroom's Cross-Agent Memory Works with SharedContext: A Technical Deep Dive

June 5, 2026 | chopratejas/headroom

Headroom's cross-agent memory uses the SharedContext class to provide a thread-safe, TTL-aware cache that automatically compresses agent outputs, allowing subsequent agents to retrieve compressed summaries while preserving the original data for on-demand access.

The Headroom library (chopratejas/headroom) solves the problem of passing large payloads between different AI agents by providing a SharedContext abstraction that acts as a process-wide memory store. This cross-agent memory system enables CrewAI, LangGraph, and OpenAI Agents SDK workflows to share state efficiently without redundant network calls or token-wasting repetitions.

Core Architecture of Headroom's Cross-Agent Memory

The cross-agent memory implementation centers on five key components that work together in headroom/shared_context.py to provide safe, compressed data sharing.

The SharedContext Class

The SharedContext class serves as the public API that agents import via from headroom import SharedContext. As implemented in lines 67-89 of headroom/shared_context.py, it holds an in-memory dictionary of ContextEntry objects and protects concurrent access using a threading.Lock. The constructor accepts ttl (default 3600 seconds) and max_entries parameters to enforce memory limits:

from headroom import SharedContext

# Initialize with a 30-minute TTL and a 50-entry limit
ctx = SharedContext(ttl=1800, max_entries=50)

ContextEntry Dataclass

Each stored item becomes a ContextEntry (lines 36-55), a dataclass that tracks both the original and compressed text. It stores token counts, timestamps, the agent identifier, and a list of transforms applied during compression. This enables per-entry statistics like savings_percent and provides full traceability for debugging multi-agent workflows.
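To make the per-entry bookkeeping concrete, here is a minimal sketch of what such a dataclass could look like. Only savings_percent and transforms are named in the description above; the class name EntrySketch and the other field names (original_tokens, compressed_tokens, created_at) are assumptions for illustration, and the real definition in headroom/shared_context.py may differ.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EntrySketch:
    """Hypothetical stand-in for ContextEntry; field names are assumed."""

    key: str
    original: str
    compressed: str
    original_tokens: int
    compressed_tokens: int
    agent: Optional[str] = None
    created_at: float = field(default_factory=time.time)
    transforms: List[str] = field(default_factory=list)

    @property
    def savings_percent(self) -> float:
        # Share of tokens removed by compression; 0.0 for an empty payload.
        if self.original_tokens == 0:
            return 0.0
        saved = self.original_tokens - self.compressed_tokens
        return round(100.0 * saved / self.original_tokens, 1)


entry = EntrySketch(
    key="search_results",
    original="<large JSON payload>",
    compressed="<compressed payload>",
    original_tokens=10_000,
    compressed_tokens=1_500,
    agent="searcher",
    transforms=["smart_crusher"],
)
print(entry.savings_percent)  # 85.0
```

Keeping both texts and both token counts on the same record is what lets a reader of the entry compute savings without touching the compressor again.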

Compression Pipeline Integration

When SharedContext.put() is called, it invokes headroom.compress.compress() (lines 98-112) to reduce payload size before storage. This uses the same CCR (Compress-Cache-Retrieve) stack as the Headroom proxy, applying SmartCrusher for JSON, CodeCompressor for code blocks, and Kompress for plain text. Every stored item undergoes consistent compression heuristics regardless of which agent originates the data.
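The routing idea, one compressor per content type, can be sketched as follows. The compressor names come from the description above, but pick_compressor and its detection rules are hypothetical and are not Headroom's actual logic.

```python
import json


def pick_compressor(text: str) -> str:
    """Return the name of the compressor a payload would be routed to.

    The detection rules are illustrative guesses, not Headroom's heuristics.
    """
    stripped = text.strip()
    # Payloads that parse as a JSON object or array go to the JSON compressor.
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "SmartCrusher"
        except json.JSONDecodeError:
            pass
    # Crude code check: a line that starts with a common code keyword.
    code_markers = ("def ", "class ", "import ", "function ", "#include")
    if any(line.lstrip().startswith(code_markers) for line in stripped.splitlines()):
        return "CodeCompressor"
    # Everything else is treated as plain text.
    return "Kompress"


print(pick_compressor('{"items": [1, 2, 3]}'))                    # SmartCrusher
print(pick_compressor("def add(a, b):\n    return a + b"))        # CodeCompressor
print(pick_compressor("The search returned three relevant papers."))  # Kompress
```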

TTL and Eviction Strategy

To prevent unbounded memory growth, entries expire after the configured ttl seconds. When max_entries is reached, the system evicts the oldest entry via _evict_if_needed() (lines 120-129 and 130-138). Stale context automatically disappears on the next get or get_entry call, ensuring workflow isolation.
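The expiry and eviction behavior described above can be illustrated with a small stand-alone store. This is a sketch under stated assumptions, not the library's code: the class name TTLStoreSketch, the default max_entries of 100, and the injectable now parameter (added so the example is deterministic) are all hypothetical.

```python
import time
from typing import Dict, Optional, Tuple


class TTLStoreSketch:
    """Toy store: entries expire after `ttl` seconds; the oldest is evicted at capacity."""

    def __init__(self, ttl: float = 3600, max_entries: int = 100) -> None:
        self.ttl = ttl
        self.max_entries = max_entries
        # key -> (value, stored_at); dicts preserve insertion order.
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, key: str, value: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._entries.pop(key, None)  # re-inserting moves the key to the end
        self._entries[key] = (value, now)
        self._evict_if_needed()

    def get(self, key: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        item = self._entries.get(key)
        if item is None:
            return None
        value, stored_at = item
        if now - stored_at > self.ttl:
            # Expired entries are dropped lazily, on the read that finds them.
            del self._entries[key]
            return None
        return value

    def _evict_if_needed(self) -> None:
        # Remove the oldest entries until the store is back within capacity.
        while len(self._entries) > self.max_entries:
            oldest_key = next(iter(self._entries))
            del self._entries[oldest_key]


store = TTLStoreSketch(ttl=10, max_entries=2)
store.put("a", "alpha", now=0)
store.put("b", "beta", now=1)
store.put("c", "gamma", now=2)  # capacity exceeded, so "a" is evicted
print(store.get("a", now=3))    # None
print(store.get("b", now=3))    # beta
print(store.get("b", now=20))   # None (older than ttl)
```

Lazy expiry keeps the store free of background threads: nothing is cleaned up until a read or a write touches it.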

Thread-Safety Mechanisms

All mutating operations (put, clear, and eviction) acquire the same Lock. Read operations such as get, keys, and stats also hold the lock long enough to snapshot the data they return, which makes the object safe to share between concurrent agents in the same Python process (lines 90-98).
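The locking pattern can be sketched with the standard library alone. The class LockedStoreSketch below is a hypothetical stand-in, not SharedContext itself; it shows writes and snapshot reads sharing one threading.Lock while several threads write concurrently.

```python
import threading
from typing import Dict, List, Optional


class LockedStoreSketch:
    """Toy store where every read and write goes through one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        with self._lock:
            self._entries[key] = value

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            return self._entries.get(key)

    def keys(self) -> List[str]:
        with self._lock:
            # Return a copy so callers never iterate the live dict.
            return list(self._entries)


store = LockedStoreSketch()


def agent(name: str, count: int) -> None:
    # Each simulated agent writes `count` entries under its own key prefix.
    for i in range(count):
        store.put(f"{name}:{i}", f"output {i} from {name}")


threads = [threading.Thread(target=agent, args=(f"agent{j}", 200)) for j in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(store.keys()))  # 800
```

Returning a copied list from keys() is the "snapshot" part: the lock is released before the caller starts iterating, so a slow reader never blocks writers.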

Cross-Agent Memory Workflow

The typical lifecycle of data passing through Headroom's shared memory follows four distinct steps:

1. Agent A stores output using ctx.put(key, data, agent="name"), which compresses the payload and creates a ContextEntry in the internal _entries map.
2. Agent B retrieves the compressed version via ctx.get(key), receiving the compressed text by default to save context window space.
3. Agent B requests the original data when needed by calling ctx.get(key, full=True), which returns the stored uncompressed original without recomputation.
4. Monitoring and cleanup occur through ctx.stats(), which aggregates token counts across entries, while automatic expiration and eviction handle housekeeping.
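The aggregation in the last step amounts to summing token counts across entries. The sketch below shows that arithmetic only; the function aggregate_stats and the dictionary keys it returns are assumptions, since the exact shape of the ctx.stats() result is not specified here.

```python
from typing import Dict, List, Tuple, Union


def aggregate_stats(entries: List[Tuple[int, int]]) -> Dict[str, Union[int, float]]:
    """Summarize (original_tokens, compressed_tokens) pairs across a store."""
    total_original = sum(original for original, _ in entries)
    total_compressed = sum(compressed for _, compressed in entries)
    saved = total_original - total_compressed
    # Guard against division by zero when the store is empty.
    percent = round(100.0 * saved / total_original, 1) if total_original else 0.0
    return {
        "entries": len(entries),
        "total_original_tokens": total_original,
        "total_compressed_tokens": total_compressed,
        "tokens_saved": saved,
        "savings_percent": percent,
    }


summary = aggregate_stats([(10_000, 1_500), (2_000, 800)])
print(summary["tokens_saved"])     # 9700
print(summary["savings_percent"])  # 80.8
```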

Practical Implementation Examples

Basic Usage Pattern

from headroom import SharedContext

# 1️⃣ Initialize (one per process)
ctx = SharedContext(ttl=1800, max_entries=50)

# 2️⃣ Store large JSON output
large_json = '{"items": [...]}'  # Imagine 10,000 tokens
entry = ctx.put("search_results", large_json, agent="searcher")
print(f"Compression saved {entry.savings_percent}%")  # → e.g., 85.0

# 3️⃣ Retrieve compressed summary for next agent
summary = ctx.get("search_results")  # ~1,500 tokens

# 4️⃣ Access full original when detailed analysis needed
full_data = ctx.get("search_results", full=True)

# 5️⃣ Inspect metadata
meta = ctx.get_entry("search_results")
print(meta.transforms)  # e.g., ['smart_crusher', 'kompress']

# 6️⃣ Workflow cleanup
print(ctx.stats())  # Aggregated savings across all entries
ctx.clear()

Integration with CrewAI


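# Assumes a shared instance created earlier, e.g. ctx = SharedContext(ttl=1800, max_entries=50)

# After the research task completes, store its raw output
ctx.put("findings", researcher_task.output.raw, agent="researcher")

# The coding agent receives the compressed context
coder_context = ctx.get("findings")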

Integration with LangGraph

def researcher_node(state):
    # Assumes a module-level ctx (SharedContext) and a do_research() helper
    result = do_research()
    ctx.put("research", result)
    # Pass the compressed version to the next node
    return {"research_summary": ctx.get("research")}

Integration with OpenAI Agents SDK

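def compress_handoff(messages):
    # Assumes a module-level ctx and message objects exposing .id and .content
    for msg in messages:
        # Only compress long string content; skip None or structured content
        if isinstance(msg.content, str) and len(msg.content) > 1000:
            ctx.put(msg.id, msg.content)
            msg.content = ctx.get(msg.id)  # Replace with compressed version
    return messages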

Summary

SharedContext provides a process-wide memory store that any agent can access via put() and get() methods without network overhead.
Automatic compression via the CCR stack (SmartCrusher, CodeCompressor, Kompress) reduces token counts by 60-90% while preserving originals for full=True retrieval.
Thread-safe implementation using threading.Lock allows concurrent access from multiple agents in the same Python process.
TTL and eviction policies prevent memory leaks by removing stale entries after the configured timeout or when capacity limits are reached.
Per-entry metadata through ContextEntry enables detailed tracing of compression transforms and token savings statistics.

Frequently Asked Questions

What is SharedContext in Headroom and why does it matter for multi-agent systems?

SharedContext is a pure-Python utility class in headroom/shared_context.py that acts as a cross-agent memory layer. It matters because it eliminates the need for agents to repeatedly pass large payloads through message queues or LLM context windows, instead storing compressed versions that subsequent agents can access instantly via shared memory.

How does the compression work when storing data in SharedContext?

When you call SharedContext.put(), the method immediately invokes headroom.compress.compress() using the same pipeline as the Headroom proxy. This applies heuristics like SmartCrusher for JSON structures, CodeCompressor for programming syntax, and Kompress for natural text, storing both the original and compressed variants in a ContextEntry object.

Is SharedContext safe to use with concurrent agents?

Yes. The implementation uses a threading.Lock to protect all write operations and snapshot reads, making it safe to share a single SharedContext instance across threads. According to the source code in lines 90-98 of headroom/shared_context.py, both mutating operations (put, clear) and reads (get, stats) acquire the lock to prevent race conditions.

Can I retrieve the original uncompressed data after storage?

Absolutely. While ctx.get(key) returns the compressed version by default to save context space, passing full=True as in ctx.get(key, full=True) returns the exact original text stored in the ContextEntry.original field without any recompression or network calls.
