How Headroom's Cross-Agent Memory Works with SharedContext: A Technical Deep Dive
Headroom's cross-agent memory uses the SharedContext class to provide a thread-safe, TTL-aware cache that automatically compresses agent outputs, allowing subsequent agents to retrieve compressed summaries while preserving the original data for on-demand access.
The Headroom library (chopratejas/headroom) solves the problem of passing large payloads between different AI agents by providing a SharedContext abstraction that acts as a process-wide memory store. This cross-agent memory system enables CrewAI, LangGraph, and OpenAI Agents SDK workflows to share state efficiently without redundant network calls or token-wasting repetitions.
Core Architecture of Headroom's Cross-Agent Memory
The cross-agent memory implementation centers on five key components that work together in headroom/shared_context.py to provide safe, compressed data sharing.
The SharedContext Class
The SharedContext class serves as the public API that agents import via from headroom import SharedContext. As implemented in lines 67-89 of headroom/shared_context.py, it holds an in-memory dictionary of ContextEntry objects and protects concurrent access using a threading.Lock. The constructor accepts ttl (default 3600 seconds) and max_entries parameters to enforce memory limits:
from headroom import SharedContext
# Initialize with 30-minute TTL and 50-entry limit
ctx = SharedContext(ttl=1800, max_entries=50)
ContextEntry Dataclass
Each stored item becomes a ContextEntry (lines 36-55), a dataclass that tracks both the original and compressed text. It stores token counts, timestamps, the agent identifier, and a list of transforms applied during compression. This enables per-entry statistics like savings_percent and provides full traceability for debugging multi-agent workflows.
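To make the bookkeeping concrete, here is a minimal sketch of what such an entry could look like. The class name EntrySketch and the field names original_tokens, compressed_tokens, and created_at are assumptions for illustration, not Headroom's actual definition; only savings_percent, the agent identifier, and the transforms list are named in the description above.

```python
import time
from dataclasses import dataclass, field


@dataclass
class EntrySketch:
    """Illustrative stand-in for a ContextEntry; field names are assumed."""

    key: str
    original: str
    compressed: str
    original_tokens: int
    compressed_tokens: int
    agent: str = ""
    transforms: list = field(default_factory=list)
    created_at: float = field(default_factory=time.time)

    @property
    def savings_percent(self) -> float:
        # Share of tokens removed by compression, as a percentage.
        if self.original_tokens == 0:
            return 0.0
        saved = self.original_tokens - self.compressed_tokens
        return round(100.0 * saved / self.original_tokens, 1)


entry = EntrySketch(
    key="search_results",
    original="x" * 400,
    compressed="x" * 60,
    original_tokens=100,
    compressed_tokens=15,
    agent="searcher",
    transforms=["smart_crusher"],
)
print(entry.savings_percent)  # 85.0
```

Deriving savings_percent from the two token counts, rather than storing it, keeps the figure consistent with the counts it summarizes.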
Compression Pipeline Integration
When SharedContext.put() is called, it invokes headroom.compress.compress() (lines 98-112) to reduce payload size before storage. This uses the same CCR (Compress-Cache-Retrieve) stack as the Headroom proxy, applying SmartCrusher for JSON, CodeCompressor for code blocks, and Kompress for plain text. Every stored item undergoes consistent compression heuristics regardless of which agent originates the data.
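The routing idea can be sketched as a small content classifier. The function guess_content_kind and its heuristics are hypothetical and only illustrate the concept of picking a compressor per content type; how Headroom actually detects content types is not described here.

```python
import json


def guess_content_kind(text: str) -> str:
    """Classify a payload as 'json', 'code', or 'text' (hypothetical heuristic)."""
    stripped = text.strip()
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            pass  # Looked like JSON but did not parse; fall through.
    code_markers = ("def ", "class ", "import ", "function ", "#include")
    if any(line.lstrip().startswith(code_markers) for line in stripped.splitlines()):
        return "code"
    return "text"


# Map each kind to the compressor the article names for it.
COMPRESSOR_FOR_KIND = {
    "json": "SmartCrusher",
    "code": "CodeCompressor",
    "text": "Kompress",
}

for sample in ('{"items": [1, 2, 3]}', "def f():\n    return 1", "Plain prose."):
    print(COMPRESSOR_FOR_KIND[guess_content_kind(sample)])
```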
TTL and Eviction Strategy
To prevent unbounded memory growth, entries expire after the configured ttl seconds. When max_entries is reached, the system evicts the oldest entry via _evict_if_needed() (lines 120-129 and 130-138). Expiry is lazy: a stale entry is removed the next time get or get_entry touches it, so an agent never reads context older than the TTL.
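The two policies can be sketched with a toy cache. TtlStore is a hypothetical stand-in, not Headroom's code; the max_entries default of 100 and the injectable clock are assumptions added so that expiry can be demonstrated without sleeping.

```python
import time
from collections import OrderedDict


class TtlStore:
    """Toy TTL cache with oldest-first eviction (not Headroom's code)."""

    def __init__(self, ttl=3600, max_entries=100, clock=time.monotonic):
        self.ttl = ttl
        self.max_entries = max_entries
        self._clock = clock  # Injectable so expiry can be tested without sleeping.
        self._entries = OrderedDict()

    def put(self, key, value):
        self._entries.pop(key, None)  # Re-inserting moves the key to the newest slot.
        self._entries[key] = (self._clock(), value)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # Drop the oldest entry.

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at > self.ttl:
            del self._entries[key]  # Expired entries are removed lazily on read.
            return None
        return value


now = [0.0]  # Fake clock, advanced by hand.
store = TtlStore(ttl=10, max_entries=2, clock=lambda: now[0])
store.put("a", "first")
store.put("b", "second")
store.put("c", "third")   # Capacity is 2, so "a" is evicted.
print(store.get("a"))      # None
print(store.get("c"))      # third
now[0] = 11.0              # Advance the fake clock past the TTL.
print(store.get("b"))      # None
```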
Thread-Safety Mechanisms
All mutating operations—put, clear, and eviction—acquire the same Lock. Read operations like get, keys, and stats also lock long enough to snapshot the entry, making the object safe to share between concurrent agents in the same Python process (lines 90-98).
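The locking pattern described here looks roughly like the following sketch. LockedStore is a hypothetical toy class, not Headroom's implementation; it only shows one lock guarding both writes and snapshot reads while several threads write concurrently.

```python
import threading


class LockedStore:
    """Toy dict guarded by a single lock (illustrative, not Headroom's code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def put(self, key, value):
        with self._lock:
            self._entries[key] = value

    def get(self, key):
        with self._lock:
            return self._entries.get(key)

    def keys(self):
        with self._lock:
            return list(self._entries)  # Copy, so callers never iterate the live dict.


store = LockedStore()


def agent(name, count):
    # Each simulated agent writes its own outputs into the shared store.
    for i in range(count):
        store.put(f"{name}:{i}", f"output {i} from {name}")


threads = [threading.Thread(target=agent, args=(f"agent{n}", 200)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(store.keys()))  # 800
```

Returning a copy from keys() is the "snapshot" part: the caller can iterate the result after the lock is released without racing against writers.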
Cross-Agent Memory Workflow
The typical lifecycle of data passing through Headroom's shared memory follows four distinct steps:
- Agent A stores output using ctx.put(key, data, agent="name"), which compresses the payload and creates a ContextEntry in the internal _entries map.
- Agent B retrieves the compressed version via ctx.get(key), receiving the compressed text by default to save context window space.
- Agent B requests the original data when needed by passing full=True, as in ctx.get(key, full=True), fetching the uncompressed original without recomputation.
- Monitoring and cleanup occur through ctx.stats(), which aggregates token counts across entries, while automatic expiration and eviction handle housekeeping.
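The four steps can be traced with a toy store that runs without Headroom installed. ToyContext is a hypothetical stand-in: its "compression" simply keeps the first few words, it counts words rather than tokens, and the keys returned by stats() are invented for this sketch.

```python
class ToyContext:
    """Toy store that keeps an original and a shortened copy per key."""

    def __init__(self, preview_words=5):
        self._entries = {}
        self._preview_words = preview_words

    def put(self, key, text, agent=""):
        words = text.split()
        # Stand-in for real compression: keep only the first few words.
        short = " ".join(words[: self._preview_words])
        self._entries[key] = {"original": text, "compressed": short, "agent": agent}

    def get(self, key, full=False):
        entry = self._entries.get(key)
        if entry is None:
            return None
        return entry["original"] if full else entry["compressed"]

    def stats(self):
        before = sum(len(e["original"].split()) for e in self._entries.values())
        after = sum(len(e["compressed"].split()) for e in self._entries.values())
        return {"entries": len(self._entries), "words_before": before, "words_after": after}


toy = ToyContext(preview_words=3)
toy.put("findings", "alpha beta gamma delta epsilon zeta", agent="researcher")  # Step 1
print(toy.get("findings"))             # Step 2: shortened copy
print(toy.get("findings", full=True))  # Step 3: original, stored at put() time
print(toy.stats())                     # Step 4: aggregate counts
```

Because the original is stored alongside the shortened copy, the full=True path is a plain lookup and never has to reverse the compression.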
Practical Implementation Examples
Basic Usage Pattern
from headroom import SharedContext
# 1️⃣ Initialize (one per process)
ctx = SharedContext(ttl=1800, max_entries=50)
# 2️⃣ Store large JSON output
large_json = '{"items": [...]}' # Imagine 10,000 tokens
entry = ctx.put("search_results", large_json, agent="searcher")
print(f"Compression saved {entry.savings_percent}%") # → e.g., 85.0
# 3️⃣ Retrieve compressed summary for next agent
summary = ctx.get("search_results") # ~1,500 tokens
# 4️⃣ Access full original when detailed analysis needed
full_data = ctx.get("search_results", full=True)
# 5️⃣ Inspect metadata
meta = ctx.get_entry("search_results")
print(meta.transforms) # ['smart_crusher', 'kompress']
# 6️⃣ Workflow cleanup
print(ctx.stats()) # Aggregated savings across all entries
ctx.clear()
Integration with CrewAI
# After research task completes
ctx.put("findings", researcher_task.output.raw, agent="researcher")
# Coding agent receives compressed context
coder_context = ctx.get("findings")
Integration with LangGraph
def researcher_node(state):
    result = do_research()
    ctx.put("research", result)
    # Pass compressed version to next node
    return {"research_summary": ctx.get("research")}
Integration with OpenAI Agents SDK
def compress_handoff(messages):
    for msg in messages:
        if len(msg.content) > 1000:
            ctx.put(msg.id, msg.content)
            msg.content = ctx.get(msg.id)  # Replace with compressed version
    return messages
Summary
- SharedContext provides a process-wide memory store that any agent can access via put() and get() methods without network overhead.
- Automatic compression via the CCR stack (SmartCrusher, CodeCompressor, Kompress) reduces token counts by 60-90% while preserving originals for full=True retrieval.
- Thread-safe implementation using threading.Lock allows concurrent access from multiple agents in the same Python process.
- TTL and eviction policies prevent memory leaks by removing stale entries after the configured timeout or when capacity limits are reached.
- Per-entry metadata through ContextEntry enables detailed tracing of compression transforms and token savings statistics.
Frequently Asked Questions
What is SharedContext in Headroom and why does it matter for multi-agent systems?
SharedContext is a pure-Python utility class in headroom/shared_context.py that acts as a cross-agent memory layer. It matters because it eliminates the need for agents to repeatedly pass large payloads through message queues or LLM context windows, instead storing compressed versions that subsequent agents can access instantly via shared memory.
How does the compression work when storing data in SharedContext?
When you call SharedContext.put(), the method immediately invokes headroom.compress.compress() using the same pipeline as the Headroom proxy. This applies heuristics like SmartCrusher for JSON structures, CodeCompressor for programming syntax, and Kompress for natural text, storing both the original and compressed variants in a ContextEntry object.
Is SharedContext safe to use with concurrent agents?
Yes. The implementation uses a threading.Lock to protect all write operations and snapshot reads, making it safe to share a single SharedContext instance across threads. According to the source code in lines 90-98 of headroom/shared_context.py, both mutating operations (put, clear) and reads (get, stats) acquire the lock to prevent race conditions.
Can I retrieve the original uncompressed data after storage?
Absolutely. While ctx.get(key) returns the compressed version by default to save context space, passing full=True as in ctx.get(key, full=True) returns the exact original text stored in the ContextEntry.original field without any recompression or network calls.