How Cross-Agent Memory Works with SharedContext in Headroom

SharedContext enables multiple agents in the Headroom framework to share large data payloads through a thread-safe, compressed in-memory store that reduces token usage by approximately 80% while preserving original content for on-demand retrieval.

The chopratejas/headroom repository implements cross-agent memory through a centralized SharedContext class that eliminates the need to repeatedly transmit full data payloads between agents. By leveraging Headroom’s existing compression pipeline, this mechanism allows agents operating within the same process to exchange efficiently compressed representations of research results, tool outputs, and conversation history while maintaining access to the original uncompressed data.

Compressed Storage Architecture

When an agent generates large outputs—such as research results, tool dumps, or multi-turn conversations—it stores them using SharedContext.put(key, content). This method forwards the raw text to Headroom’s single-function compression API (headroom.compress.compress).

The compression pipeline executes the same transforms used by the proxy for request compression: CacheAligner → ContentRouter → SmartCrusher / CodeCompressor / Kompress. As implemented in headroom/shared_context.py, the process generates a ContextEntry dataclass containing the original text, its compressed form, token counts before and after compression, and the list of applied transforms.
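To make the shape of such a record concrete, here is a minimal sketch of a dataclass carrying the fields listed above. The class name `ContextEntrySketch` and every field name are assumptions chosen for illustration; the actual `ContextEntry` in headroom/shared_context.py may name or type them differently.

```python
from dataclasses import dataclass, field


@dataclass
class ContextEntrySketch:
    """Hypothetical stand-in for ContextEntry; field names are assumptions."""

    original: str            # text exactly as the agent stored it
    compressed: str          # output of the compression pipeline
    original_tokens: int     # token count before compression
    compressed_tokens: int   # token count after compression
    transforms: list[str] = field(default_factory=list)  # transforms applied

    @property
    def savings_percent(self) -> float:
        # Guard against empty input so we never divide by zero.
        if self.original_tokens == 0:
            return 0.0
        saved = self.original_tokens - self.compressed_tokens
        return round(100.0 * saved / self.original_tokens, 1)


entry = ContextEntrySketch(
    original="full research text",
    compressed="short form",
    original_tokens=1000,
    compressed_tokens=200,
    transforms=["CacheAligner", "SmartCrusher"],
)
print(entry.savings_percent)  # 80.0
```

Keeping both the original and the compressed text on one record is what allows a later reader to choose between the two without recompressing anything.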

Thread-Safe, Process-Shared Storage

All agents importing the same SharedContext instance access a unified in-memory store protected by threading.Lock. Because every read and write of the internal _entries dictionary goes through that lock, context can be shared across threads and across agent hops within the same process without race conditions.

The implementation enforces TTL (time-to-live) and entry limits to prevent unbounded growth. Each entry receives a timestamp, and the get and keys methods filter out entries exceeding the configured ttl (defaulting to 1 hour). When the maximum entry count (max_entries) is reached, the oldest entry is evicted automatically, as defined in headroom/shared_context.py.
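The expiry and eviction behavior described above can be sketched with a small toy store. This is not Headroom's code: `TTLStoreSketch`, its injectable `clock` parameter, and the choice to evict by oldest timestamp are assumptions made so the logic can be run and tested in isolation.

```python
import threading
import time


class TTLStoreSketch:
    """Toy store: lock-guarded dict with TTL filtering and oldest-first eviction."""

    def __init__(self, ttl=3600.0, max_entries=100, clock=time.monotonic):
        self._ttl = ttl
        self._max_entries = max_entries
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._lock = threading.Lock()
        self._entries = {}   # key -> (stored_at, value)

    def put(self, key, value):
        with self._lock:
            is_new = key not in self._entries
            if is_new and self._entries and len(self._entries) >= self._max_entries:
                # Evict the entry with the oldest timestamp.
                oldest = min(self._entries, key=lambda k: self._entries[k][0])
                del self._entries[oldest]
            self._entries[key] = (self._clock(), value)

    def get(self, key):
        with self._lock:
            item = self._entries.get(key)
            if item is None:
                return None
            stored_at, value = item
            if self._clock() - stored_at > self._ttl:
                return None  # expired entries are treated as missing
            return value

    def keys(self):
        with self._lock:
            now = self._clock()
            return [k for k, (ts, _) in self._entries.items() if now - ts <= self._ttl]


# Drive the store with a fake clock instead of real time.
now = [0.0]
store = TTLStoreSketch(ttl=10.0, max_entries=2, clock=lambda: now[0])
store.put("a", "first")
now[0] = 1.0
store.put("b", "second")
now[0] = 2.0
store.put("c", "third")   # store is full, so "a" (the oldest) is evicted
print(store.keys())        # ['b', 'c']
now[0] = 11.5
print(store.keys())        # ['c'], because "b" is now older than the TTL
```

Filtering at read time, as in this sketch, means no background sweeper thread is needed; the cost is that expired entries occupy memory until they are evicted or overwritten.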

Retrieving Context Across Agents

Accessing Compressed vs. Full Content

The SharedContext.get(key, full=False) method returns the compressed version by default, which downstream agents typically consume. This compressed payload is approximately 80% smaller than the original, making it ideal for efficient transmission between agents.

When an agent requires detailed inspection of specific data points, setting full=True returns the original uncompressed text. This "zoom in" capability allows agents to balance efficiency with precision when working with shared memory.
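One way a consuming agent might use this is to read the compressed form first and fall back to the full text only when a detail it needs is missing. The sketch below illustrates that policy against a stub; `StubContext`, `read_context`, and the `needle` parameter are hypothetical, and the stub returning None for an unknown key is an assumption rather than documented SharedContext behavior.

```python
class StubContext:
    """Minimal stand-in that only mimics get(key, full=False)."""

    def __init__(self):
        self._data = {}  # key -> (original, compressed)

    def put(self, key, original, compressed):
        self._data[key] = (original, compressed)

    def get(self, key, full=False):
        item = self._data.get(key)
        if item is None:
            return None
        original, compressed = item
        return original if full else compressed


def read_context(ctx, key, needle=None):
    """Return the compressed text unless `needle` is requested and absent from it."""
    compressed = ctx.get(key)
    if compressed is None or needle is None or needle in compressed:
        return compressed
    # The detail was lost in compression, so zoom in on the original.
    return ctx.get(key, full=True)


ctx = StubContext()
ctx.put(
    "research",
    original="Latency p99 was 412 ms on node-7; all other nodes were healthy.",
    compressed="Latency summary: one node slow.",
)
print(read_context(ctx, "research"))                   # compressed summary
print(read_context(ctx, "research", needle="node-7"))  # full original text
```

Because `read_context` depends only on a `get(key, full=...)` method, the same helper should work with a real SharedContext instance if its `get` behaves as described above.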

Monitoring Compression Efficiency

The SharedContext.stats() method aggregates token savings across all active entries, exposing:

  • Number of active entries
  • Total original and compressed token counts
  • Total tokens saved
  • Overall savings percentage

This introspection capability, implemented in headroom/shared_context.py, makes it easy to monitor the memory-compression efficiency of cross-agent workflows in real time.
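The arithmetic behind such a summary is simple enough to sketch. The `StatsSketch` dataclass and `aggregate_stats` function below are illustrative assumptions, not the return type or implementation of `SharedContext.stats()`; they only show how the four figures listed above relate to each other.

```python
from dataclasses import dataclass


@dataclass
class StatsSketch:
    """Hypothetical summary record; attribute names are assumptions."""

    entries: int
    total_original_tokens: int
    total_compressed_tokens: int
    total_tokens_saved: int
    savings_percent: float


def aggregate_stats(token_pairs):
    """Summarize (original_tokens, compressed_tokens) pairs for active entries."""
    pairs = list(token_pairs)
    total_original = sum(original for original, _ in pairs)
    total_compressed = sum(compressed for _, compressed in pairs)
    saved = total_original - total_compressed
    # Avoid dividing by zero when the store is empty.
    percent = round(100.0 * saved / total_original, 1) if total_original else 0.0
    return StatsSketch(len(pairs), total_original, total_compressed, saved, percent)


summary = aggregate_stats([(1000, 200), (500, 150)])
print(summary.total_tokens_saved)  # 1150
print(summary.savings_percent)     # 76.7
```

Note that the overall percentage is computed from the token totals, not by averaging per-entry percentages, so large entries weigh more heavily than small ones.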

Practical Implementation Examples

Basic Storage and Retrieval

from headroom import SharedContext

# Create a shared context (usually a singleton per process)
ctx = SharedContext()

# Agent A stores a large research result
large_output = """... a very long markdown or JSON ..."""
entry = ctx.put("research", large_output, agent="agent_A")
print(f"Saved {entry.savings_percent}% tokens")

# Agent B later obtains the compressed version
compressed = ctx.get("research")
print("Compressed payload:", compressed[:200], "...")

# Agent B wants the full text for a deep dive
full_text = ctx.get("research", full=True)
print("Full text length:", len(full_text))

# Inspect statistics
stats = ctx.stats()
print(f"Overall savings: {stats.savings_percent}% across {stats.entries} entries")

Integration with Proxy Handlers

from headroom.shared_context import SharedContext
from headroom.proxy.handlers.openai import OpenAIHandler

shared_ctx = SharedContext()

class MyOpenAIHandler(OpenAIHandler):
    async def handle(self, request):
        # Before sending the request, store the latest message content (the tool output)
        tool_output = request["messages"][-1]["content"]
        shared_ctx.put("latest_tool_output", tool_output, agent="openai_handler")

        # Retrieve the compressed context for the next turn
        compressed = shared_ctx.get("latest_tool_output")
        request["messages"].append({"role": "assistant", "content": compressed})

        return await super().handle(request)

Summary

  • SharedContext provides thread-safe, compressed storage for cross-agent memory in Headroom, located in headroom/shared_context.py
  • The put() method automatically compresses content using the pipeline defined in headroom/compress.py, reducing token counts by approximately 80%
  • Entries default to a 1-hour TTL, and the oldest entry is evicted automatically once max_entries is reached, which keeps the store from growing without bound
  • Agents retrieve compressed payloads by default via get(), with full=True providing access to the original uncompressed text
  • The stats() method tracks aggregate token savings across all shared entries for monitoring compression efficiency

Frequently Asked Questions

What is SharedContext in Headroom?

SharedContext is a centralized memory store that enables cross-agent communication within the Headroom framework. It allows multiple agents running in the same process to share large data payloads through compressed representations, reducing token transmission costs while maintaining thread safety via threading.Lock.

How does the compression pipeline in SharedContext work?

When storing data via put(), SharedContext invokes headroom.compress.compress, which executes a multi-stage pipeline: CacheAligner → ContentRouter → SmartCrusher / CodeCompressor / Kompress. This pipeline removes redundant tokens and optimizes the payload structure, typically reducing size by about 80% while preserving semantic content.

How long does data persist in SharedContext?

By default, entries expire after 1 hour (configurable via the ttl parameter). Additionally, when the store reaches max_entries, the oldest entry is automatically evicted. The get and keys methods automatically filter out expired entries, ensuring agents only access valid context.

Is SharedContext safe for concurrent multi-agent access?

Yes. According to the source code in headroom/shared_context.py, the internal _entries dictionary is protected by threading.Lock, making all read and write operations thread-safe. This allows multiple agents to simultaneously access and update shared context without race conditions or data corruption.
