# How Cross-Agent Memory Works with SharedContext in Headroom

> Learn how cross-agent memory works with SharedContext in Headroom. SharedContext lets agents share large payloads in compressed form, cuts token usage by roughly 80%, and keeps the original content available for retrieval.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: how-to-guide
- Published: 2026-06-10

---

**SharedContext enables multiple agents in the Headroom framework to share large data payloads through a thread-safe, compressed in-memory store that reduces token usage by approximately 80% while preserving original content for on-demand retrieval.**

The `chopratejas/headroom` repository implements cross-agent memory through a centralized `SharedContext` class that eliminates the need to repeatedly transmit full data payloads between agents. By leveraging Headroom’s existing compression pipeline, this mechanism allows agents operating within the same process to exchange efficiently compressed representations of research results, tool outputs, and conversation history while maintaining access to the original uncompressed data.

## Compressed Storage Architecture

When an agent generates large outputs—such as research results, tool dumps, or multi-turn conversations—it stores them using `SharedContext.put(key, content)`. This method forwards the raw text to Headroom’s single-function compression API (`headroom.compress.compress`).

The compression pipeline executes the same transforms used by the proxy for request compression: **CacheAligner → ContentRouter → SmartCrusher / CodeCompressor / Kompress**. As implemented in [`headroom/shared_context.py`](https://github.com/chopratejas/headroom/blob/main/headroom/shared_context.py), the process generates a `ContextEntry` dataclass containing the original text, its compressed form, token counts before and after compression, and the list of applied transforms.
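To make the shape of such an entry concrete, here is a minimal sketch of a dataclass carrying the fields listed above. The class name `ContextEntrySketch` and every field name are assumptions chosen for illustration; the actual `ContextEntry` in `headroom/shared_context.py` may name or structure them differently.

```python
from dataclasses import dataclass, field


@dataclass
class ContextEntrySketch:
    """Hypothetical stand-in for the entry described above; field names are assumed."""

    original: str            # raw text as stored by the producing agent
    compressed: str          # output of the compression pipeline
    original_tokens: int     # token count before compression
    compressed_tokens: int   # token count after compression
    transforms: list[str] = field(default_factory=list)  # transforms that were applied

    @property
    def tokens_saved(self) -> int:
        # Clamp at zero in case compression made the payload larger.
        return max(self.original_tokens - self.compressed_tokens, 0)

    @property
    def savings_percent(self) -> float:
        if self.original_tokens == 0:
            return 0.0
        return 100.0 * self.tokens_saved / self.original_tokens


# Illustrative numbers only, not measured results.
entry = ContextEntrySketch(
    original="long tool output ...",
    compressed="short summary",
    original_tokens=1000,
    compressed_tokens=200,
    transforms=["smart_crusher"],
)
print(entry.tokens_saved, entry.savings_percent)
```

Keeping both the original and the compressed text on the same record is what allows a later reader to choose between the two forms without recompressing anything.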

## Thread-Safe, Process-Shared Storage

All agents that hold a reference to the same `SharedContext` instance read from and write to a single in-memory store. Access to the internal `_entries` dictionary is guarded by a `threading.Lock`, so context can be shared across threads and across agent hops within the same process without race conditions.

The implementation enforces **TTL (time-to-live)** and entry limits to prevent unbounded growth. Each entry receives a timestamp, and the `get` and `keys` methods filter out entries exceeding the configured `ttl` (defaulting to 1 hour). When the maximum entry count (`max_entries`) is reached, the oldest entry is evicted automatically, as defined in [`headroom/shared_context.py`](https://github.com/chopratejas/headroom/blob/main/headroom/shared_context.py).
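The combination of a lock, TTL filtering, and oldest-first eviction described above can be sketched in plain Python. This is a toy model under stated assumptions, not Headroom's code: the class `TTLStoreSketch`, its `clock` parameter, and the choice to check the limit before inserting a new key are all hypothetical.

```python
import threading
import time


class TTLStoreSketch:
    """Toy store: lock-guarded dict with TTL filtering and oldest-first eviction."""

    def __init__(self, ttl=3600.0, max_entries=100, clock=time.monotonic):
        self._ttl = ttl
        self._max_entries = max_entries
        self._clock = clock  # injectable so expiry can be shown without sleeping
        self._lock = threading.Lock()
        self._entries = {}  # key -> (timestamp, value)

    def put(self, key, value):
        with self._lock:
            is_new = key not in self._entries
            if is_new and self._entries and len(self._entries) >= self._max_entries:
                # Evict the entry with the smallest (oldest) timestamp.
                oldest = min(self._entries, key=lambda k: self._entries[k][0])
                del self._entries[oldest]
            self._entries[key] = (self._clock(), value)

    def get(self, key):
        with self._lock:
            item = self._entries.get(key)
            if item is None:
                return None
            timestamp, value = item
            if self._clock() - timestamp > self._ttl:
                del self._entries[key]  # drop the expired entry lazily on read
                return None
            return value

    def keys(self):
        with self._lock:
            now = self._clock()
            return [k for k, (ts, _) in self._entries.items() if now - ts <= self._ttl]


# Demonstration with a fake clock.
now = [0.0]
store = TTLStoreSketch(ttl=10.0, max_entries=2, clock=lambda: now[0])
store.put("a", "first")
now[0] = 1.0
store.put("b", "second")
now[0] = 2.0
store.put("c", "third")      # store is full, so "a" (the oldest) is evicted
remaining = store.keys()     # ["b", "c"]
now[0] = 11.5
still_valid = store.keys()   # ["c"], because "b" is now older than the TTL
```

Filtering on read keeps the sketch free of background threads: expired entries simply stop being visible, and they are removed the next time they are requested.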

## Retrieving Context Across Agents

### Accessing Compressed vs. Full Content

The `SharedContext.get(key, full=False)` method returns the **compressed** version by default, which downstream agents typically consume. This compressed payload is approximately 80% smaller than the original, making it ideal for efficient transmission between agents.

When an agent requires detailed inspection of specific data points, setting `full=True` returns the **original** uncompressed text. This "zoom in" capability allows agents to balance efficiency with precision when working with shared memory.
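One way a consuming agent might use this is to read the compressed form first and request the original only when a needed detail is missing. The sketch below illustrates that pattern against a hypothetical stand-in, `StubContext`, which mimics only the `get(key, full=False)` signature described above. The helper `read_with_zoom` and the substring check are assumptions for illustration, not part of Headroom.

```python
class StubContext:
    """Hypothetical stand-in exposing only get(key, full=False)."""

    def __init__(self):
        self._data = {}  # key -> (original, compressed)

    def put(self, key, original, compressed):
        # The real store compresses for you; here both forms are supplied by hand.
        self._data[key] = (original, compressed)

    def get(self, key, full=False):
        item = self._data.get(key)
        if item is None:
            return None
        original, compressed = item
        return original if full else compressed


def read_with_zoom(ctx, key, needle):
    """Return the compressed text if it mentions `needle`, else the original."""
    compressed = ctx.get(key)
    if compressed is None:
        return None
    if needle in compressed:
        return compressed
    return ctx.get(key, full=True)


ctx = StubContext()
ctx.put(
    "report",
    original="Revenue rose 12% in Q3; churn was 4.1%.",
    compressed="Revenue up 12% in Q3.",
)
summary = read_with_zoom(ctx, "report", "Revenue")  # the compressed form is enough
detail = read_with_zoom(ctx, "report", "churn")     # falls back to the original
```

A real agent would likely decide when to zoom in with something richer than a substring match, but the control flow stays the same: cheap read first, full read on demand.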

## Monitoring Compression Efficiency

The `SharedContext.stats()` method aggregates token savings across all active entries, exposing:

- Number of active entries
- Total original and compressed token counts  
- Total tokens saved
- Overall savings percentage

This introspection capability, implemented in [`headroom/shared_context.py`](https://github.com/chopratejas/headroom/blob/main/headroom/shared_context.py), makes it easy to monitor the compression efficiency of cross-agent workflows in real time.
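The arithmetic behind those aggregates is simple enough to sketch directly. The function below is a hypothetical illustration: the name `aggregate_stats`, the dictionary keys, and the rounding to one decimal place are assumptions, and the token counts are made-up inputs rather than measurements.

```python
def aggregate_stats(entries):
    """Aggregate (original_tokens, compressed_tokens) pairs into summary figures."""
    entries = list(entries)
    total_original = sum(original for original, _ in entries)
    total_compressed = sum(compressed for _, compressed in entries)
    saved = total_original - total_compressed
    # Guard against division by zero when the store is empty.
    percent = round(100.0 * saved / total_original, 1) if total_original else 0.0
    return {
        "entries": len(entries),
        "total_original_tokens": total_original,
        "total_compressed_tokens": total_compressed,
        "total_tokens_saved": saved,
        "savings_percent": percent,
    }


# Three illustrative entries: 4000 tokens in, 800 tokens out.
report = aggregate_stats([(1000, 200), (500, 150), (2500, 450)])
print(report)
```

Note that the overall percentage is computed from the totals, not by averaging per-entry percentages, so large entries weigh more heavily than small ones.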

## Practical Implementation Examples

### Basic Storage and Retrieval

```python
from headroom import SharedContext

# Create a shared context (usually a singleton per process)
ctx = SharedContext()

# Agent A stores a large research result
large_output = """... a very long markdown or JSON ..."""
entry = ctx.put("research", large_output, agent="agent_A")
print(f"Saved {entry.savings_percent}% tokens")

# Agent B later obtains the compressed version
compressed = ctx.get("research")
print("Compressed payload:", compressed[:200], "...")

# Agent B wants the full text for a deep dive
full_text = ctx.get("research", full=True)
print("Full text length:", len(full_text))

# Inspect statistics
stats = ctx.stats()
print(f"Overall savings: {stats.savings_percent}% across {stats.entries} entries")
```

### Integration with Proxy Handlers

```python
from headroom.shared_context import SharedContext
from headroom.proxy.handlers.openai import OpenAIHandler

shared_ctx = SharedContext()

class MyOpenAIHandler(OpenAIHandler):
    async def handle(self, request):
        # Before sending the request, store the tool output
        tool_output = request["messages"][-1]["content"]
        shared_ctx.put("latest_tool_output", tool_output, agent="openai_handler")

        # Retrieve the compressed context for the next turn
        compressed = shared_ctx.get("latest_tool_output")
        request["messages"].append({"role": "assistant", "content": compressed})

        return await super().handle(request)
```

## Summary

- `SharedContext` provides thread-safe, compressed storage for cross-agent memory in Headroom, located in [`headroom/shared_context.py`](https://github.com/chopratejas/headroom/blob/main/headroom/shared_context.py)
- The `put()` method automatically compresses content using the pipeline defined in [`headroom/compress.py`](https://github.com/chopratejas/headroom/blob/main/headroom/compress.py), reducing token counts by approximately 80%
- Entries default to a 1-hour TTL with automatic eviction when `max_entries` is exceeded, preventing memory exhaustion
- Agents retrieve compressed payloads by default via `get()`, with `full=True` providing access to the original uncompressed text
- The `stats()` method tracks aggregate token savings across all shared entries for monitoring compression efficiency

## Frequently Asked Questions

### What is SharedContext in Headroom?

SharedContext is a centralized memory store that enables cross-agent communication within the Headroom framework. It allows multiple agents running in the same process to share large data payloads through compressed representations, reducing token transmission costs while maintaining thread safety via `threading.Lock`.

### How does the compression pipeline in SharedContext work?

When storing data via `put()`, SharedContext invokes `headroom.compress.compress`, which executes a multi-stage pipeline: CacheAligner → ContentRouter → SmartCrusher / CodeCompressor / Kompress. This pipeline removes redundant tokens and optimizes the payload structure, typically reducing size by about 80% while preserving semantic content.

### How long does data persist in SharedContext?

By default, entries expire after 1 hour (configurable via the `ttl` parameter). Additionally, when the store reaches `max_entries`, the oldest entry is automatically evicted. The `get` and `keys` methods automatically filter out expired entries, ensuring agents only access valid context.

### Is SharedContext safe for concurrent multi-agent access?

Yes. According to the source code in [`headroom/shared_context.py`](https://github.com/chopratejas/headroom/blob/main/headroom/shared_context.py), the internal `_entries` dictionary is protected by `threading.Lock`, making all read and write operations thread-safe. Multiple agents in the same process can therefore read and update shared context concurrently without race conditions or data corruption, because the lock serializes each access.