# How Headroom's Cross-Agent Memory Works with SharedContext: A Technical Deep Dive

> How Headroom's cross-agent memory uses `SharedContext` for thread-safe, TTL-aware caching and automatic compression of agent outputs.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: deep-dive
- Published: 2026-06-05

---

**Headroom's cross-agent memory uses the `SharedContext` class to provide a thread-safe, TTL-aware cache that automatically compresses agent outputs, allowing subsequent agents to retrieve compressed summaries while preserving the original data for on-demand access.**

The Headroom library (`chopratejas/headroom`) solves the problem of passing large payloads between different AI agents by providing a `SharedContext` abstraction that acts as a process-wide memory store. This cross-agent memory system enables CrewAI, LangGraph, and OpenAI Agents SDK workflows to share state efficiently without redundant network calls or token-wasting repetitions.

## Core Architecture of Headroom's Cross-Agent Memory

The cross-agent memory implementation centers on five key components that work together in [`headroom/shared_context.py`](https://github.com/chopratejas/headroom/blob/main/headroom/shared_context.py) to provide safe, compressed data sharing.

### The SharedContext Class

The **SharedContext** class serves as the public API that agents import via `from headroom import SharedContext`. As implemented in lines 67-89 of [`headroom/shared_context.py`](https://github.com/chopratejas/headroom/blob/main/headroom/shared_context.py), it holds an in-memory dictionary of `ContextEntry` objects and protects concurrent access using a `threading.Lock`. The constructor accepts `ttl` (default 3600 seconds) and `max_entries` parameters to enforce memory limits:

```python
from headroom import SharedContext

# Initialize with 30-minute TTL and 50-entry limit
ctx = SharedContext(ttl=1800, max_entries=50)
```

### ContextEntry Dataclass

Each stored item becomes a **ContextEntry** (lines 36-55), a dataclass that tracks both the original and compressed text. It stores token counts, timestamps, the agent identifier, and a list of transforms applied during compression. This enables per-entry statistics like `savings_percent` and provides full traceability for debugging multi-agent workflows.
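To make the shape of such an entry concrete, here is a minimal sketch of a dataclass that carries both text variants and derives a savings figure from its token counts. It is not the library's definition: the class name `EntrySketch` and every field name other than `original`, `transforms`, and `savings_percent` are assumptions chosen for illustration, and the rounding rule is a guess.

```python
from dataclasses import dataclass, field
import time


@dataclass
class EntrySketch:
    """Hypothetical stand-in for ContextEntry; field names are assumptions."""

    key: str
    original: str
    compressed: str
    original_tokens: int
    compressed_tokens: int
    agent: str = ""
    created_at: float = field(default_factory=time.time)
    transforms: list = field(default_factory=list)

    @property
    def savings_percent(self) -> float:
        # Share of tokens removed by compression; 0.0 when there was no input.
        if self.original_tokens == 0:
            return 0.0
        saved = self.original_tokens - self.compressed_tokens
        return round(100.0 * saved / self.original_tokens, 1)


entry = EntrySketch(
    key="search_results",
    original="x" * 400,
    compressed="x" * 60,
    original_tokens=100,
    compressed_tokens=15,
    agent="searcher",
    transforms=["smart_crusher"],
)
print(entry.savings_percent)  # 85.0
```

Keeping the statistic as a derived property rather than a stored field means it can never drift out of sync with the token counts it is computed from.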

### Compression Pipeline Integration

When `SharedContext.put()` is called, it invokes `headroom.compress.compress()` (lines 98-112) to reduce payload size before storage. This uses the same CCR (Compress-Cache-Retrieve) stack as the Headroom proxy, applying **SmartCrusher** for JSON, **CodeCompressor** for code blocks, and **Kompress** for plain text. Every stored item goes through the same compression heuristics, regardless of which agent produced it.
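The idea of routing a payload to a compressor by content type can be sketched with a small classifier. This is only an illustration of the concept: the function `pick_route`, its return labels, and its detection heuristics are all hypothetical, and Headroom's actual detection logic may work quite differently.

```python
import json
import re

# Hypothetical heuristic: a line that starts with a common code keyword.
_CODE_LINE = re.compile(r"^\s*(def|class|import)\s", re.MULTILINE)


def pick_route(text: str) -> str:
    """Return "json", "code", or "text" for a payload (illustrative only)."""
    stripped = text.strip()
    if stripped[:1] in ("{", "["):
        try:
            json.loads(stripped)
            return "json"
        except ValueError:
            pass  # looked like JSON but did not parse; fall through
    if _CODE_LINE.search(text):
        return "code"
    return "text"


print(pick_route('{"items": [1, 2]}'))            # json
print(pick_route("def f():\n    return 1"))       # code
print(pick_route("The search found 3 results."))  # text
```

Under this reading, the three labels would correspond to the JSON, code, and plain-text compressors respectively.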

### TTL and Eviction Strategy

To prevent unbounded memory growth, entries expire after the configured `ttl` seconds. When `max_entries` is reached, the system evicts the oldest entry via `_evict_if_needed()` (lines 120-138). Expired entries are removed on the next `get` or `get_entry` call, so stale context from one workflow does not leak into the next.
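The combination of expiry on read and oldest-first eviction can be sketched in a few lines of standard-library Python. The class `TTLStoreSketch` below is a hypothetical stand-in, not Headroom's code: it omits locking and compression, and the injectable `clock` parameter is an assumption added so that expiry can be demonstrated without sleeping.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLStoreSketch:
    """Illustrative only: not Headroom's implementation."""

    def __init__(self, ttl: float, max_entries: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._max_entries = max_entries
        self._clock = clock  # injectable so expiry can be shown without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def put(self, key: str, value: str) -> None:
        # Drop any old copy first so the key moves to the "newest" position.
        self._entries.pop(key, None)
        self._entries[key] = (self._clock(), value)
        self._evict_if_needed()

    def get(self, key: str) -> Optional[str]:
        item = self._entries.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired: removed on read
            return None
        return value

    def _evict_if_needed(self) -> None:
        # dicts preserve insertion order, so the first key is the oldest.
        while len(self._entries) > self._max_entries:
            del self._entries[next(iter(self._entries))]


now = [0.0]  # fake clock, in seconds
store = TTLStoreSketch(ttl=10, max_entries=2, clock=lambda: now[0])
store.put("a", "first")
store.put("b", "second")
store.put("c", "third")  # capacity exceeded: "a" is evicted
print(store.get("a"))    # None
now[0] = 11.0            # advance past the 10-second TTL
print(store.get("b"))    # None
```

Removing expired entries only when they are read keeps the store free of background threads, at the cost of holding dead entries until someone asks for them or capacity forces them out.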

### Thread-Safety Mechanisms

All mutating operations (`put`, `clear`, and eviction) acquire the same `Lock`. Read operations like `get`, `keys`, and `stats` also hold the lock long enough to snapshot the entry, making the object safe to share between concurrent agents in the same Python process (lines 90-98).
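The locking pattern described above can be illustrated with a small store in which every read and write takes the same `threading.Lock`. The class `LockedStoreSketch` and the `worker` helper are hypothetical names for this sketch; they show the general technique, not the library's source.

```python
import threading


class LockedStoreSketch:
    """Illustrative only: one lock guards every access to the dictionary."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries = {}

    def put(self, key: str, value: str) -> None:
        with self._lock:
            self._entries[key] = value

    def get(self, key: str):
        with self._lock:
            return self._entries.get(key)

    def keys(self) -> list:
        with self._lock:
            # Return a copy so callers iterate outside the lock.
            return list(self._entries)


def worker(store: LockedStoreSketch, agent_id: str, count: int) -> None:
    for i in range(count):
        store.put(f"{agent_id}:{i}", f"output {i}")


store = LockedStoreSketch()
threads = [
    threading.Thread(target=worker, args=(store, f"agent{n}", 200))
    for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(store.keys()))  # 800
```

Returning a copied list from `keys` is the "snapshot" idea: the caller gets a stable view and the lock is released before any slow work happens.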

## Cross-Agent Memory Workflow

The typical lifecycle of data passing through Headroom's shared memory follows four distinct steps:

1. **Agent A stores output** using `ctx.put(key, data, agent="name")`, which compresses the payload and creates a `ContextEntry` in the internal `_entries` map.

2. **Agent B retrieves the compressed version** via `ctx.get(key)`, receiving the compressed text by default to save context window space.

3. **Agent B requests original data** when needed by calling `ctx.get(key, full=True)`, which returns the uncompressed original without recomputation.

4. **Monitoring and cleanup** occur through `ctx.stats()`, which aggregates token counts across entries, while automatic expiration and eviction handle housekeeping.

## Practical Implementation Examples

### Basic Usage Pattern

```python
from headroom import SharedContext

# 1️⃣ Initialize (one per process)
ctx = SharedContext(ttl=1800, max_entries=50)

# 2️⃣ Store large JSON output
large_json = '{"items": [...]}'  # Imagine 10,000 tokens
entry = ctx.put("search_results", large_json, agent="searcher")
print(f"Compression saved {entry.savings_percent}%")  # → e.g., 85.0

# 3️⃣ Retrieve compressed summary for next agent
summary = ctx.get("search_results")  # ~1,500 tokens

# 4️⃣ Access full original when detailed analysis needed
full_data = ctx.get("search_results", full=True)

# 5️⃣ Inspect metadata
meta = ctx.get_entry("search_results")
print(meta.transforms)  # ['smart_crusher', 'kompress']

# 6️⃣ Workflow cleanup
print(ctx.stats())  # Aggregated savings across all entries
ctx.clear()
```

### Integration with CrewAI

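```python
# `ctx` is the shared SharedContext instance; `researcher_task` is a
# CrewAI Task that has already finished running

# After research task completes
ctx.put("findings", researcher_task.output.raw, agent="researcher")

# Coding agent receives compressed context
coder_context = ctx.get("findings")
```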

### Integration with LangGraph

```python
def researcher_node(state):
    # `do_research` stands in for the node's actual work
    result = do_research()
    ctx.put("research", result)
    # Pass compressed version to next node
    return {"research_summary": ctx.get("research")}
```

### Integration with OpenAI Agents SDK

```python
def compress_handoff(messages):
    # Assumes each message exposes a unique `id` and a string `content`
    for msg in messages:
        if len(msg.content) > 1000:
            ctx.put(msg.id, msg.content)
            msg.content = ctx.get(msg.id)  # Replace with compressed version
    return messages
```

## Summary

- **SharedContext** provides a process-wide memory store that any agent can access via `put()` and `get()` methods without network overhead.
- Automatic compression via the CCR stack (SmartCrusher, CodeCompressor, Kompress) reduces token counts by 60-90% while preserving originals for `full=True` retrieval.
- Thread-safe implementation using `threading.Lock` allows concurrent access from multiple agents in the same Python process.
- TTL and eviction policies prevent memory leaks by removing stale entries after the configured timeout or when capacity limits are reached.
- Per-entry metadata through **ContextEntry** enables detailed tracing of compression transforms and token savings statistics.

## Frequently Asked Questions

### What is SharedContext in Headroom and why does it matter for multi-agent systems?

**SharedContext** is a pure-Python utility class in [`headroom/shared_context.py`](https://github.com/chopratejas/headroom/blob/main/headroom/shared_context.py) that acts as a cross-agent memory layer. It matters because it eliminates the need for agents to repeatedly pass large payloads through message queues or LLM context windows. Instead, it stores compressed versions that subsequent agents in the same process can read directly from memory.

### How does the compression work when storing data in SharedContext?

When you call `SharedContext.put()`, the method immediately invokes `headroom.compress.compress()` using the same pipeline as the Headroom proxy. This applies heuristics like SmartCrusher for JSON structures, CodeCompressor for programming syntax, and Kompress for natural text, storing both the original and compressed variants in a `ContextEntry` object.

### Is SharedContext safe to use with concurrent agents?

Yes. The implementation uses a `threading.Lock` to protect all write operations and snapshot reads, making it safe to share a single `SharedContext` instance across threads. According to the source code in lines 90-98 of [`headroom/shared_context.py`](https://github.com/chopratejas/headroom/blob/main/headroom/shared_context.py), both mutating operations (`put`, `clear`) and reads (`get`, `stats`) acquire the lock to prevent race conditions.

### Can I retrieve the original uncompressed data after storage?

Absolutely. While `ctx.get(key)` returns the compressed version by default to save context space, passing `full=True` as in `ctx.get(key, full=True)` returns the exact original text stored in the `ContextEntry.original` field without any recompression or network calls.