# How Cross-Agent Memory Works with SharedContext Across Different LLMs

> Discover how headroom's SharedContext enables cross-agent memory, allowing LLMs like Claude and Gemini to share large contexts efficiently via compression. Learn more about this model-agnostic memory bus.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: deep-dive
- Published: 2026-06-07

---

**TLDR:** `SharedContext` in the headroom library acts as a model-agnostic memory bus, enabling agents built on different LLMs—such as Claude and Gemini—to share large contexts efficiently through compression without transmitting full payloads on every exchange.

The `chopratejas/headroom` repository solves the problem of passing large contexts between heterogeneous AI agents. Its `SharedContext` class provides **cross-agent memory** by compressing content once and making that compressed payload available to any downstream agent, regardless of whether it calls Anthropic, Google, or another provider.

## How Compression Powers the Shared Memory Bus

When an agent stores data, `SharedContext.put(key, content)` forwards the raw `content` to the generic `compress` API in [`headroom/compress.py`](https://github.com/chopratejas/headroom/blob/main/headroom/compress.py). This entry point builds a Headroom compression pipeline that runs model-agnostic transforms—such as **SmartCrusher**, **Kompress**, and **CodeCompressor**—and returns a `CompressResult` containing both the compressed text and token-level statistics.

The pipeline accepts an LLM identifier string (for example, `"claude-sonnet-4-5-20250929"` or `"gemini-1.5-pro"`) for token counting, but the compression logic itself remains model-agnostic. As implemented in `chopratejas/headroom`, this means the same compressed text can be handed to Claude, Gemini, or a local open-source model; the identifier affects how tokens are counted, not how the text is compressed.
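To make the separation between compression and token counting concrete, here is a minimal sketch that does not use headroom at all. The names `ToyCompressResult`, `toy_compress`, `count_words`, and `count_chars_div4` are hypothetical, and the two transforms are simple stand-ins, not the library's SmartCrusher, Kompress, or CodeCompressor. It only illustrates that swapping the token counter changes the reported statistics while the compressed text stays the same.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyCompressResult:
    """Hypothetical stand-in for a compression result with token stats."""
    compressed: str
    tokens_before: int
    tokens_after: int
    transforms: List[str]


def drop_duplicate_lines(text: str) -> str:
    # Keep only the first occurrence of each line, preserving order.
    seen = set()
    kept = []
    for line in text.splitlines():
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)


def collapse_whitespace(text: str) -> str:
    # Replace every run of whitespace (including newlines) with one space.
    return re.sub(r"\s+", " ", text).strip()


def toy_compress(text: str, count_tokens: Callable[[str], int]) -> ToyCompressResult:
    # The transforms never look at the token counter; it is used for stats only.
    out = text
    applied = []
    for transform in (drop_duplicate_lines, collapse_whitespace):
        out = transform(out)
        applied.append(transform.__name__)
    return ToyCompressResult(out, count_tokens(text), count_tokens(out), applied)


def count_words(text: str) -> int:
    # Crude counter: one token per whitespace-separated word.
    return len(text.split())


def count_chars_div4(text: str) -> int:
    # Crude counter: roughly four characters per token.
    return max(1, len(text) // 4)


raw = "status: ok\nstatus: ok\nlatency:    12ms\n"
by_words = toy_compress(raw, count_words)
by_chars = toy_compress(raw, count_chars_div4)

print(by_words.compressed == by_chars.compressed)  # True: same text either way
print(by_words.tokens_after, by_chars.tokens_after)  # counts differ per counter
```

In a real pipeline the counter would be a provider tokenizer selected by the model identifier; the point of the sketch is only that the stored string does not depend on it.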

## Storing and Retrieving Context Entries

Inside [`headroom/shared_context.py`](https://github.com/chopratejas/headroom/blob/main/headroom/shared_context.py), the `put` method instantiates a `ContextEntry` dataclass (defined around lines 38–48) that records:

- `original` — the full uncompressed text.
- `compressed` — the shrunk version produced by the pipeline.
- `original_tokens` and `compressed_tokens` — token counts for measurement.
- `agent` — the identifier of the agent that stored the entry.
- `timestamp` and `transforms` — audit metadata including which compressors ran.
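The fields above can be pictured as a small dataclass. The sketch below is a stand-in written for this article, not the definition from `headroom/shared_context.py`: the class name `EntrySketch`, the field types, the defaults, and the `savings_percent` formula are all assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntrySketch:
    """Hypothetical record mirroring the fields listed above."""
    original: str                 # full uncompressed text
    compressed: str               # shrunk version produced by the pipeline
    original_tokens: int          # token count before compression
    compressed_tokens: int        # token count after compression
    agent: str = ""               # identifier of the agent that stored it
    timestamp: float = field(default_factory=time.time)  # when it was stored
    transforms: List[str] = field(default_factory=list)  # compressors that ran

    @property
    def savings_percent(self) -> float:
        # Assumed formula: share of tokens removed, as a percentage.
        if self.original_tokens == 0:
            return 0.0
        saved = self.original_tokens - self.compressed_tokens
        return round(100.0 * saved / self.original_tokens, 1)


entry = EntrySketch(
    original="a long report ...",
    compressed="short",
    original_tokens=1000,
    compressed_tokens=200,
    agent="agent_a",
    transforms=["toy_transform"],
)
print(entry.savings_percent)  # 80.0
```

Keeping both `original` and `compressed` on the same record is what allows a consumer to choose between the cheap and the complete version at read time.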

### Retrieval Options for Consumer Agents

A consuming agent has three ways to access shared memory:

- `SharedContext.get(key)` returns the **compressed** text by default, minimizing token spend.
- `SharedContext.get(key, full=True)` returns the original uncompressed text on demand.
- `SharedContext.get_entry(key)` returns the full `ContextEntry` object, useful for debugging, audit trails, or analyzing per-entry savings.
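The three access paths can be sketched with a toy in-memory store. `ToyStore` and `ToyEntry` are hypothetical names, truncation to 40 characters stands in for real compression, and returning `None` for a missing key is an assumption about the behavior rather than something taken from the library.

```python
from typing import Dict, NamedTuple, Optional


class ToyEntry(NamedTuple):
    original: str
    compressed: str


class ToyStore:
    """Hypothetical store illustrating compressed-by-default retrieval."""

    def __init__(self) -> None:
        self._entries: Dict[str, ToyEntry] = {}

    def put(self, key: str, content: str) -> ToyEntry:
        # Stand-in for compression: keep only the first 40 characters.
        entry = ToyEntry(original=content, compressed=content[:40])
        self._entries[key] = entry
        return entry

    def get(self, key: str, full: bool = False) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None  # assumed behavior for unknown keys
        return entry.original if full else entry.compressed

    def get_entry(self, key: str) -> Optional[ToyEntry]:
        # Return the whole record so callers can inspect both versions.
        return self._entries.get(key)


store = ToyStore()
report = "Quarterly findings: " + "detail " * 50
store.put("research", report)

short = store.get("research")             # compressed text by default
whole = store.get("research", full=True)  # original text on demand
record = store.get_entry("research")      # full record for inspection
print(len(short), len(whole))
```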

## Cross-Agent Memory Between Claude and Gemini

Because `SharedContext` is a plain Python object, agents backed by different LLM families can hold a reference to the same instance and exchange context through it. For example, Agent A (Claude) stores a research report with `ctx.put("research", report)`, and Agent B (Gemini) later calls `ctx.get("research")` to receive a compressed payload that can be roughly 80% smaller, depending on the content, while retaining enough meaning for its next step. If Agent B needs the full details for a specific sub-task, it calls `ctx.get("research", full=True)`.

According to the headroom source code, no further model-specific handling is required because the compression is performed before any LLM call. The downstream agent consumes the compressed string directly.

## Managing Memory Lifecycle and Eviction

`SharedContext` automatically expires entries after a configurable `ttl` (defaulting to one hour) and evicts older entries when the `max_entries` limit is reached. The `_evict_if_needed` method in [`headroom/shared_context.py`](https://github.com/chopratejas/headroom/blob/main/headroom/shared_context.py) enforces these bounds, so stale data does not linger and memory usage stays predictable across long-running multi-agent workflows.
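One way to picture this lifecycle is the sketch below. It is not the code from `headroom/shared_context.py`: the class `BoundedStore`, the injectable `clock` parameter, and the oldest-first eviction order are assumptions made for illustration. The fake clock lets the demo show expiry without sleeping.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class BoundedStore:
    """Hypothetical store with TTL expiry and oldest-first eviction."""

    def __init__(self, ttl: float = 3600.0, max_entries: int = 100,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.max_entries = max_entries
        self._clock = clock
        # Insertion order doubles as age order: oldest entry comes first.
        self._entries: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()

    def put(self, key: str, value: str) -> None:
        self._entries.pop(key, None)  # re-inserting makes the key newest
        self._entries[key] = (self._clock(), value)
        self._evict_if_needed()

    def get(self, key: str) -> Optional[str]:
        item = self._entries.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at > self.ttl:
            del self._entries[key]  # lazily drop an expired entry on read
            return None
        return value

    def _evict_if_needed(self) -> None:
        now = self._clock()
        # First drop everything older than the TTL.
        expired = [k for k, (t, _) in self._entries.items() if now - t > self.ttl]
        for k in expired:
            del self._entries[k]
        # Then drop the oldest entries until the size limit is respected.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)


fake_now = [0.0]
store = BoundedStore(ttl=5, max_entries=2, clock=lambda: fake_now[0])
store.put("first", "data 1")
store.put("second", "data 2")
store.put("third", "data 3")

print(store.get("first"))   # None: evicted as the oldest entry
print(store.get("second"))  # data 2
fake_now[0] = 6.0           # advance the fake clock past the TTL
print(store.get("second"))  # None: expired
```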

## Practical Code Examples

### Basic Same-Process Usage

```python
from headroom import SharedContext

# Initialise a shared context that will use Claude for token counting
ctx = SharedContext(model="claude-sonnet-4-5-20250929")

# Agent A stores a huge research output
research = "..."  # very long string
entry = ctx.put("research", research, agent="agent_a")
print(f"Saved {entry.savings_percent}% tokens")

# Agent B reads the compressed version (default)
compressed = ctx.get("research")
print("Compressed size:", len(compressed))

# Agent B needs the full text for a deep dive
full = ctx.get("research", full=True)
assert full == research
```

### Mixing Claude and Gemini Backends

```python
import google.generativeai as genai
from headroom import SharedContext

# Agent A (Claude): the model name is used for token counting only
ctx = SharedContext(model="claude-sonnet-4-5-20250929")
ctx.put("analysis", "large JSON payload ...", agent="claude_agent")

# Agent B (Gemini): retrieve the compressed text and build a plain-text prompt
compressed = ctx.get("analysis")
gemini_prompt = f"Please summarize this data: {compressed}"

# The Gemini client sees only the compressed payload, which reduces token cost.
# Assumes the API key is already configured, e.g. genai.configure(api_key=...).
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(gemini_prompt)
print(response.text)
```

### TTL and Eviction Behavior

```python
import time

from headroom import SharedContext

# Short TTL for demo purposes
ctx = SharedContext(ttl=5, max_entries=2)

ctx.put("first", "data 1")
ctx.put("second", "data 2")
ctx.put("third", "data 3")   # "first" will be evicted (max_entries=2)

assert ctx.get("first") is None
assert ctx.get("second") is not None
assert ctx.get("third") is not None

# Wait for expiration
time.sleep(6)
assert ctx.get("second") is None   # expired after 5 s
```

### Inspecting Entry Metadata

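```python
# Assumes `ctx` is the SharedContext from the Claude and Gemini example,
# which still holds the "analysis" entry.
entry = ctx.get_entry("analysis")
print("Original tokens:", entry.original_tokens)
print("Compressed tokens:", entry.compressed_tokens)
print("Transforms applied:", entry.transforms)
```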

## Summary

- `SharedContext` in [`headroom/shared_context.py`](https://github.com/chopratejas/headroom/blob/main/headroom/shared_context.py) provides a **model-agnostic memory bus** that lets diverse agents share large payloads without repeated full-text transmission.
- The `compress` function in [`headroom/compress.py`](https://github.com/chopratejas/headroom/blob/main/headroom/compress.py) powers the pipeline, running transforms like SmartCrusher and Kompress before any LLM-specific logic is needed.
- Consumer agents retrieve compressed text by default via `SharedContext.get(key)` and can request the original via `get(key, full=True)` or inspect metadata via `get_entry(key)`.
- Built-in **TTL** and **max_entries** eviction keep memory bounded in long-running agent systems.
- Because the interface is a plain Python object, agents using Claude, Gemini, or custom wrappers can all reference the same shared store, as exposed at the package level in [`headroom/__init__.py`](https://github.com/chopratejas/headroom/blob/main/headroom/__init__.py) (line 274).

## Frequently Asked Questions

### Can Claude and Gemini agents share the same SharedContext instance?

Yes. Because `SharedContext` is a plain Python object, agents backed by different LLM families can share the same instance or a process-wide singleton. Agent A (Claude) can call `put()` and Agent B (Gemini) can call `get()` on the same object without compatibility issues.

### Does SharedContext require both agents to use the same LLM tokenizer?

No. Compression is performed *before* any LLM call by the model-agnostic pipeline in [`headroom/compress.py`](https://github.com/chopratejas/headroom/blob/main/headroom/compress.py). The requesting agent supplies an LLM identifier for token counting, but the compressed string that is stored and retrieved can be consumed by any downstream agent regardless of its tokenizer.

### How does SharedContext prevent memory from growing forever?

Entries automatically expire after the `ttl` interval (default one hour), and the `_evict_if_needed` method in [`headroom/shared_context.py`](https://github.com/chopratejas/headroom/blob/main/headroom/shared_context.py) evicts older entries once `max_entries` is reached. This keeps the shared store bounded and prevents stale context from accumulating.

### Can I retrieve the original uncompressed text after it has been compressed?

Yes. While `SharedContext.get(key)` returns the compressed version by default to save tokens, passing `full=True` yields the exact original text. You can also call `SharedContext.get_entry(key)` to access the full `ContextEntry` dataclass containing both versions plus metadata.