How Cross-Agent Memory Works with SharedContext Across Different LLMs

TLDR: SharedContext in the headroom library acts as a model-agnostic memory bus, enabling agents built on different LLMs—such as Claude and Gemini—to share large contexts efficiently through compression without transmitting full payloads on every exchange.

The chopratejas/headroom repository solves the problem of passing large contexts between heterogeneous AI agents. Its SharedContext class provides cross-agent memory by compressing content once and making that compressed payload available to any downstream agent, regardless of whether it calls Anthropic, Google, or another provider.

How Compression Powers the Shared Memory Bus

When an agent stores data, SharedContext.put(key, content) forwards the raw content to the generic compress API in headroom/compress.py. This entry point builds a Headroom compression pipeline that runs model-agnostic transforms—such as SmartCrusher, Kompress, and CodeCompressor—and returns a CompressResult containing both the compressed text and token-level statistics.

The pipeline accepts an LLM identifier string (for example, "claude-sonnet-4-5-20250929" or "gemini-1.5-pro") for token counting, but the compression logic itself remains model-agnostic. As implemented in chopratejas/headroom, this means the same compressed text can be handed to Claude, Gemini, or a local open-source model; the model identifier only affects how tokens are counted for the reported statistics.
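The split between model-agnostic transforms and model-specific token counting can be illustrated with a small self-contained sketch. Everything in it is hypothetical: the two transforms, the TOKEN_COUNTERS table, ToyCompressResult, and toy_compress are stand-ins written for this illustration and are not headroom's SmartCrusher, Kompress, CodeCompressor, or compress API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


def drop_duplicate_lines(text: str) -> str:
    """Hypothetical transform: keep only the first copy of each non-empty line."""
    seen = set()
    kept = []
    for line in text.splitlines():
        key = line.strip()
        if key and key in seen:
            continue
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)


def collapse_whitespace(text: str) -> str:
    """Hypothetical transform: squeeze every run of whitespace into one space."""
    return " ".join(text.split())


# Toy token counters keyed by a made-up model name (assumption: real
# tokenizers differ per provider; these only imitate that difference).
TOKEN_COUNTERS: Dict[str, Callable[[str], int]] = {
    "model-a": lambda s: len(s.split()),       # one token per word
    "model-b": lambda s: max(1, len(s) // 4),  # about four characters per token
}


@dataclass
class ToyCompressResult:
    compressed: str
    tokens_before: int
    tokens_after: int


def toy_compress(content: str, model: str) -> ToyCompressResult:
    """Run the same transforms for every model; only the counting differs."""
    count = TOKEN_COUNTERS[model]
    compressed = content
    for transform in (drop_duplicate_lines, collapse_whitespace):
        compressed = transform(compressed)
    return ToyCompressResult(compressed, count(content), count(compressed))


doc = "status:   ok\nstatus:   ok\nlatency:    12ms\n"
a = toy_compress(doc, "model-a")
b = toy_compress(doc, "model-b")
print(a.compressed == b.compressed)  # the text is identical for both models
print(a.tokens_after, b.tokens_after)  # the reported counts are not
```

In this sketch the compressed string never depends on the model name, which is the property the article attributes to the headroom pipeline; only the statistics change with the counter.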

Storing and Retrieving Context Entries

Inside headroom/shared_context.py, the put method instantiates a ContextEntry dataclass (defined around lines 38–48) that records:

  • original — the full uncompressed text.
  • compressed — the shrunk version produced by the pipeline.
  • original_tokens and compressed_tokens — token counts for measurement.
  • agent — the identifier of the agent that stored the entry.
  • timestamp and transforms — audit metadata including which compressors ran.
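As a rough picture of what such a record might look like, here is a minimal dataclass built from the field names listed above. The class name ContextEntrySketch, the field types, the defaults, and the way savings_percent is computed are all assumptions made for illustration; the actual definition in headroom/shared_context.py may differ.

```python
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContextEntrySketch:
    """Illustrative stand-in for a shared-context entry (not headroom's class)."""

    original: str                 # full uncompressed text
    compressed: str               # output of the compression pipeline
    original_tokens: int          # token count before compression
    compressed_tokens: int        # token count after compression
    agent: str = ""               # identifier of the storing agent
    timestamp: float = field(default_factory=time.time)   # time of storage
    transforms: List[str] = field(default_factory=list)   # compressors that ran

    @property
    def savings_percent(self) -> float:
        """Share of tokens removed, as a percentage (assumed formula)."""
        if self.original_tokens == 0:
            return 0.0
        saved = self.original_tokens - self.compressed_tokens
        return round(100.0 * saved / self.original_tokens, 1)


entry = ContextEntrySketch(
    original="a long report ...",
    compressed="report ...",
    original_tokens=1000,
    compressed_tokens=200,
    agent="agent_a",
    transforms=["example_transform"],  # placeholder name, not a real compressor
)
print(entry.savings_percent)  # 80.0
```

Keeping both the original and the compressed text on one record is what lets a consumer choose between a cheap read and a full read without a second store.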

Retrieval Options for Consumer Agents

A consuming agent has three ways to access shared memory:

  • SharedContext.get(key) returns the compressed text by default, minimizing token spend.
  • SharedContext.get(key, full=True) returns the original uncompressed text on demand.
  • SharedContext.get_entry(key) returns the full ContextEntry object, useful for debugging, audit trails, or analyzing per-entry savings.
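The three access paths can be mimicked with a tiny dictionary-backed store. This is a sketch under stated assumptions: ToySharedStore and ToyEntry are hypothetical names, whitespace collapsing stands in for real compression, and returning None for a missing key is an assumed behavior rather than something taken from the library.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ToyEntry:
    original: str
    compressed: str


class ToySharedStore:
    """Hypothetical stand-in that mirrors the get / get(full=True) / get_entry split."""

    def __init__(self) -> None:
        self._entries: Dict[str, ToyEntry] = {}

    def put(self, key: str, content: str) -> ToyEntry:
        # Whitespace collapsing is a placeholder for a real compression pipeline.
        entry = ToyEntry(original=content, compressed=" ".join(content.split()))
        self._entries[key] = entry
        return entry

    def get(self, key: str, full: bool = False) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None  # assumed behavior for unknown keys
        return entry.original if full else entry.compressed

    def get_entry(self, key: str) -> Optional[ToyEntry]:
        return self._entries.get(key)


store = ToySharedStore()
report = "findings:\n\n   latency   improved   by   12ms\n"
store.put("research", report)

short_text = store.get("research")            # compressed view by default
full_text = store.get("research", full=True)  # exact original on demand
record = store.get_entry("research")          # whole record for inspection
print(short_text)
```

Defaulting to the compressed view keeps the cheap path the easy path, while the full text stays one keyword argument away.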

Cross-Agent Memory Between Claude and Gemini

Because SharedContext is a plain Python object, agents backed by different LLM families can hold a reference to the same instance within one process and exchange context through it. For example, Agent A (Claude) stores a research report with ctx.put("research", report), and Agent B (Gemini) later calls ctx.get("research") to receive a compressed payload that can be roughly 80% smaller, depending on the content, yet still carries enough meaning for its next step. If Agent B needs the full details for a specific sub-task, it calls ctx.get("research", full=True).

According to the headroom source code, no further model-specific handling is required because the compression is performed before any LLM call. The downstream agent consumes the compressed string directly.

Managing Memory Lifecycle and Eviction

SharedContext automatically expires entries after a configurable ttl (defaulting to one hour) and evicts older entries when the max_entries limit is reached. The _evict_if_needed method in headroom/shared_context.py enforces these bounds, so stale data does not linger and memory usage stays predictable across long-running multi-agent workflows.
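One common way to implement this kind of bounded store is an insertion-ordered dictionary plus a timestamp per entry. The sketch below is an illustration of that general technique, not headroom's code: ToyBoundedStore, the injectable clock parameter, the oldest-first eviction order, and the max_entries default of 100 are all assumptions. Only the one-hour ttl default comes from the text above.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class ToyBoundedStore:
    """Hypothetical TTL + max-entries store (oldest entries are evicted first)."""

    def __init__(
        self,
        ttl: float = 3600.0,      # one hour, matching the default described above
        max_entries: int = 100,   # assumed default, chosen for illustration
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl
        self._max_entries = max_entries
        self._clock = clock
        self._entries: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()

    def put(self, key: str, value: str) -> None:
        self._entries.pop(key, None)  # re-inserting moves the key to the newest slot
        self._entries[key] = (self._clock(), value)
        self._evict_if_needed()

    def get(self, key: str) -> Optional[str]:
        item = self._entries.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # lazily drop an expired entry on read
            return None
        return value

    def _evict_if_needed(self) -> None:
        now = self._clock()
        expired = [k for k, (t, _) in self._entries.items() if now - t > self._ttl]
        for k in expired:
            del self._entries[k]
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # remove the oldest entry


# A fake clock avoids sleeping in the demonstration.
now = [0.0]
store = ToyBoundedStore(ttl=5, max_entries=2, clock=lambda: now[0])
store.put("first", "data 1")
store.put("second", "data 2")
store.put("third", "data 3")       # pushes out "first" (capacity is 2)
evicted = store.get("first")
alive = store.get("second")
now[0] = 6.0                       # advance past the 5-second ttl
expired = store.get("second")
print(evicted, alive, expired)     # None data 2 None
```

Injecting the clock is a testing convenience; a production store would typically rely on the real monotonic clock.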

Practical Code Examples

Basic Same-Process Usage

from headroom import SharedContext

# Initialise a shared context that will use Claude for token counting
ctx = SharedContext(model="claude-sonnet-4-5-20250929")

# Agent A stores a huge research output
research = "..."  # very long string
entry = ctx.put("research", research, agent="agent_a")
print(f"Saved {entry.savings_percent}% tokens")

# Agent B reads the compressed version (default)
compressed = ctx.get("research")
print("Compressed size:", len(compressed))

# Agent B needs the full text for a deep dive
full = ctx.get("research", full=True)
assert full == research

Mixing Claude and Gemini Backends


import google.generativeai as genai

from headroom import SharedContext

# Assumes the Gemini API key is already configured, e.g. genai.configure(api_key=...)

# Agent A (Claude): store the payload; the model name is used for token counting
ctx = SharedContext(model="claude-sonnet-4-5-20250929")
ctx.put("analysis", "large JSON payload ...", agent="claude_agent")

# Agent B (Gemini): retrieve the compressed text and pass it to Gemini.
# generate_content accepts a plain string prompt; OpenAI-style
# {"role": ..., "content": ...} messages are not a valid Gemini payload.
compressed = ctx.get("analysis")
gemini_prompt = f"Please summarize this data: {compressed}"

# The Gemini client sees only the compressed payload, saving cost
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(gemini_prompt)
print(response.text)

TTL and Eviction Behavior


import time

from headroom import SharedContext

# Short TTL for demo purposes
ctx = SharedContext(ttl=5, max_entries=2)

ctx.put("first", "data 1")
ctx.put("second", "data 2")
ctx.put("third", "data 3")   # "first" will be evicted (max_entries=2)

assert ctx.get("first") is None
assert ctx.get("second") is not None
assert ctx.get("third") is not None

# Wait for expiration
time.sleep(6)
assert ctx.get("second") is None   # expired after 5 s

Inspecting Entry Metadata

# ctx is the SharedContext instance that stored "analysis" in the Claude and Gemini example
entry = ctx.get_entry("analysis")
print("Original tokens:", entry.original_tokens)
print("Compressed tokens:", entry.compressed_tokens)
print("Transforms applied:", entry.transforms)

Summary

  • SharedContext in headroom/shared_context.py provides a model-agnostic memory bus that lets diverse agents share large payloads without repeated full-text transmission.
  • The compress function in headroom/compress.py powers the pipeline, running transforms like SmartCrusher and Kompress before any LLM-specific logic is needed.
  • Consumer agents retrieve compressed text by default via SharedContext.get(key) and can request the original via get(key, full=True) or inspect metadata via get_entry(key).
  • Built-in TTL and max_entries eviction keep memory bounded in long-running agent systems.
  • Because the interface is a plain Python object, agents using Claude, Gemini, or custom wrappers can all reference the same shared store, as exposed at the package level in headroom/__init__.py (line 274).

Frequently Asked Questions

Can Claude and Gemini agents share the same SharedContext instance?

Yes. Because SharedContext is a plain Python object, agents backed by different LLM families can share the same instance or a process-wide singleton. Agent A (Claude) can call put() and Agent B (Gemini) can call get() on the same object without compatibility issues.

Does SharedContext require both agents to use the same LLM tokenizer?

No. Compression is performed before any LLM call by the model-agnostic pipeline in headroom/compress.py. The requesting agent supplies an LLM identifier for token counting, but the compressed string that is stored and retrieved can be consumed by any downstream agent regardless of its tokenizer.

How does SharedContext prevent memory from growing forever?

Entries automatically expire after the ttl interval (default one hour) and are evicted when max_entries is reached via the _evict_if_needed method in headroom/shared_context.py. This keeps the shared store bounded and prevents stale context from accumulating.

Can I retrieve the original uncompressed text after it has been compressed?

Yes. While SharedContext.get(key) returns the compressed version by default to save tokens, passing full=True yields the exact original text. You can also call SharedContext.get_entry(key) to access the full ContextEntry dataclass containing both versions plus metadata.
