How Cross-Agent Memory Works with SharedContext Across Different LLMs
TLDR: SharedContext in the headroom library acts as a model-agnostic memory bus, enabling agents built on different LLMs—such as Claude and Gemini—to share large contexts efficiently through compression without transmitting full payloads on every exchange.
The chopratejas/headroom repository solves the problem of passing large contexts between heterogeneous AI agents. Its SharedContext class provides cross-agent memory by compressing content once and making that compressed payload available to any downstream agent, regardless of whether it calls Anthropic, Google, or another provider.
How Compression Powers the Shared Memory Bus
When an agent stores data, SharedContext.put(key, content) forwards the raw content to the generic compress API in headroom/compress.py. This entry point builds a Headroom compression pipeline that runs model-agnostic transforms—such as SmartCrusher, Kompress, and CodeCompressor—and returns a CompressResult containing both the compressed text and token-level statistics.
The pipeline accepts an LLM identifier string (for example, "claude-sonnet-4-5-20250929" or "gemini-1.5-pro") for token counting, but the compression logic itself remains model-agnostic. As implemented in chopratejas/headroom, this means the same compressed text is served whether the downstream consumer is Claude, Gemini, or a local open-source model. The recorded token counts reflect the tokenizer of the model named at storage time, so they are only approximate for a consumer that uses a different tokenizer.
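The split between model-agnostic compression and model-specific token counting can be illustrated with a minimal sketch. It does not call headroom: `collapse_whitespace` and `drop_duplicate_lines` are hypothetical stand-ins for transforms such as SmartCrusher, `compress_sketch` and `SketchResult` are illustrative names, and the two token counters are crude approximations rather than real tokenizers.

```python
from dataclasses import dataclass
from typing import Callable, Dict


def collapse_whitespace(text: str) -> str:
    """Hypothetical transform: squeeze runs of spaces inside each line."""
    return "\n".join(" ".join(line.split()) for line in text.splitlines())


def drop_duplicate_lines(text: str) -> str:
    """Hypothetical transform: keep only the first occurrence of each line."""
    seen = set()
    kept = []
    for line in text.splitlines():
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)


# Crude stand-ins for per-model tokenizers (assumptions, not real tokenizers).
TOKEN_COUNTERS: Dict[str, Callable[[str], int]] = {
    "claude": lambda text: len(text.split()),  # whitespace-separated words
    "gemini": lambda text: len(text) // 4,     # about four characters per token
}


def count_tokens(text: str, model: str) -> int:
    family = model.split("-")[0]  # "claude-sonnet-..." -> "claude"
    return TOKEN_COUNTERS[family](text)


@dataclass
class SketchResult:
    compressed: str
    original_tokens: int
    compressed_tokens: int


def compress_sketch(text: str, model: str) -> SketchResult:
    compressed = text
    # The transforms never look at `model`; only the counting step does.
    for transform in (collapse_whitespace, drop_duplicate_lines):
        compressed = transform(compressed)
    return SketchResult(
        compressed=compressed,
        original_tokens=count_tokens(text, model),
        compressed_tokens=count_tokens(compressed, model),
    )


raw = "status:   ok\nstatus:   ok\nlatency:    12ms\n"
for_claude = compress_sketch(raw, "claude-sonnet-4-5-20250929")
for_gemini = compress_sketch(raw, "gemini-1.5-pro")
print(for_claude.compressed == for_gemini.compressed)
```

In this sketch the compressed string is identical for both model identifiers, while the token statistics differ because each identifier selects a different counter.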
Storing and Retrieving Context Entries
Inside headroom/shared_context.py, the put method instantiates a ContextEntry dataclass (defined around lines 38–48) that records:
- original: the full uncompressed text.
- compressed: the shrunk version produced by the pipeline.
- original_tokens and compressed_tokens: token counts for measurement.
- agent: the identifier of the agent that stored the entry.
- timestamp and transforms: audit metadata, including which compressors ran.
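As a rough picture of that record, the sketch below mirrors the listed fields in a standalone dataclass. `ContextEntrySketch` is a hypothetical name, the field types and defaults are assumptions, and the `savings_percent` formula is a guess at how such a figure could be derived, not a copy of the definition in headroom/shared_context.py.

```python
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContextEntrySketch:
    """Illustrative stand-in for ContextEntry; types and defaults are assumed."""

    original: str
    compressed: str
    original_tokens: int
    compressed_tokens: int
    agent: str = ""
    timestamp: float = field(default_factory=time.time)
    transforms: List[str] = field(default_factory=list)

    @property
    def savings_percent(self) -> float:
        # Assumed formula: share of tokens removed, as a percentage.
        if self.original_tokens == 0:
            return 0.0
        saved = self.original_tokens - self.compressed_tokens
        return round(100.0 * saved / self.original_tokens, 1)


entry = ContextEntrySketch(
    original="a long research report ...",
    compressed="short report",
    original_tokens=1000,
    compressed_tokens=200,
    agent="agent_a",
    transforms=["SmartCrusher"],
)
print(entry.savings_percent)
```

Keeping both the original and compressed text in one record is what lets a consumer choose between the cheap and the complete version later.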
Retrieval Options for Consumer Agents
A consuming agent has three ways to access shared memory:
- SharedContext.get(key) returns the compressed text by default, minimizing token spend.
- SharedContext.get(key, full=True) returns the original uncompressed text on demand.
- SharedContext.get_entry(key) returns the full ContextEntry object, useful for debugging, audit trails, or analyzing per-entry savings.
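The three access paths can be sketched with a small in-memory store. `RetrievalSketch` and `Entry` are hypothetical names, and truncating to 40 characters is only a placeholder for real compression. Returning None for an unknown key is an assumption chosen to match the eviction example later in this article.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    original: str
    compressed: str


class RetrievalSketch:
    """Hypothetical stand-in showing the three retrieval paths."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def put(self, key: str, content: str) -> Entry:
        # Placeholder "compression": keep the first 40 characters.
        entry = Entry(original=content, compressed=content[:40])
        self._entries[key] = entry
        return entry

    def get(self, key: str, full: bool = False) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        # Compressed text is the default; the original is opt-in.
        return entry.original if full else entry.compressed

    def get_entry(self, key: str) -> Optional[Entry]:
        return self._entries.get(key)


store = RetrievalSketch()
report = "Quarterly findings: " + "detail " * 20
store.put("research", report)
print(len(store.get("research")), len(store.get("research", full=True)))
```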
Cross-Agent Memory Between Claude and Gemini
Because SharedContext is a plain Python object, agents backed by different LLM families can hold a reference to the same instance within one process and exchange context through it. For example, Agent A (Claude) stores a research report with ctx.put("research", report), and Agent B (Gemini) later calls ctx.get("research") to receive a compressed payload that is roughly 80% smaller yet still carries enough detail for its next step. If Agent B needs the full text for a specific sub-task, it calls ctx.get("research", full=True).
According to the headroom source code, no further model-specific handling is required because the compression is performed before any LLM call. The downstream agent consumes the compressed string directly.
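The producer and consumer roles described here can be sketched without any real LLM client. Everything in the block is hypothetical: `TinyStore` stands in for SharedContext, whitespace collapsing stands in for compression, and `fake_claude` and `fake_gemini` are stub functions that only report how many characters they were given.

```python
from typing import Callable, Dict, Tuple


class TinyStore:
    """Hypothetical stand-in for SharedContext: key -> (original, compressed)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, str]] = {}

    def put(self, key: str, content: str) -> None:
        compressed = " ".join(content.split())  # placeholder compression
        self._data[key] = (content, compressed)

    def get(self, key: str, full: bool = False) -> str:
        original, compressed = self._data[key]
        return original if full else compressed


def fake_claude(prompt: str) -> str:
    return f"[claude] saw {len(prompt)} chars"


def fake_gemini(prompt: str) -> str:
    return f"[gemini] saw {len(prompt)} chars"


class Agent:
    """An agent is a name, an LLM callable, and a reference to the shared store."""

    def __init__(self, name: str, llm: Callable[[str], str], store: TinyStore) -> None:
        self.name = name
        self.llm = llm
        self.store = store

    def publish(self, key: str, content: str) -> None:
        self.store.put(key, content)

    def consume(self, key: str, full: bool = False) -> str:
        # The stored string is passed to the LLM as-is, whichever backend it is.
        return self.llm(self.store.get(key, full=full))


store = TinyStore()
agent_a = Agent("agent_a", fake_claude, store)
agent_b = Agent("agent_b", fake_gemini, store)

report = "finding   one\n\nfinding   two"
agent_a.publish("research", report)
short_reply = agent_b.consume("research")
full_reply = agent_b.consume("research", full=True)
print(short_reply, "|", full_reply)
```

Both agents hold the same store object, so nothing is copied or re-encoded when the second agent reads what the first one wrote.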
Managing Memory Lifecycle and Eviction
SharedContext automatically expires entries after a configurable ttl (defaulting to one hour) and evicts older entries when the max_entries limit is reached. The _evict_if_needed method in headroom/shared_context.py enforces these bounds, so stale data does not linger and memory usage stays predictable across long-running multi-agent workflows.
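One way such bounds could be enforced is sketched below. This is not the code from headroom/shared_context.py: `BoundedStoreSketch` is a hypothetical class, the oldest-first eviction order and the max_entries default of 100 are assumptions, and the injectable `clock` exists only so the example runs without sleeping.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class BoundedStoreSketch:
    """Hypothetical TTL plus max_entries store; oldest entries are evicted first."""

    def __init__(
        self,
        ttl: float = 3600.0,       # one hour, matching the default described above
        max_entries: int = 100,    # assumed default for this sketch
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = ttl
        self.max_entries = max_entries
        self._clock = clock
        self._entries: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()

    def put(self, key: str, value: str) -> None:
        self._entries.pop(key, None)  # re-inserting makes the key the newest
        self._entries[key] = (self._clock(), value)
        self._evict_if_needed()

    def get(self, key: str) -> Optional[str]:
        item = self._entries.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at > self.ttl:
            del self._entries[key]  # lazily drop an expired entry on read
            return None
        return value

    def _evict_if_needed(self) -> None:
        now = self._clock()
        expired = [k for k, (t, _) in self._entries.items() if now - t > self.ttl]
        for key in expired:
            del self._entries[key]
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # remove the oldest insertion


class FakeClock:
    """Manually advanced clock so the demo needs no time.sleep()."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
store = BoundedStoreSketch(ttl=5, max_entries=2, clock=clock)
store.put("first", "data 1")
store.put("second", "data 2")
store.put("third", "data 3")       # pushes the store over max_entries
evicted = store.get("first")
still_there = store.get("second")
clock.now = 6.0                     # move past the 5-second ttl
expired = store.get("second")
print(evicted, still_there, expired)
```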
Practical Code Examples
Basic Same-Process Usage
from headroom import SharedContext
# Initialise a shared context that will use Claude for token counting
ctx = SharedContext(model="claude-sonnet-4-5-20250929")
# Agent A stores a huge research output
research = "..." # very long string
entry = ctx.put("research", research, agent="agent_a")
print(f"Saved {entry.savings_percent}% tokens")
# Agent B reads the compressed version (default)
compressed = ctx.get("research")
print("Compressed size:", len(compressed))
# Agent B needs the full text for a deep dive
full = ctx.get("research", full=True)
assert full == research
Mixing Claude and Gemini Backends
import google.generativeai as genai
from headroom import SharedContext
# Agent A (Claude): compress, counting tokens with Claude's tokenizer
ctx = SharedContext(model="claude-sonnet-4-5-20250929")
ctx.put("analysis", "large JSON payload ...", agent="claude_agent")
# Agent B (Gemini): retrieve the compressed text and build a plain-string prompt
compressed = ctx.get("analysis")
gemini_prompt = f"Please summarize this data: {compressed}"
# The Gemini client sees only the compressed payload, which reduces input tokens.
# Assumes the API key was configured beforehand, e.g. genai.configure(api_key=...).
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(gemini_prompt)
print(response.text)
TTL and Eviction Behavior
import time
from headroom import SharedContext
# Short TTL for demo purposes
ctx = SharedContext(ttl=5, max_entries=2)
ctx.put("first", "data 1")
ctx.put("second", "data 2")
ctx.put("third", "data 3") # "first" will be evicted (max_entries=2)
assert ctx.get("first") is None
assert ctx.get("second") is not None
assert ctx.get("third") is not None
# Wait for expiration
time.sleep(6)
assert ctx.get("second") is None # expired after 5 s
Inspecting Entry Metadata
entry = ctx.get_entry("analysis")
print("Original tokens:", entry.original_tokens)
print("Compressed tokens:", entry.compressed_tokens)
print("Transforms applied:", entry.transforms)
Summary
- SharedContext in headroom/shared_context.py provides a model-agnostic memory bus that lets diverse agents share large payloads without repeated full-text transmission.
- The compress function in headroom/compress.py powers the pipeline, running transforms like SmartCrusher and Kompress before any LLM-specific logic is needed.
- Consumer agents retrieve compressed text by default via SharedContext.get(key) and can request the original via get(key, full=True) or inspect metadata via get_entry(key).
- Built-in TTL and max_entries eviction keep memory bounded in long-running agent systems.
- Because the interface is a plain Python object, agents using Claude, Gemini, or custom wrappers can all reference the same shared store, as exposed at the package level in headroom/__init__.py (line 274).
Frequently Asked Questions
Can Claude and Gemini agents share the same SharedContext instance?
Yes. Because SharedContext is a plain Python object, agents backed by different LLM families can share the same instance or a process-wide singleton. Agent A (Claude) can call put() and Agent B (Gemini) can call get() on the same object without compatibility issues.
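A process-wide singleton can be as small as a cached factory function. The sketch below shows one common Python pattern for that, not something headroom is known to ship: `StoreSketch` and `get_shared_store` are hypothetical names, and the dict-backed store is a placeholder for a real SharedContext.

```python
from functools import lru_cache
from typing import Dict


class StoreSketch:
    """Hypothetical placeholder for a shared context object."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}


@lru_cache(maxsize=None)
def get_shared_store() -> StoreSketch:
    # The first call builds the store; later calls return the cached instance.
    return StoreSketch()


# One agent writes through the accessor...
get_shared_store().data["research"] = "compressed report"
# ...and another agent, calling the accessor independently, sees the same data.
seen_by_other_agent = get_shared_store().data["research"]
print(seen_by_other_agent)
```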
Does SharedContext require both agents to use the same LLM tokenizer?
No. Compression is performed before any LLM call by the model-agnostic pipeline in headroom/compress.py. The requesting agent supplies an LLM identifier for token counting, but the compressed string that is stored and retrieved can be consumed by any downstream agent regardless of its tokenizer.
How does SharedContext prevent memory from growing forever?
Entries automatically expire after the ttl interval (default one hour) and are evicted when max_entries is reached via the _evict_if_needed method in headroom/shared_context.py. This keeps the shared store bounded and prevents stale context from accumulating.
Can I retrieve the original uncompressed text after it has been compressed?
Yes. While SharedContext.get(key) returns the compressed version by default to save tokens, passing full=True yields the exact original text. You can also call SharedContext.get_entry(key) to access the full ContextEntry dataclass containing both versions plus metadata.