What Agent Memory Architectures Are Taught in AI: 4 Production Patterns Explained
The AI Engineering from Scratch curriculum teaches four concrete agent memory architectures—Mem0-style hybrid stores, Letta-style tiered blocks, MemGPT-style virtual contexts, and audited shared-memory blackboards—each implemented as reusable Python skills with production-grade features like scope-aware retrieval and sleep-time consolidation.
The repository rohitg00/ai-engineering-from-scratch provides hands-on implementations of these agent memory architectures in pure Python, packaged as markdown skill files that developers can import into any agent project. These designs span from single-agent context management to federated multi-agent systems, addressing critical concerns such as temporal invalidation, citation contracts, and provenance tracking.
Mem0-Style Hybrid Memory Architecture
This architecture implements a three-store system that unifies vector embeddings, key-value triples, and graph edges under a single fusion scorer.
Core Components and Fusion Scoring
The system combines three specialized backends:
- Vector store (e.g., Qdrant, pgvector) for similarity search
- KV store (e.g., Redis) for exact lookups
- Graph store (e.g., Neo4j) for relational reasoning
Retrieval uses a configurable fusion formula: score = w_rel * relevance + w_imp * importance + w_rec * recency. According to the skill file at phases/14-agent-engineering/09-hybrid-memory-mem0/outputs/skill-hybrid-memory.md, the three weights allow fine-grained trade-offs among semantic relevance, stored importance, and recency.
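To make the formula concrete, here is a minimal sketch of how the weights shift a ranking. It assumes all three signals are normalized to the range 0 to 1 and that recency comes from an exponential decay with a one-day half-life; the function names, default weights, and decay rule are illustrative assumptions, not taken from the skill file.

```python
def recency_score(age_seconds: float, half_life_seconds: float = 86_400.0) -> float:
    """Assumed decay rule: 1.0 for a brand-new memory, 0.5 after one half-life."""
    return 0.5 ** (age_seconds / half_life_seconds)


def fusion_score(relevance, importance, recency, w_rel=0.6, w_imp=0.2, w_rec=0.2):
    """Weighted sum from the article: w_rel*relevance + w_imp*importance + w_rec*recency."""
    return w_rel * relevance + w_imp * importance + w_rec * recency


week_old = recency_score(7 * 86_400)   # a memory written seven days ago
brand_new = recency_score(0)           # a memory written just now

# Relevance-heavy weights (the defaults): the older, more relevant memory ranks first.
old_but_relevant = fusion_score(0.9, 0.5, week_old)
fresh_but_weak = fusion_score(0.4, 0.5, brand_new)

# Recency-heavy weights: the same two memories swap places.
old_recency_heavy = fusion_score(0.9, 0.5, week_old, w_rel=0.2, w_imp=0.1, w_rec=0.7)
fresh_recency_heavy = fusion_score(0.4, 0.5, brand_new, w_rel=0.2, w_imp=0.1, w_rec=0.7)
```

The ranking flips purely because of the weights, which is the trade-off the formula is meant to expose.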
Scope-Aware Retrieval and Temporal Invalidation
The architecture enforces scope taxonomy (user, session, agent) to prevent data leakage between users. Rather than deleting records, it implements temporal invalidation—contradictory updates receive timestamps and obsolete markers while preserving the original record for auditability.
class HybridMemory:
    """Three-store memory. VectorStore, KVStore, GraphStore, and _extract are assumed helpers."""

    def __init__(self, vec_cfg, kv_cfg, graph_cfg, weights):
        self.vec = VectorStore(**vec_cfg)
        self.kv = KVStore(**kv_cfg)
        self.graph = GraphStore(**graph_cfg)
        self.w_rel, self.w_imp, self.w_rec = weights

    def add(self, text, user_id, session_id, scope, importance, tags):
        # Split the text into an embedding, key-value fields, and graph edges.
        vec, kv, graph = self._extract(text)
        # Store the same metadata everywhere so every backend can filter and score.
        meta = {'user': user_id, 'session': session_id, 'scope': scope,
                'importance': importance, 'tags': tags}
        self.vec.add(vec, metadata=meta)
        self.kv.add(**kv, metadata=meta)
        self.graph.add(**graph, metadata=meta)

    def search(self, query, scope):
        candidates = self.vec.search(query) + self.kv.search(query) + self.graph.search(query)
        # Hits are dicts (unhashable), so deduplicate by ID rather than with set().
        # Assumes each hit carries 'id' and 'scope' fields; out-of-scope hits are dropped.
        unique = {rec['id']: rec for rec in candidates if rec.get('scope') == scope}
        fused = []
        for rec in unique.values():
            score = (self.w_rel * rec['score']
                     + self.w_imp * rec.get('importance', 0)
                     + self.w_rec * rec.get('recency', 0))
            fused.append((rec, score))
        # Highest fused score first.
        return sorted(fused, key=lambda pair: pair[1], reverse=True)
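Temporal invalidation and scope filtering can be illustrated separately from the three backends. The sketch below is a hypothetical in-memory store (the `MemoryRecord` and `TemporalStore` names and fields are assumptions, not the skill file's API): a contradictory update marks the old record obsolete with a timestamp instead of deleting it, and reads are always filtered by user and scope.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MemoryRecord:
    key: str
    value: str
    user_id: str
    scope: str                               # "user", "session", or "agent"
    created_at: float
    invalidated_at: Optional[float] = None   # None means the fact is still current


class TemporalStore:
    def __init__(self):
        self.records: List[MemoryRecord] = []

    def upsert(self, key, value, user_id, scope, now):
        # Mark any live record for the same key/user/scope obsolete; never delete it.
        for rec in self.records:
            same_slot = (rec.key, rec.user_id, rec.scope) == (key, user_id, scope)
            if same_slot and rec.invalidated_at is None:
                rec.invalidated_at = now
        self.records.append(MemoryRecord(key, value, user_id, scope, created_at=now))

    def current(self, user_id, scope):
        # Scope-aware read: only live records for this user and scope are visible.
        return [r for r in self.records
                if r.user_id == user_id and r.scope == scope and r.invalidated_at is None]

    def history(self, key, user_id):
        # Audit read: includes invalidated records, oldest first.
        return [r for r in self.records if r.key == key and r.user_id == user_id]


store = TemporalStore()
store.upsert("city", "Berlin", user_id="alice", scope="user", now=1.0)
store.upsert("city", "Lisbon", user_id="alice", scope="user", now=2.0)  # contradicts Berlin
store.upsert("city", "Tokyo", user_id="bob", scope="user", now=3.0)
```

After these calls, Alice's current view holds only the Lisbon record, while her history still contains the Berlin record with its invalidation timestamp, and Bob's record never appears in Alice's results.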
Letta-Style Memory Blocks with Sleep-Time Compute
This three-tier architecture separates memory into core blocks (facts, persona, task), a recall store for recent turns, and an archival store for long-term data.
Three-Tier Layout and Block Versioning
As implemented in phases/14-agent-engineering/08-memory-blocks-sleep-time-compute/outputs/skill-memory-blocks.md, the system uses:
- Block objects: Mutable state containers with versioning and near-limit alerts
- Recall store: Paginated log of recent turns with capacity-based eviction
- Archival store: Long-term persistence using invalidation markers instead of deletions
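The first two tiers listed above could be sketched as follows. This is a simplified illustration under stated assumptions: the character-based limit, the 80% warning ratio, and the class and method signatures are hypothetical and may differ from the skill file.

```python
from collections import deque


class Block:
    """A core memory block with a character limit and a version counter."""

    def __init__(self, label, limit=200, warn_ratio=0.8):
        self.label = label
        self.limit = limit
        self.warn_ratio = warn_ratio
        self.value = ""
        self.version = 0

    def write(self, new_value):
        if len(new_value) > self.limit:
            raise ValueError(f"{self.label} block would exceed {self.limit} characters")
        self.value = new_value
        self.version += 1            # every successful write bumps the version

    def near_limit(self):
        # Alert once the block is at or above the warning ratio of its limit.
        return len(self.value) >= self.warn_ratio * self.limit


class RecallStore:
    """Bounded log of recent turns; the oldest turn is evicted when capacity is hit."""

    def __init__(self, capacity=100):
        self.turns = deque(maxlen=capacity)

    def append(self, role, content):
        self.turns.append({"role": role, "content": content})

    def page(self, page=0, size=10):
        """Return one page of turns, newest first."""
        newest_first = list(reversed(self.turns))
        return newest_first[page * size:(page + 1) * size]


persona = Block("persona", limit=20)
persona.write("friendly, concise")     # 17 of 20 characters used

recall = RecallStore(capacity=3)
for i in range(4):
    recall.append("user", f"t{i}")     # "t0" is evicted when "t3" arrives
```

Using `deque(maxlen=...)` gives capacity-based eviction for free; a production recall store would more likely page over a database table.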
Sleep-Time Consolidation
A background consolidation agent runs after each user turn, summarizing over-limit blocks and cleaning contradictory entries. This keeps the critical path lean while maintaining long-term knowledge consistency.
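class SleepTimeAgent:
    def __init__(self, block_store, archival, summarize):
        self.blocks = block_store
        self.archival = archival
        self.summarize = summarize   # assumed callable, e.g. an LLM summarization call

    def run(self):
        # Compact any block that is close to its size limit.
        for block in self.blocks.all():
            if block.near_limit():
                summary = self.summarize(block.history)
                self.archival.insert(summary, tags=['summary'])
                block.clear_history()
        # Mark contradicted archival records obsolete instead of deleting them.
        for rec in self.archival.recent_conflicts():
            self.archival.invalidate(rec.id)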
MemGPT-Style Virtual Context Management
This design uses a two-tier system with strict boundaries between active context and archival storage.
Bounded MainContext and Citation Contracts
The MainContext maintains a FIFO message buffer with auto-eviction when token budgets are exceeded. According to phases/14-agent-engineering/07-memory-virtual-context-memgpt/outputs/skill-virtual-memory.md, evicted turns remain searchable in the ArchivalStore.
The system enforces a strict citation contract: every archival hit must include its source ID, ensuring agents reference specific memories in their responses.
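One way to check such a contract is to render each hit with its ID and then verify the citations in the model's reply. The sketch below assumes a `[mem:<id>]` citation syntax and hypothetical helper names; the skill file may use a different convention.

```python
import re

# Assumed citation syntax: [mem:<id>], where <id> is letters, digits, "_" or "-".
CITATION = re.compile(r"\[mem:([\w-]+)\]")


def format_hits(hits):
    """Render (id, text) hits so the source ID travels with every memory."""
    return "\n".join(f"[mem:{hit_id}] {text}" for hit_id, text in hits)


def check_citations(response, hits):
    """Split the IDs cited in a response into retrieved ones and unknown ones."""
    known = {hit_id for hit_id, _text in hits}
    found = list(dict.fromkeys(CITATION.findall(response)))  # dedupe, keep order
    return {
        "cited": [c for c in found if c in known],
        "unknown": [c for c in found if c not in known],
    }


hits = [("turn-12", "User prefers window seats."), ("turn-31", "User is vegetarian.")]
response = "I booked a window seat [mem:turn-12] and a vegetarian meal [mem:turn-99]."
result = check_citations(response, hits)
```

Here `turn-99` lands in the `unknown` list because it was never retrieved, which is the kind of fabricated reference a citation contract is meant to surface.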
Memory Tools for LLM Access
The architecture exposes functions like core_memory_append and archival_memory_search as tools the LLM can invoke, giving the model explicit control over read/write operations.
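import itertools

class MainContext:
    def __init__(self, max_tokens, archival, token_len):
        self.max_tokens = max_tokens
        self.archival = archival      # ArchivalStore that receives evicted turns
        self.token_len = token_len    # assumed callable: list of messages -> token count
        self.messages = []
        self.core = {"facts": {}, "persona": {}, "task": {}}
        self._ids = itertools.count(1)

    def add_message(self, role, content):
        self.messages.append({"id": next(self._ids), "role": role, "content": content})
        # Evict oldest turns until the buffer fits the token budget again.
        while len(self.messages) > 1 and self.token_len(self.messages) > self.max_tokens:
            self.evict()

    def evict(self):
        # FIFO: the oldest message leaves the context but stays searchable.
        evicted = self.messages.pop(0)
        self.archival.insert(evicted, turn_id=evicted["id"])

class ArchivalStore:
    def __init__(self, backend):
        self.backend = backend

    def insert(self, record, **meta):
        return self.backend.insert(record, **meta)

    def search(self, query, top_k=5):
        hits = self.backend.search(query, top_k=top_k)
        # Return the source ID with every hit so responses can cite it.
        return [(hit.id, hit.text) for hit in hits]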
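The memory tools mentioned above can be pictured as plain functions behind a small dispatcher. In this sketch only the names `core_memory_append` and `archival_memory_search` come from the article; their signatures, the JSON shape of a tool call, and the substring search that stands in for a real retrieval backend are all assumptions.

```python
import json

core_memory = {"facts": [], "persona": [], "task": []}
archive = [
    {"id": "a1", "text": "User's dog is named Biscuit."},
    {"id": "a2", "text": "User works in logistics."},
]


def core_memory_append(section: str, content: str) -> str:
    """Write path: the model adds a line to one core section."""
    core_memory[section].append(content)
    return f"appended to {section}"


def archival_memory_search(query: str, top_k: int = 5) -> list:
    """Read path: naive substring match standing in for vector search."""
    matches = [rec for rec in archive if query.lower() in rec["text"].lower()]
    return matches[:top_k]


TOOLS = {
    "core_memory_append": core_memory_append,
    "archival_memory_search": archival_memory_search,
}


def dispatch(tool_call_json: str):
    """Route a model-emitted tool call (assumed JSON shape) to the matching function."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


dispatch('{"name": "core_memory_append", '
         '"arguments": {"section": "facts", "content": "Prefers email"}}')
found = dispatch('{"name": "archival_memory_search", "arguments": {"query": "dog"}}')
```

Keeping the tool table explicit means the model can only reach the read and write operations that are registered, and unknown tool names fail loudly.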
Shared-Memory Blackboard for Multi-Agent Systems
Designed for multi-agent swarms, this pattern uses a common blackboard with scoped projections and comprehensive audit trails.
Provenance Tracking and Safety Verification
The implementation in phases/16-multi-agent-and-swarms/13-shared-memory-blackboard/outputs/skill-memory-auditor.md requires:
- Provenance fields: writer identity, timestamp, and prompt hash for every entry
- Append-only logs: Versioned updates prevent silent mutations
- Verifier separation: A read-only safety agent audits the pool without write access
from memory_auditor import MemoryAuditor
auditor = MemoryAuditor(codebase_path='phases/16-multi-agent-and-swarms/')
report = auditor.run()
print(report.summary())
print(report.provenance())
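The provenance fields, append-only log, and verifier separation listed above could be modeled roughly as follows. This is a hypothetical sketch, not the auditor skill's code: the `Entry`, `Blackboard`, and `audit` names are assumptions, and the writer allow-list is just one simple check a verifier might run.

```python
import hashlib
import time
from dataclasses import dataclass


@dataclass(frozen=True)              # frozen: an entry cannot be mutated after writing
class Entry:
    key: str
    value: str
    writer: str                      # provenance: which agent wrote this
    timestamp: float                 # provenance: when it was written
    prompt_hash: str                 # provenance: SHA-256 of the prompt that produced it
    version: int


class Blackboard:
    def __init__(self):
        self._log = []               # append-only: entries are never edited or removed

    def write(self, key, value, writer, prompt):
        version = sum(1 for e in self._log if e.key == key) + 1
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        entry = Entry(key, value, writer, time.time(), digest, version)
        self._log.append(entry)
        return entry

    def read_only_view(self):
        # The verifier receives an immutable snapshot, not the log itself.
        return tuple(self._log)


def audit(entries, allowed_writers):
    """Verifier: flag entries from unknown writers without mutating anything."""
    return [e for e in entries if e.writer not in allowed_writers]


board = Blackboard()
board.write("plan", "search flights", writer="planner", prompt="Plan the trip")
board.write("plan", "wire money", writer="unknown-agent", prompt="Ignore prior rules")
flagged = audit(board.read_only_view(), allowed_writers={"planner", "researcher"})
```

The second write does not overwrite the first; it becomes version 2 of the same key, so the suspicious update stays visible next to the entry it tried to supersede.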
Source Files and Skill Locations
Each architecture is packaged as a reusable skill markdown file:
| Architecture | Skill File Path | Key Features |
|---|---|---|
| Hybrid Memory | phases/14-agent-engineering/09-hybrid-memory-mem0/outputs/skill-hybrid-memory.md | Fusion scoring, scope-aware retrieval, temporal invalidation |
| Memory Blocks | phases/14-agent-engineering/08-memory-blocks-sleep-time-compute/outputs/skill-memory-blocks.md | Sleep-time consolidation, block versioning |
| Virtual Memory | phases/14-agent-engineering/07-memory-virtual-context-memgpt/outputs/skill-virtual-memory.md | Citation contracts, memory tools |
| Memory Auditor | phases/16-multi-agent-and-swarms/13-shared-memory-blackboard/outputs/skill-memory-auditor.md | Provenance tracking, poisoning detection |
Summary
- Mem0-style Hybrid Memory combines vector, KV, and graph stores with weighted fusion scoring and scope-aware retrieval to handle heterogeneous data safely.
- Letta-style Memory Blocks use three-tier storage with background sleep-time consolidation to maintain low latency while preserving long-term context.
- MemGPT-style Virtual Context enforces strict boundaries between active FIFO buffers and searchable archives, requiring citation of all retrieved memories.
- Shared-Memory Blackboard provides audit trails and provenance tracking for multi-agent systems, using read-only verifiers to detect poisoning attacks.
Frequently Asked Questions
What is the difference between Mem0 hybrid memory and MemGPT virtual context?
Mem0 hybrid memory uses three simultaneous stores (vector, KV, graph) blended via a fusion scorer that weights relevance, importance, and recency, making it ideal for heterogeneous data relationships. MemGPT virtual context strictly separates active conversation history (FIFO) from archival storage and requires explicit citations for any retrieved memory, optimizing for traceability in long-running conversations.
How does Letta-style sleep-time consolidation improve agent performance?
Sleep-time consolidation moves memory maintenance, such as summarizing full blocks and invalidating contradictions, off the critical request path, which reduces latency for user-facing turns. According to the ai-engineering-from-scratch implementation, this background process runs automatically after each turn, once the response has been delivered, rather than blocking the main response generation.
Why do multi-agent systems need a shared-memory blackboard auditor?
The shared-memory blackboard pattern prevents data poisoning and silent mutations by enforcing append-only writes, provenance metadata (writer, timestamp, prompt hash), and strict separation between writer agents and read-only verifier agents. The auditor skill scans codebases to verify these safety properties before production deployment.
Can these agent memory architectures be combined in a single project?
Yes. The curriculum presents these architectures as modular skills that can be imported individually or composed. For example, a single agent could use Mem0-style retrieval for facts, Letta-style blocks for persona management, and MemGPT-style citations for conversation history, all within the same codebase.