
What Agent Memory Architectures Are Taught in AI Engineering from Scratch: 4 Production Patterns Explained

June 10, 2026 · rohitg00/ai-engineering-from-scratch

The AI Engineering from Scratch curriculum teaches four concrete agent memory architectures: Mem0-style hybrid stores, Letta-style tiered blocks, MemGPT-style virtual contexts, and audited shared-memory blackboards. Each is implemented as a reusable Python skill with production-oriented features such as scope-aware retrieval and sleep-time consolidation.

The repository rohitg00/ai-engineering-from-scratch provides hands-on implementations of these architectures in pure Python, packaged as markdown skill files that developers can import into any agent project. The designs range from single-agent context management to multi-agent systems that share a common memory pool, and they address concerns such as temporal invalidation, citation contracts, and provenance tracking.

Mem0-Style Hybrid Memory Architecture

This architecture implements a three-store system that unifies vector embeddings, key-value records, and graph edges under a single fusion scorer.

Core Components and Fusion Scoring

The system combines three specialized backends:

Vector store (e.g., Qdrant, pgvector) for similarity search
KV store (e.g., Redis) for exact lookups
Graph store (e.g., Neo4j) for relational reasoning

Retrieval uses a configurable fusion formula: `score = w_rel * relevance + w_imp * importance + w_rec * recency`. According to the skill file at phases/14-agent-engineering/09-hybrid-memory-mem0/outputs/skill-hybrid-memory.md, the weights let you tune the trade-off between semantic similarity, importance, and recency.
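The formula leaves the recency term open. A minimal sketch, assuming recency is an exponential decay with a one-day half-life and example weights of 0.6/0.2/0.2 (both illustrative choices, not values taken from the skill file), shows how the weights trade relevance against freshness:

```python
def recency_score(age_seconds: float, half_life_seconds: float = 86_400.0) -> float:
    """Assumed exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life."""
    return 0.5 ** (age_seconds / half_life_seconds)


def fusion_score(relevance: float, importance: float, age_seconds: float,
                 w_rel: float = 0.6, w_imp: float = 0.2, w_rec: float = 0.2) -> float:
    """score = w_rel * relevance + w_imp * importance + w_rec * recency (example weights)."""
    return w_rel * relevance + w_imp * importance + w_rec * recency_score(age_seconds)


# A strong but week-old match versus a weaker match from an hour ago.
week_old = fusion_score(relevance=0.9, importance=0.5, age_seconds=7 * 86_400)
hour_old = fusion_score(relevance=0.6, importance=0.5, age_seconds=3_600)
```

With these example weights the hour-old memory edges ahead (roughly 0.654 versus 0.642), which is exactly the kind of trade-off the weights are meant to expose.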

Scope-Aware Retrieval and Temporal Invalidation

The architecture enforces a scope taxonomy (user, session, agent) to prevent data leakage between users. Rather than deleting records, it uses temporal invalidation: when a new fact contradicts an existing one, the old record is timestamped and marked obsolete but kept for auditability.
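A minimal sketch of scope filtering plus temporal invalidation, assuming facts are stored as subject/predicate/value records and that two facts contradict when they share user, scope, subject, and predicate but differ in value. The `Fact` and `FactLog` names are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Fact:
    user_id: str
    scope: str                 # "user", "session", or "agent"
    subject: str
    predicate: str
    value: str
    valid_from: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    invalidated_at: Optional[datetime] = None  # set instead of deleting the record


class FactLog:
    def __init__(self):
        self.records: list[Fact] = []

    def upsert(self, new: Fact) -> None:
        key = (new.user_id, new.scope, new.subject, new.predicate)
        for old in self.records:
            # A live record with the same key but a different value is superseded:
            # mark it obsolete, but keep it for auditability.
            if (old.invalidated_at is None
                    and (old.user_id, old.scope, old.subject, old.predicate) == key
                    and old.value != new.value):
                old.invalidated_at = new.valid_from
        self.records.append(new)

    def current(self, user_id: str, scope: str) -> list[Fact]:
        # Scope-aware read: only this user's live records in the requested scope.
        return [f for f in self.records
                if f.user_id == user_id and f.scope == scope and f.invalidated_at is None]


log = FactLog()
log.upsert(Fact("u1", "user", "u1", "lives_in", "Berlin"))
log.upsert(Fact("u1", "user", "u1", "lives_in", "Lisbon"))
log.upsert(Fact("u2", "user", "u2", "lives_in", "Berlin"))
```

Both records for `u1` survive in `log.records`; only the superseded one carries an `invalidated_at` timestamp, and `u2`'s Berlin fact is untouched because it belongs to a different user.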

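```python
from typing import Any, Callable, Protocol

Record = dict[str, Any]


class Store(Protocol):
    """Assumed interface: each backend (vector, KV, graph) is wrapped to expose add/search."""
    def add(self, record: Record) -> None: ...
    def search(self, query: str) -> list[Record]: ...


class HybridMemory:
    def __init__(self, vec: Store, kv: Store, graph: Store,
                 extract: Callable[[str], tuple[Record, Record, Record]],
                 weights: tuple[float, float, float]):
        self.vec, self.kv, self.graph = vec, kv, graph
        self.extract = extract  # splits text into vector, KV, and graph payloads
        self.w_rel, self.w_imp, self.w_rec = weights

    def add(self, text, user_id, session_id, scope, importance, tags):
        vec, kv, graph = self.extract(text)
        # Same scope metadata on every payload, so search can filter consistently.
        meta = {"user": user_id, "session": session_id, "scope": scope,
                "importance": importance, "tags": tags}
        for store, payload in ((self.vec, vec), (self.kv, kv), (self.graph, graph)):
            store.add({**payload, **meta})

    def search(self, query, user_id, scope):
        # dicts are unhashable, so set() would fail; deduplicate by record ID instead.
        candidates: dict[Any, Record] = {}
        for rec in self.vec.search(query) + self.kv.search(query) + self.graph.search(query):
            # Scope-aware retrieval: never return another user's or another scope's records.
            if rec.get("user") != user_id or rec.get("scope") != scope:
                continue
            if rec["id"] not in candidates or rec["score"] > candidates[rec["id"]]["score"]:
                candidates[rec["id"]] = rec
        fused = [(rec, self.w_rel * rec["score"]
                       + self.w_imp * rec.get("importance", 0)
                       + self.w_rec * rec.get("recency", 0))
                 for rec in candidates.values()]
        return sorted(fused, key=lambda pair: pair[1], reverse=True)
```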

Letta-Style Memory Blocks with Sleep-Time Compute

This three-tier architecture separates memory into core blocks (facts, persona, task), a recall store for recent turns, and an archival store for long-term data.

Three-Tier Layout and Block Versioning

As implemented in phases/14-agent-engineering/08-memory-blocks-sleep-time-compute/outputs/skill-memory-blocks.md, the system uses:

Block objects: Mutable state containers with versioning and near-limit alerts
Recall store: Paginated log of recent turns with capacity-based eviction
Archival store: Long-term persistence using invalidation markers instead of deletions
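The block and recall-store behaviour described above can be sketched as follows. This is an illustrative version, assuming block limits are measured in characters (a real system would more likely count tokens) and that "near limit" means 90% full; `Block` and `RecallStore` are hypothetical names:

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Block:
    label: str                  # e.g. "facts", "persona", "task"
    value: str = ""
    limit: int = 2000           # assumed character budget for this block
    warn_ratio: float = 0.9     # assumed "near limit" threshold
    version: int = 0
    history: list[str] = field(default_factory=list)

    def write(self, new_value: str) -> None:
        if len(new_value) > self.limit:
            raise ValueError(f"block {self.label!r} would exceed {self.limit} chars")
        self.history.append(self.value)  # keep the previous version
        self.value = new_value
        self.version += 1

    def near_limit(self) -> bool:
        return len(self.value) >= self.warn_ratio * self.limit


class RecallStore:
    """Recent turns with capacity-based eviction and simple pagination."""

    def __init__(self, capacity: int = 100):
        self.turns = deque(maxlen=capacity)  # the oldest turn drops off when full

    def append(self, turn: str) -> None:
        self.turns.append(turn)

    def page(self, index: int, size: int = 10) -> list[str]:
        newest_first = list(reversed(self.turns))
        return newest_first[index * size:(index + 1) * size]


persona = Block("persona", limit=20)
persona.write("Concise, friendly.")   # 18 of 20 chars: version 1, near the limit
recall = RecallStore(capacity=3)
for turn in ["t1", "t2", "t3", "t4"]:
    recall.append(turn)               # "t1" is evicted when "t4" arrives
```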

Sleep-Time Consolidation

A background consolidation agent runs after each user turn, summarizing blocks that are near or over their limit and invalidating contradictory archival entries. Because this work happens outside the request path, responses stay fast while long-term knowledge stays consistent.

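```python
from typing import Callable


class SleepTimeAgent:
    """Background consolidation pass, run off the critical path after a turn."""

    def __init__(self, block_store, archival, summarize: Callable[[list[str]], str]):
        self.blocks = block_store    # exposes all() -> iterable of blocks
        self.archival = archival     # exposes insert(), recent_conflicts(), invalidate()
        self.summarize = summarize   # e.g. an LLM call; injected rather than a global

    def run(self) -> None:
        # 1. Compress blocks that are close to their size limit into archival summaries.
        for block in self.blocks.all():
            if block.near_limit():
                summary = self.summarize(block.history)
                self.archival.insert(summary, tags=["summary"])
                block.clear_history()

        # 2. Invalidate (not delete) records superseded by newer, contradictory facts.
        for rec in self.archival.recent_conflicts():
            self.archival.invalidate(rec.id)
```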
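The class above defines what consolidation does but not how it stays off the critical path. One way to schedule it, sketched with `asyncio` and hypothetical placeholder coroutines standing in for the LLM call and for `SleepTimeAgent.run()`, is to return the reply first and launch consolidation as a background task:

```python
import asyncio


async def generate_reply(user_msg: str) -> str:
    # Hypothetical placeholder for the LLM call on the critical path.
    return f"echo: {user_msg}"


async def consolidate(log: list[str]) -> None:
    # Hypothetical placeholder for SleepTimeAgent.run(); sleeps to simulate slow work.
    await asyncio.sleep(0.01)
    log.append("consolidated")


async def handle_turn(user_msg: str, log: list[str], background: set) -> str:
    reply = await generate_reply(user_msg)
    # Schedule consolidation without awaiting it, so the reply returns immediately.
    task = asyncio.create_task(consolidate(log))
    background.add(task)                       # keep a strong reference to the task
    task.add_done_callback(background.discard)
    return reply


async def main() -> tuple[str, list[str], list[str]]:
    log: list[str] = []
    background: set = set()
    reply = await handle_turn("hi", log, background)
    before = list(log)                         # consolidation has not run yet
    await asyncio.gather(*background)          # drain pending work, e.g. on shutdown
    return reply, before, log


reply, before, after = asyncio.run(main())
```

The reply is ready before consolidation runs. Holding a reference to each task matters because the event loop keeps only weak references to tasks, so an unreferenced background task can disappear mid-execution.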

MemGPT-Style Virtual Context Management

This design uses a two-tier system with strict boundaries between active context and archival storage.

Bounded MainContext and Citation Contracts

The MainContext maintains a FIFO message buffer with auto-eviction when token budgets are exceeded. According to phases/14-agent-engineering/07-memory-virtual-context-memgpt/outputs/skill-virtual-memory.md, evicted turns remain searchable in the ArchivalStore.

The system enforces a strict citation contract: every archival hit must include its source ID, ensuring agents reference specific memories in their responses.
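To make the citation contract checkable, a response can be scanned for cited IDs and compared against what archival search actually returned. A minimal sketch, assuming the agent cites memories inline as `[mem:<id>]` (a hypothetical convention) and that hits arrive as `(id, text)` pairs, as in the `ArchivalStore.search` example in this section:

```python
import re

# Hypothetical inline citation format: [mem:<id>]
CITATION = re.compile(r"\[mem:([\w-]+)\]")


def check_citations(response: str, hits: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Compare the IDs cited in a response with the (id, text) hits actually retrieved."""
    retrieved = {hit_id for hit_id, _ in hits}
    cited = set(CITATION.findall(response))
    return {
        "unknown": sorted(cited - retrieved),  # cited but never retrieved: reject
        "uncited": sorted(retrieved - cited),  # retrieved but not referenced
    }


hits = [("m1", "User prefers metric units"), ("m7", "User lives in Lisbon")]
report = check_citations("You live in Lisbon [mem:m7] and use km [mem:m9].", hits)
```

Here `m9` was cited without being retrieved, which is the case a strict contract should reject; `m1` was retrieved but unused, which may be acceptable.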

Memory Tools for LLM Access

The architecture exposes functions like core_memory_append and archival_memory_search as tools the LLM can invoke, giving the model explicit control over read/write operations.
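A minimal sketch of exposing these operations as tools, assuming a generic JSON-Schema function-calling format in which the model returns a tool name plus a JSON string of arguments. The parameter names (`block`, `content`, `query`, `top_k`), the toy in-memory stores, and the `dispatch` helper are illustrative assumptions, not MemGPT's exact signatures:

```python
import json

core = {"facts": [], "persona": [], "task": []}
archive = ["user prefers metric units", "user lives in Lisbon"]  # toy archival store


def core_memory_append(block: str, content: str) -> str:
    core[block].append(content)
    return f"appended to {block}"


def archival_memory_search(query: str, top_k: int = 5) -> list[dict]:
    # Toy case-insensitive keyword match; each hit carries its source ID for citation.
    hits = [{"id": f"a{i}", "text": t}
            for i, t in enumerate(archive) if query.lower() in t.lower()]
    return hits[:top_k]


# Tool descriptions handed to the model (generic JSON-Schema style).
TOOLS = [
    {"name": "core_memory_append",
     "description": "Append a line to a core memory block.",
     "parameters": {"type": "object",
                    "properties": {"block": {"type": "string", "enum": list(core)},
                                   "content": {"type": "string"}},
                    "required": ["block", "content"]}},
    {"name": "archival_memory_search",
     "description": "Search archival memory; returns hits with source IDs.",
     "parameters": {"type": "object",
                    "properties": {"query": {"type": "string"},
                                   "top_k": {"type": "integer"}},
                    "required": ["query"]}},
]
REGISTRY = {"core_memory_append": core_memory_append,
            "archival_memory_search": archival_memory_search}


def dispatch(tool_call: dict) -> str:
    """Run a model-issued call shaped like {"name": ..., "arguments": "<json>"}."""
    fn = REGISTRY[tool_call["name"]]
    return json.dumps(fn(**json.loads(tool_call["arguments"])))


search_result = dispatch({"name": "archival_memory_search", "arguments": '{"query": "lisbon"}'})
append_result = dispatch({"name": "core_memory_append",
                          "arguments": '{"block": "persona", "content": "be concise"}'})
```

Routing every call through one registry keeps the model's read/write surface explicit: memory changes only through the tools listed in `TOOLS`.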

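```python
import itertools


def token_len(messages):
    # Rough stand-in for a real tokenizer: counts whitespace-separated words.
    return sum(len(m["content"].split()) for m in messages)


class MainContext:
    def __init__(self, max_tokens, archival):
        self.max_tokens = max_tokens
        self.archival = archival  # ArchivalStore that receives evicted turns
        self.messages = []
        self.core = {"facts": {}, "persona": {}, "task": {}}
        self._next_id = itertools.count()

    def add_message(self, role, content):
        self.messages.append({"id": next(self._next_id), "role": role, "content": content})
        # FIFO: evict oldest turns until the buffer fits, always keeping the newest one.
        while len(self.messages) > 1 and token_len(self.messages) > self.max_tokens:
            self.evict()

    def evict(self):
        evicted = self.messages.pop(0)
        # Evicted turns remain searchable in archival storage under their turn ID.
        self.archival.insert(evicted, turn_id=evicted["id"])


class ArchivalStore:
    def __init__(self, backend):
        self.backend = backend  # any store exposing insert() and search()

    def insert(self, record, **meta):
        return self.backend.insert(record, **meta)

    def search(self, query, top_k=5):
        hits = self.backend.search(query, top_k=top_k)
        # Citation contract: every hit is returned together with its source ID.
        return [(hit.id, hit.text) for hit in hits]
```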

Shared-Memory Blackboard for Multi-Agent Systems

Designed for multi-agent swarms, this pattern uses a common blackboard with scoped projections and comprehensive audit trails.

Provenance Tracking and Safety Verification

The implementation in phases/16-multi-agent-and-swarms/13-shared-memory-blackboard/outputs/skill-memory-auditor.md requires:

Provenance fields: writer identity, timestamp, and prompt hash for every entry
Append-only logs: Versioned updates prevent silent mutations
Verifier separation: A read-only safety agent audits the pool without write access
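The three requirements can be sketched as a small append-only log. This is an illustrative sketch, not the skill's code; `Entry`, `Blackboard`, and `ReadOnlyView` are hypothetical names, and the prompt hash is assumed to be SHA-256 over the raw prompt text:

```python
import hashlib
import time
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)          # entries cannot be mutated after creation
class Entry:
    key: str
    value: str
    version: int
    writer: str                  # agent identity
    timestamp: float
    prompt_hash: str             # SHA-256 of the prompt that produced the write


class Blackboard:
    def __init__(self):
        self._log: list[Entry] = []   # append-only: entries are never edited or removed

    def write(self, key: str, value: str, writer: str, prompt: str) -> Entry:
        version = sum(1 for e in self._log if e.key == key) + 1
        entry = Entry(key, value, version, writer, time.time(),
                      hashlib.sha256(prompt.encode("utf-8")).hexdigest())
        self._log.append(entry)
        return entry

    def latest(self) -> dict[str, Entry]:
        view: dict[str, Entry] = {}
        for e in self._log:           # later versions overwrite earlier ones
            view[e.key] = e
        return view


class ReadOnlyView:
    """Handed to the verifier: it can read the pool but exposes no write method."""

    def __init__(self, board: Blackboard):
        self._board = board

    def history(self) -> tuple[Entry, ...]:
        return tuple(self._board._log)

    def latest(self):
        return MappingProxyType(self._board.latest())


board = Blackboard()
board.write("plan", "step 1", writer="planner", prompt="draft a plan")
board.write("plan", "step 1, step 2", writer="planner", prompt="extend the plan")
verifier = ReadOnlyView(board)
```

Python cannot truly stop the verifier from reaching into `_board`; the class only expresses the intended interface. In a deployed swarm, writer/verifier separation would come from process or permission boundaries.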

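A hypothetical invocation of the auditor might look like this; the `memory_auditor` module and its methods are illustrative, since the skill itself is distributed as a markdown file:

```python
# Illustrative only: module, class, and method names are not confirmed by the skill file.
from memory_auditor import MemoryAuditor

auditor = MemoryAuditor(codebase_path="phases/16-multi-agent-and-swarms/")
report = auditor.run()
print(report.summary())
print(report.provenance())
```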
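Independently of the auditor skill, the kind of checks it is described as performing can be illustrated with a small function over log entries. A sketch, assuming entries are dicts with `key`, `version`, `writer`, `timestamp`, and `prompt_hash` fields and that the verifier writes under a known identity (both assumptions):

```python
REQUIRED = ("writer", "timestamp", "prompt_hash")


def audit(entries: list[dict], verifier_id: str = "verifier") -> list[str]:
    """Return human-readable findings; an empty list means the log passed."""
    findings = []
    last_version: dict[str, int] = {}
    for i, e in enumerate(entries):
        # Provenance: every entry must name its writer, time, and prompt hash.
        missing = [f for f in REQUIRED if e.get(f) in (None, "")]
        if missing:
            findings.append(f"entry {i}: missing provenance {missing}")
        # Verifier separation: the read-only agent must never appear as a writer.
        if e.get("writer") == verifier_id:
            findings.append(f"entry {i}: verifier {verifier_id!r} wrote to the pool")
        # Append-only versioning: a gap suggests a silent mutation or a dropped write.
        expected = last_version.get(e["key"], 0) + 1
        if e["version"] != expected:
            findings.append(f"entry {i}: {e['key']!r} version {e['version']}, expected {expected}")
        last_version[e["key"]] = e["version"]
    return findings


sample = [
    {"key": "plan", "version": 1, "writer": "planner", "timestamp": 1.0, "prompt_hash": "ab12"},
    {"key": "plan", "version": 3, "writer": "coder", "timestamp": 2.0, "prompt_hash": ""},
]
findings = audit(sample)
```

For `sample`, the second entry is flagged twice: once for the empty prompt hash and once for jumping from version 1 to version 3.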

Source Files and Skill Locations

Each architecture is packaged as a reusable skill markdown file:

| Architecture | Skill File Path | Key Features |
|---|---|---|
| Hybrid Memory | `phases/14-agent-engineering/09-hybrid-memory-mem0/outputs/skill-hybrid-memory.md` | Fusion scoring, scope-aware retrieval, temporal invalidation |
| Memory Blocks | `phases/14-agent-engineering/08-memory-blocks-sleep-time-compute/outputs/skill-memory-blocks.md` | Sleep-time consolidation, block versioning |
| Virtual Memory | `phases/14-agent-engineering/07-memory-virtual-context-memgpt/outputs/skill-virtual-memory.md` | Citation contracts, memory tools |
| Memory Auditor | `phases/16-multi-agent-and-swarms/13-shared-memory-blackboard/outputs/skill-memory-auditor.md` | Provenance tracking, poisoning detection |

Summary

Mem0-style Hybrid Memory combines vector, KV, and graph stores with weighted fusion scoring and scope-aware retrieval to handle heterogeneous data safely.
Letta-style Memory Blocks use three-tier storage with background sleep-time consolidation to maintain low latency while preserving long-term context.
MemGPT-style Virtual Context enforces strict boundaries between active FIFO buffers and searchable archives, requiring citation of all retrieved memories.
Shared-Memory Blackboard provides audit trails and provenance tracking for multi-agent systems, using read-only verifiers to detect poisoning attacks.

Frequently Asked Questions

What is the difference between Mem0 hybrid memory and MemGPT virtual context?

Mem0 hybrid memory queries three stores (vector, KV, graph) and blends the results with a fusion scorer that weights relevance, importance, and recency, which suits data that needs similarity, exact-match, and relational lookups at once. MemGPT virtual context strictly separates active conversation history (a FIFO buffer) from archival storage and requires explicit citations for any retrieved memory, optimizing for traceability in long-running conversations.

How does Letta-style sleep-time consolidation improve agent performance?

Sleep-time consolidation moves memory maintenance, such as summarizing full blocks and invalidating contradictions, off the critical request path, which reduces latency for user-facing turns. According to the ai-engineering-from-scratch implementation, this background process runs automatically after each turn, so the user does not wait for it and main response generation is never blocked.

Why do multi-agent systems need a shared-memory blackboard auditor?

The shared-memory blackboard pattern prevents data poisoning and silent mutations by enforcing append-only writes, provenance metadata (writer, timestamp, prompt hash), and strict separation between writer agents and read-only verifier agents. The auditor skill scans codebases to verify these safety properties before production deployment.

Can these agent memory architectures be combined in a single project?

Yes. The curriculum presents these architectures as modular skills that can be imported individually or composed. For example, a single agent could use Mem0-style retrieval for facts, Letta-style blocks for persona management, and MemGPT-style citations for conversation history, all within the same codebase.
