How to Implement Agent Memory and Planning Patterns in Python

May 21, 2026 · rohitg00/ai-engineering-from-scratch

Agent memory uses a two-tier virtual-context system (main context + archival store) with tool-based paging, while planning combines symbolic HTN decomposition with LLM fallback and evolutionary search for optimization.

The rohitg00/ai-engineering-from-scratch curriculum provides production-grade implementations of these foundational architectural patterns. This guide covers the virtual-context memory model derived from MemGPT and the hybrid planning approach that merges Hierarchical Task Networks with evolutionary algorithms.

Virtual-Context Memory Architecture (MemGPT Pattern)

The Two-Tier Memory Model

The repository implements a virtual-context memory system in phases/14-agent-engineering/07-memory-virtual-context-memgpt/code/main.py that mirrors operating-system virtual memory. It separates working memory from long-term storage:

Main context: A fixed-size prompt buffer (default 2,000 tokens) containing a core dictionary for persistent sections (e.g., "User preferences") and a FIFO messages list for recent dialogue history.
External (archival) store: An unbounded BM25-like index holding records with (id, text, tags, session, turn) metadata.

When the agent needs information beyond the main context, it triggers the equivalent of a page fault: the runtime retrieves data from archival storage and splices it into the next LLM turn as a system observation.
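The two tiers and the page-in step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's code: SketchMainContext, SketchArchive, and count_tokens are hypothetical names, tokens are counted by whitespace, and word overlap stands in for BM25 scoring.

```python
from collections import deque
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    """Crude whitespace count; a real system would use the model's tokenizer."""
    return len(text.split())


@dataclass
class SketchMainContext:  # hypothetical name, not the repo's class
    max_tokens: int = 2000
    core: dict = field(default_factory=dict)        # persistent named sections
    messages: deque = field(default_factory=deque)  # FIFO dialogue history

    def used_tokens(self) -> int:
        core_tokens = sum(count_tokens(v) for v in self.core.values())
        return core_tokens + sum(count_tokens(m) for m in self.messages)

    def add_message(self, text: str) -> list:
        """Append a message, evicting the oldest ones until the budget fits."""
        self.messages.append(text)
        evicted = []
        while self.used_tokens() > self.max_tokens and len(self.messages) > 1:
            evicted.append(self.messages.popleft())
        return evicted


@dataclass
class SketchArchive:  # hypothetical name, not the repo's class
    records: list = field(default_factory=list)

    def insert(self, text: str) -> None:
        self.records.append(text)

    def search(self, query: str, top_k: int = 3) -> list:
        """Rank by shared lowercase words (an assumed stand-in for BM25)."""
        terms = set(query.lower().split())
        scored = [(len(terms & set(r.lower().split())), r) for r in self.records]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [r for _, r in scored[:top_k]]


main = SketchMainContext(max_tokens=10)
archive = SketchArchive()

# Messages evicted from the main context are paged out to the archive.
for msg in ["Alice prefers espresso", "the weather is sunny today", "what should I order"]:
    for old in main.add_message(msg):
        archive.insert(old)

# "Page fault": the fact is no longer in main context, so page it back in.
hits = archive.search("what does Alice prefer")
if hits:
    for old in main.add_message(f"[system observation] {hits[0]}"):
        archive.insert(old)

print(list(main.messages))
```

The budget here is deliberately tiny so that eviction happens after three messages; with the 2,000-token default the same logic applies at a larger scale.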

Memory Tools and the Interrupt Pattern

The MemoryTools class exposes functions that the agent invokes as ReAct loop actions. These tools are defined in phases/14-agent-engineering/07-memory-virtual-context-memgpt/code/main.py:

core_memory_append(section, text) – Appends text to a named section in main context.
core_memory_replace(section, old, new) – Atomic replacement within core memory.
archival_memory_insert(text) – Stores text in archival store with auto-generated metadata.
archival_memory_search(query, top_k) – BM25 retrieval returning ranked results.
conversation_search(query) – Searches historical message threads.
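As a rough illustration of the two core-memory tools listed above, the sketch below operates on a plain dict. The function bodies are assumptions made for this example (newline-joined sections, failing when the old text is absent); the repository's MemoryTools methods may behave differently.

```python
def core_memory_append(core: dict, section: str, text: str) -> None:
    """Append text to a named section, creating the section if needed."""
    existing = core.get(section, "")
    core[section] = f"{existing}\n{text}" if existing else text


def core_memory_replace(core: dict, section: str, old: str, new: str) -> None:
    """Replace the first occurrence of old with new, or fail without changes."""
    current = core.get(section, "")
    if old not in current:
        # Assumed behaviour: refuse the edit rather than silently do nothing.
        raise ValueError(f"{old!r} not found in section {section!r}")
    core[section] = current.replace(old, new, 1)


core = {}
core_memory_append(core, "User preferences", "Likes espresso")
core_memory_append(core, "User preferences", "Prefers short answers")
core_memory_replace(core, "User preferences", "espresso", "tea")
print(core["User preferences"])
```

Raising on a missing match keeps the replacement all-or-nothing, which is one simple reading of "atomic" in a single-threaded loop.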

When the agent calls an archival tool, the runtime treats this as an interrupt, pauses generation, executes the retrieval, and prepends the result to the next observation. This pattern allows unbounded memory without overwhelming the context window.
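The interrupt pattern can be sketched as a small dispatch loop. Everything here is illustrative: scripted_model is a hypothetical stand-in for an LLM, the action dictionaries are an assumed format, a substring match replaces BM25, and the tool result is appended to the observation list that the next step sees.

```python
from typing import Callable

STORE = ["Alice prefers espresso over latte", "Bob cycles to work"]


def archival_memory_search(query: str, top_k: int = 3) -> str:
    """Toy substring search standing in for BM25 retrieval."""
    hits = [t for t in STORE if query.lower() in t.lower()]
    return "; ".join(hits[:top_k]) or "no results"


def scripted_model(observations: list) -> dict:
    """Hypothetical LLM stand-in: search once, then answer from the result."""
    for obs in observations:
        if obs.startswith("[archival_memory_search] "):
            fact = obs.split("] ", 1)[1]
            return {"type": "final", "text": f"Based on memory: {fact}"}
    return {"type": "tool", "tool": "archival_memory_search",
            "args": {"query": "espresso", "top_k": 1}}


def run_turn(model: Callable, tools: dict, observations: list, max_steps: int = 5) -> str:
    """Drive one turn; each tool call interrupts generation and feeds the next step."""
    for _ in range(max_steps):
        action = model(observations)
        if action["type"] == "final":
            return action["text"]
        # Interrupt: pause generation, run the tool, surface the result.
        result = tools[action["tool"]](**action["args"])
        observations.append(f"[{action['tool']}] {result}")
    return "step budget exhausted"


observations = ["user: what coffee does Alice like?"]
answer = run_turn(scripted_model,
                  {"archival_memory_search": archival_memory_search},
                  observations)
print(answer)
```

The max_steps cap is a design choice of this sketch: it stops a model that keeps issuing tool calls from looping forever.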

from memory.main import MainContext, ArchivalStore, MemoryTools

# Initialise a 2k-token main buffer and in-memory BM25 store
main = MainContext(max_tokens=2000)
archive = ArchivalStore()
tools = MemoryTools(main, archive)

# Agent writes a fact to archival memory
tools.archival_memory_insert(
    "Alice prefers espresso over latte",
    tags=["prefs"]
)

# Later the agent retrieves the fact
results = tools.archival_memory_search("Alice coffee preference", top_k=5)
print(results[0].text)   # → "Alice prefers espresso over latte"

Handling Memory Lifecycle and Security

Production deployments must guard against memory rot and memory poisoning. The curriculum enforces three lifecycle policies:

Periodic consolidation: Summarizing aged core entries to compress history.
Invalidation: Marking stale archival records when contradicted by new facts.
Citation tracking: Every archival_memory_insert attaches session_id, turn_id, and source_url metadata, so retrieved facts can be cited automatically and content from untrusted sources can be traced and filtered, which helps mitigate prompt injection via retrieved content.
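Invalidation and citation tracking from the list above might look like the following sketch. ArchivalRecord, insert_with_invalidation, same_subject, and cite are hypothetical names; the contradiction check (same first two words, different text) is a deliberately naive assumption, and the URLs are placeholders.

```python
from dataclasses import dataclass


@dataclass
class ArchivalRecord:  # hypothetical record shape for this sketch
    text: str
    session_id: str
    turn_id: int
    source_url: str
    valid: bool = True


def same_subject(old: ArchivalRecord, new: ArchivalRecord) -> bool:
    """Naive contradiction test: same first two words, different statement."""
    return old.text.split()[:2] == new.text.split()[:2] and old.text != new.text


def insert_with_invalidation(store: list, record: ArchivalRecord) -> None:
    """Insert a record and mark older records it contradicts as stale."""
    for old in store:
        if old.valid and same_subject(old, record):
            old.valid = False  # keep the record for auditing, but stop serving it
    store.append(record)


def cite(record: ArchivalRecord) -> str:
    """Render a fact together with its provenance."""
    return (f"{record.text} (session {record.session_id}, "
            f"turn {record.turn_id}, {record.source_url})")


store = []
insert_with_invalidation(store, ArchivalRecord(
    "Alice prefers espresso", "s1", 3, "https://example.com/chat/s1"))
insert_with_invalidation(store, ArchivalRecord(
    "Alice prefers tea", "s2", 1, "https://example.com/chat/s2"))

current = [r for r in store if r.valid]
print(cite(current[0]))
```

Stale records are flagged rather than deleted, so the history of what the agent once believed stays available for review.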

Extensions in Lesson 08 (Letta) and Lesson 09 (Mem0) add vector, KV, and graph stores as a third "recall" tier, but the underlying virtual-context pattern remains identical.

Hybrid Planning with HTN and Evolutionary Search

Hierarchical Task Networks (HTN) Core

The symbolic planner in phases/14-agent-engineering/11-planning-htn-and-evolutionary/code/main.py uses HTN decomposition so that every plan it emits is valid with respect to the declared operators: each step's preconditions hold at the point where the step runs. The architecture distinguishes:

Compound tasks: Abstract goals requiring decomposition.
Primitive tasks: Executable operators with preconditions and effects.
Methods: Mappings from compound tasks to subtask sequences, guarded by preconditions.
Operators: Encodings of primitive actions with preconditions and effects lists.

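from htn.core import HTNPlanner, Method, Operator

# Primitive operator that establishes "has_smtp" (illustrative; added so that
# the "check_smtp" subtask below resolves to a defined operator)
check_op = Operator(
    name="check_smtp",
    preconditions=[],
    effects=["has_smtp"],
    fn=lambda: "SMTP connection verified",
)

# Primitive operator that sends the message once SMTP is available
send_op = Operator(
    name="send_email",
    preconditions=["has_smtp"],
    effects=["email_sent"],
    fn=lambda to, subject, body: f"Email sent to {to}",
)

# Method that decomposes the compound task "notify_user"
notify_method = Method(
    name="notify_user_via_email",
    task="notify_user",
    preconditions=["user_has_email"],
    subtasks=[
        ("check_smtp", []),
        ("send_email", ["user_email", "subject", "body"]),
    ],
)

planner = HTNPlanner(methods=[notify_method], operators=[check_op, send_op])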
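To make the decomposition step concrete, here is a minimal depth-first HTN planner over plain dictionaries and sets. It is a sketch under simplifying assumptions (no task arguments, add-only effects, a hypothetical htn_plan function) rather than the HTNPlanner class from the repository.

```python
def htn_plan(tasks, state, operators, methods):
    """Depth-first HTN decomposition.

    tasks: list of task names still to accomplish, in order.
    state: set of facts that currently hold.
    operators: {name: (preconditions, effects)} for primitive tasks.
    methods: {name: [(preconditions, subtasks), ...]} for compound tasks.
    Returns a list of primitive task names, or None if no plan exists.
    """
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]

    if head in operators:  # primitive task: check preconditions, apply effects
        pre, effects = operators[head]
        if not pre <= state:
            return None
        tail = htn_plan(rest, state | effects, operators, methods)
        return None if tail is None else [head] + tail

    for pre, subtasks in methods.get(head, []):  # compound task: try each method
        if pre <= state:
            plan = htn_plan(subtasks + rest, state, operators, methods)
            if plan is not None:
                return plan
    return None  # no applicable method, or an unknown task


operators = {
    "check_smtp": (set(), {"has_smtp"}),
    "send_email": ({"has_smtp"}, {"email_sent"}),
}
methods = {
    "notify_user": [({"user_has_email"}, ["check_smtp", "send_email"])],
}

plan = htn_plan(["notify_user"], {"user_has_email"}, operators, methods)
print(plan)
```

Because a primitive task is only appended after its preconditions are checked against the simulated state, any plan the function returns is executable in order under the declared operator model.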

Chat-HTN Fallback for Unknown Tasks

When no existing method matches a compound task, the planner invokes llm_decompose(task, state), a tool call that prompts an LLM to suggest subtasks. The symbolic layer validates these candidates against the operator schema, so every accepted plan is sound even when the LLM proposes novel decompositions.

def plan_task(task, state):
    plan = planner.generate(task, state)
    if not plan:
        # LLM fallback with symbolic validation
        plan = llm_decompose(task, state)   # returns validated primitive steps
    return plan
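The validation half of this fallback can be sketched independently of any LLM. In the example below, validate_steps is a hypothetical helper that simulates a proposed step list against an assumed operator table and rejects unknown operators or unmet preconditions; the proposals are hard-coded where a real system would use model output.

```python
def validate_steps(steps, state, operators):
    """Simulate proposed steps; return (ok, reason)."""
    current = set(state)
    for name in steps:
        if name not in operators:
            return False, f"unknown operator: {name}"
        pre, effects = operators[name]
        missing = pre - current
        if missing:
            return False, f"{name} is missing preconditions: {sorted(missing)}"
        current |= effects  # add-only effects, a simplification of this sketch
    return True, "ok"


# Assumed operator table: {name: (preconditions, effects)}
operators = {
    "check_smtp": (set(), {"has_smtp"}),
    "send_email": ({"has_smtp"}, {"email_sent"}),
}

# Hard-coded candidates standing in for LLM proposals.
proposals = [
    ["teleport_message"],           # not a known operator
    ["send_email"],                 # precondition not yet established
    ["check_smtp", "send_email"],   # valid ordering
]

results = [validate_steps(p, set(), operators) for p in proposals]
for proposal, (ok, reason) in zip(proposals, results):
    print(proposal, ok, reason)
```

The rejection reason can be fed back to the model as an observation so that it can propose a repaired decomposition.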

AlphaEvolve for Optimization Problems

For tasks with deterministic fitness functions (code optimization, scheduling, numeric approximation), the repository implements an AlphaEvolve-style evolutionary search in phases/14-agent-engineering/11-planning-htn-and-evolutionary/code/main.py. An ensemble of LLMs proposes program mutations, but selection is purely algorithmic and based only on the fitness score:

from evolve.core import EvolutionarySearcher, fitness_mean_squared_error

searcher = EvolutionarySearcher(
    seed_program="lambda x: x * 2",
    fitness_fn=lambda prog: fitness_mean_squared_error(prog, target=lambda x: x**2),
    mutation_rate=0.2,
    population_size=30,
)

best = searcher.run(generations=50)
print(best)   # → a program approximating x**2

This hybrid approach uses HTN for policy-driven logic and evolutionary search for performance-driven optimization.
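The propose-then-select loop can be illustrated without an LLM. In this sketch the "program" is a coefficient triple for a*x**2 + b*x + c, and Gaussian noise replaces LLM-proposed mutations, which is a swapped-in technique chosen to keep the example self-contained. The names evolve, mutate, and fitness are hypothetical.

```python
import random

XS = [x / 2 for x in range(-6, 7)]  # fixed evaluation points in [-3, 3]


def fitness(coeffs):
    """Negative mean squared error against the target x**2 (higher is better)."""
    a, b, c = coeffs
    return -sum((a * x * x + b * x + c - x ** 2) ** 2 for x in XS) / len(XS)


def mutate(coeffs, rng, scale=0.3):
    """Stand-in for an LLM-proposed mutation: perturb every coefficient."""
    return tuple(v + rng.gauss(0, scale) for v in coeffs)


def evolve(seed, generations=50, population_size=30, keep=5, rng_seed=0):
    rng = random.Random(rng_seed)  # seeded so that runs are reproducible
    population = [seed]
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng)
                    for _ in range(population_size)]
        # Selection is purely algorithmic: rank by fitness and keep the best.
        # Parents compete with children, so the best score never gets worse.
        population = sorted(population + children, key=fitness, reverse=True)[:keep]
    return population[0]


seed = (0.0, 2.0, 0.0)  # corresponds to the seed program lambda x: x * 2
best = evolve(seed)
print(best, fitness(best))
```

Keeping parents in the selection pool (elitism) is what guarantees the monotone improvement; dropping it would allow the best candidate to be lost between generations.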

Production Integration: Combining Memory and Planning

Production agents typically compose both patterns within the same ReAct loop. The following pseudocode outline, based on the repository's skill scaffolds, shows how memory retrieval interrupts and planning tools coexist:


# Pseudocode outline (real code in the repo's skill scaffolds)
agent = AgentLoop(
    memory=VirtualContextMemory(max_main_tokens=2000),
    planner=HybridPlanner(
        htn=HTNPlanner(methods=method_lib),
        evolutionary=EvolutionarySearcher(fitness=my_fitness)
    )
)

while True:
    user_input = get_user_message()
    agent.observe(user_input)

    # 1. Memory check: does the user ask about a past fact?
    if needs_memory(user_input):
        facts = agent.memory.search(user_input)
        agent.inject_observation(facts)

    # 2. Planning: does the request require a multi-step plan?
    if needs_plan(user_input):
        plan = agent.planner.generate(user_input)
        agent.execute_plan(plan)

    # 3. Normal answer: fall back to the LLM if no special handling applies
    response = agent.respond()
    send_back(response)

Ready-to-deploy skill descriptions are available in outputs/skill-virtual-memory.md and outputs/skill-hybrid-planner.md, which generate framework-specific implementations for Claude Agent SDK, OpenAI Assistants, or LangGraph.

Summary

Virtual-context memory splits state into a bounded main context (prompt) and unbounded archival store, using tool calls ("interrupts") to page data in and out.
Memory tools (core_memory_append, archival_memory_search, etc.) execute as ReAct actions, with the runtime handling retrieval and injection automatically.
HTN planning provides symbolic task decomposition with preconditions and effects, guaranteeing valid execution sequences.
Chat-HTN fills gaps in the method library by using LLM suggestions validated against the operator schema.
AlphaEvolve applies evolutionary search to optimization problems, using a deterministic fitness function to select from LLM-generated mutations.
Production scaffolds in outputs/skill-virtual-memory.md and outputs/skill-hybrid-planner.md port these patterns to any agent framework.

Frequently Asked Questions

What is the difference between main context and archival memory in agent systems?

Main context is a fixed-size token buffer that the LLM always sees, containing a core dictionary for persistent facts and a FIFO messages list for recent dialogue. Archival memory is an unbounded external store (BM25 or vector index) holding historical records that must be retrieved via archival_memory_search and injected into the main context before the LLM can access them.

How does the HTN planner handle tasks it has never seen before?

When no method exists for a compound task, the planner calls llm_decompose(task, state) as a fallback. The LLM returns candidate subtasks, which the symbolic layer validates against existing operator schemas and preconditions. Only syntactically valid and logically sound plans are accepted, ensuring the agent never attempts undefined actions.

Can the evolutionary search patterns be used with neural network weights or only code?

The AlphaEvolve implementation in phases/14-agent-engineering/11-planning-htn-and-evolutionary/code/main.py uses a generic fitness_fn interface, allowing any evaluable program representation. While the examples mutate Python ASTs (code), the pattern supports neural architecture search or hyperparameter optimization by encoding weights or configurations as the "program" and using validation loss as the fitness metric.

Where can I find production-ready scaffolding for these patterns?

The repository provides framework-agnostic skill generators in outputs/skill-virtual-memory.md (for memory) and outputs/skill-hybrid-planner.md (for planning). These files contain templating logic that produces ready-to-run implementations for Claude Agent SDK, OpenAI Assistants, LangGraph, or custom ReAct loops, automatically wiring the tool definitions and state management.
