Understanding the Agent Loop Implementation from Scratch: A Minimal ReAct Architecture

June 8, 2026 rohitg00/ai-engineering-from-scratch

The agent loop implementation in the rohitg00/ai-engineering-from-scratch repository is a dependency-free Python realization of the ReAct (Reason + Act) pattern, cycling through thought generation, tool execution, and observation recording until a termination condition is satisfied.

The agent loop implementation found in Phase 14, Lesson 01 of this curriculum demonstrates how modern AI agents function without relying on external frameworks. Built entirely with the Python standard library, this educational codebase exposes the control flow that also underlies production systems such as the Claude Agent SDK and the OpenAI Agents SDK.

The ReAct Control Flow

At its core, the agent loop implementation follows the ReAct pattern—a cyclic interplay between reasoning and acting. The loop maintains a complete transcript of every interaction in a history buffer, allowing the LLM to reference previous thoughts and observations when planning its next move.

The cycle proceeds as follows: the user provides a query, the LLM generates a thought explaining its strategy, emits an action (a tool call), receives an observation (the tool output), and then reasons again based on that new information. This continues until the LLM determines it has sufficient information to provide a final answer.

Initialization and Turn Management

The AgentLoop class initializes by appending the user query as a Turn(kind="user") to its history attribute—a list[Turn] that serves as the message buffer. Each subsequent iteration appends new turns representing thoughts, actions, and final answers, ensuring the LLM driver always has full context.

The Execution Cycle

The AgentLoop.run method implements the core control logic with a hard limit on iterations:

1. Iterate up to max_turns times
2. Call llm.respond(history) to get the next move
3. If the response contains kind="finish", record the final turn and return
4. Otherwise, record the thought, parse the action into a ToolCall, and dispatch it through the ToolRegistry
5. Record the observation returned by the tool as part of the action turn
6. Repeat until termination
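
The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the run_loop function, the Turn fields, the "tool" and "args" keys of the action dict, and the "budget exhausted" return value are all assumptions made for the example. Only respond(history), kind="finish", and max_turns come from the description above.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Turn:
    kind: str      # "user", "thought", "action", or "final"
    content: Any


def run_loop(
    query: str,
    respond: Callable[[list[Turn]], dict],
    tools: dict[str, Callable[..., Any]],
    max_turns: int = 8,
) -> tuple[str, list[Turn]]:
    """Hypothetical stand-in for AgentLoop.run: returns (answer, history)."""
    history = [Turn("user", query)]
    for _ in range(max_turns):
        move = respond(history)
        if move.get("kind") == "finish":
            history.append(Turn("final", move["answer"]))
            return move["answer"], history
        history.append(Turn("thought", move["thought"]))
        action = move["action"]
        observation = str(tools[action["tool"]](**action["args"]))
        # The action turn keeps the call and its observation together.
        history.append(Turn("action", {**action, "observation": observation}))
    # Assumed behaviour when the turn budget runs out.
    return "budget exhausted", history


def scripted(history: list[Turn]) -> dict:
    # Scripted driver: one tool call, then finish once an action turn exists.
    if any(t.kind == "action" for t in history):
        return {"kind": "finish", "answer": history[-1].content["observation"]}
    return {"thought": "add the numbers",
            "action": {"tool": "add", "args": {"a": 2, "b": 3}}}


answer, history = run_loop("What is 2 + 3?", scripted, {"add": lambda a, b: a + b})
print(answer)                      # 5
print([t.kind for t in history])   # ['user', 'thought', 'action', 'final']
```

Using a for loop over range(max_turns) makes the turn budget part of the loop itself, so no separate counter check is needed.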

Core Components in `main.py`

The implementation lives in phases/14-agent-engineering/01-the-agent-loop/code/main.py, which contains four primary components working in concert.

AgentLoop Class

The AgentLoop class acts as the orchestrator. It maintains the history list and enforces the max_turns safety limit to prevent infinite loops. The run method contains the bounded loop that drives the ReAct cycle, checking for the finish condition after each LLM response.

ToolRegistry and Dispatch

The ToolRegistry provides a mapping between tool names (strings the LLM outputs) and Python callables (actual functions). Its dispatch method executes the tool with provided arguments and returns an observation string. This design isolates tool execution from the control flow, enabling easy addition of new capabilities.
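
A name-to-callable registry of this kind might look like the sketch below. The register method, the dispatch(name, args) signature, and the two key-value tools are assumptions for illustration; the repository's ToolRegistry may differ, and error handling is deliberately left out here.

```python
from typing import Any, Callable


class ToolRegistry:
    """Illustrative registry: maps tool names to Python callables."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def dispatch(self, name: str, args: dict[str, Any]) -> str:
        # Look up the callable by the name the LLM emitted and
        # return its result as an observation string.
        return str(self._tools[name](**args))


store: dict[str, str] = {}


def kv_set(key: str, value: str) -> str:
    store[key] = value
    return f"stored {key}"


def kv_get(key: str) -> str:
    return store.get(key, "missing")


registry = ToolRegistry()
registry.register("kv_set", kv_set)
registry.register("kv_get", kv_get)

print(registry.dispatch("kv_set", {"key": "base", "value": "120"}))  # stored base
print(registry.dispatch("kv_get", {"key": "base"}))                  # 120
```

Because the loop only ever sees a name, an argument dict, and a returned string, adding a capability is a matter of one more register call.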

ToyLLM Driver

ToyLLM is a deterministic script that simulates an LLM for testing purposes. It implements the critical respond(history) interface contract, returning a dictionary with either:

kind="finish" and an answer, or
A thought string and an action dict containing the tool name and arguments

Because the driver is swappable, you can replace ToyLLM with production APIs (Anthropic, OpenAI) without modifying the AgentLoop logic.
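
To make the contract concrete, here is a minimal sketch of a scripted driver in the spirit of ToyLLM. The class name ScriptedLLM, the "tool" and "args" keys inside the action dict, and the fallback answer are hypothetical; only the respond(history) method and the two response shapes come from the description above.

```python
from typing import Any


class ScriptedLLM:
    """Hypothetical ToyLLM-like driver that replays a fixed list of moves."""

    def __init__(self, script: list[dict[str, Any]]) -> None:
        self._script = script
        self._step = 0

    def respond(self, history: list[Any]) -> dict[str, Any]:
        # Ignores the history content; a real model would condition on it.
        if self._step >= len(self._script):
            return {"kind": "finish", "answer": "script exhausted"}
        move = self._script[self._step]
        self._step += 1
        return move


llm = ScriptedLLM([
    {"thought": "compute 15% tax",
     "action": {"tool": "calculator", "args": {"expr": "120 * 0.15"}}},
    {"kind": "finish", "answer": "the tax is 18.0"},
])

first = llm.respond([])
second = llm.respond([])
print(first["action"]["tool"])  # calculator
print(second["kind"])           # finish
```

A deterministic driver like this keeps tests reproducible, since the same script always yields the same sequence of turns.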

Turn Data Structure

Each step in the conversation is captured as a Turn object with distinct kinds: user, thought, action, and final. The action turn specifically encapsulates both the tool call and its resulting observation, creating a complete audit trail.
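
One way to model such a record is a small dataclass. In the sketch below the field names (content, observation), the validation step, and the describe helper are assumptions for illustration; only the four kinds are taken from the text.

```python
from dataclasses import dataclass
from typing import Any, Optional

KINDS = ("user", "thought", "action", "final")


@dataclass
class Turn:
    kind: str
    content: Any
    observation: Optional[str] = None  # only meaningful for action turns

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown turn kind: {self.kind!r}")


def describe(index: int, turn: Turn) -> str:
    """One line per turn; action turns show the call and its result."""
    text = str(turn.content)
    if turn.kind == "action":
        text = f"{text} -> {turn.observation}"
    return f"[{index:02d} {turn.kind}] {text}"


history = [
    Turn("user", "What is 15% of 120?"),
    Turn("thought", "compute 15% tax"),
    Turn("action", "calculator({'expr': '120 * 0.15'})", observation="18.0"),
    Turn("final", "the tax is 18.0"),
]
for i, turn in enumerate(history):
    print(describe(i, turn))
```

Keeping the observation on the action turn, rather than in a separate turn, means each tool call and its result can be read as one entry in the audit trail.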

Running the Implementation

You can exercise the agent loop implementation offline using the provided demo builder:

# Run this from inside phases/14-agent-engineering/01-the-agent-loop/code/ so that
# main.py is importable; the hyphenated directory names are not valid package names.
from main import build_demo_agent, pretty_trace

# Build the demo agent (registers calculator and KV store tools)
agent = build_demo_agent()

# Ask a question that triggers a sequence of tool calls
final_answer = agent.run("What is 120 plus 15% tax, stored in kv?")

# Pretty-print the trace
pretty_trace(agent.history)

print("Final answer:", final_answer)

Expected trace output:


[00   user] What is 120 plus 15% tax, stored in kv?
[01  thought] store the base price
[02   action] kv_set({'key': 'base', 'value': '120'}) -> stored base
[03  thought] compute 15% tax
[04   action] calculator({'expr': '120 * 0.15'}) -> 18.0
[05  thought] store the tax
[06   action] kv_set({'key': 'tax', 'value': '18.0'}) -> stored tax
[07  thought] compute total
[08   action] calculator({'expr': '120 + 18.0'}) -> 138.0
[09  thought] confirm stored values
[10   action] kv_get({'key': 'base'}) -> 120
[11   final] the total including 15% tax is 138.0

Extending the Architecture

The minimalist design of this agent loop implementation enables several extension paths:

LLM Swapping: Replace ToyLLM with any client implementing the respond method to connect to Anthropic's Messages API, OpenAI's Chat Completions or Responses API, or local models.
Tool Addition: Register new functions with ToolRegistry to give the agent capabilities like web search, file I/O, or database queries.
Parallel Execution: Modify AgentLoop.run to handle multiple tool calls per turn by dispatching actions concurrently and aggregating observations.
Observability: The history list serves as a complete, replayable transcript that can be exported to tracing systems such as OpenTelemetry, Langfuse, or Phoenix.
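
As an illustration of the parallel execution path, the sketch below dispatches several tool calls with a thread pool from the standard library. It is not part of the repository: dispatch_parallel, the shape of each call dict, and the example tools are hypothetical, and it assumes the tools are safe to run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def dispatch_parallel(
    tools: dict[str, Callable[..., Any]],
    calls: list[dict[str, Any]],
    max_workers: int = 4,
) -> list[str]:
    """Run several tool calls concurrently; observations keep call order."""

    def run_one(call: dict[str, Any]) -> str:
        try:
            return str(tools[call["tool"]](**call["args"]))
        except Exception as exc:  # report failures as observations
            return f"error: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(run_one, calls))


tools = {
    "square": lambda x: x * x,
    "upper": lambda text: text.upper(),
}
observations = dispatch_parallel(tools, [
    {"tool": "square", "args": {"x": 12}},
    {"tool": "upper", "args": {"text": "ok"}},
    {"tool": "missing", "args": {}},
])
print(observations[:2])  # ['144', 'OK']
```

Threads suit I/O-bound tools such as HTTP requests; CPU-bound tools would need a process pool or a different design.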

Summary

The agent loop implementation uses the ReAct pattern (Reason + Act) to cycle between LLM reasoning and tool execution until a stop condition is met.
Located in phases/14-agent-engineering/01-the-agent-loop/code/main.py, the AgentLoop class manages state through a history list of Turn objects and enforces safety via max_turns.
The ToolRegistry provides dependency injection for capabilities, while ToyLLM demonstrates the minimal interface contract required for the LLM driver.
This pure-Python design offers complete transparency into agent control flow, making it well suited to debugging, education, and understanding what production SDKs do under the hood.

Frequently Asked Questions

What is the ReAct pattern in the context of this agent loop?

The ReAct pattern (Reason + Act) structures agent behavior as an alternating sequence of internal reasoning and external tool usage. In this implementation, the LLM generates a thought turn explaining its strategy, followed by an action turn containing a tool call. After observing the tool's output, the loop feeds that observation back to the LLM as context for the next reasoning step, creating a closed feedback cycle that continues until the problem is solved.

How does the implementation prevent infinite execution?

The AgentLoop class enforces a turn budget through its max_turns parameter. Each iteration of the run method increments a counter; if the LLM fails to emit a finish turn before reaching this limit, the loop terminates with a "budget exhausted" status rather than running indefinitely. This safety mechanism is essential for production deployments where unbounded loops could consume excessive tokens or compute resources.

Can I replace the ToyLLM with a production LLM API?

Yes. The ToyLLM class implements a simple respond(history) interface that returns a dictionary conforming to a specific contract: either {"kind": "finish", "answer": ...} or a structure containing thought and action keys. Any production client—whether for Anthropic, OpenAI, or local inference—can replace ToyLLM by implementing this same interface, allowing the AgentLoop control flow to remain completely unchanged when scaling from educational demos to production systems.
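
A hedged sketch of such a replacement is shown below. It avoids any vendor SDK: JsonLLMDriver wraps a hypothetical complete(prompt) callable, which you would implement with your provider's client, and assumes the model has been instructed to reply with JSON in the contract's shape. The prompt rendering and the fallback behaviour are illustrative choices, not the repository's.

```python
import json
from typing import Any, Callable


class JsonLLMDriver:
    """Adapts a text-completion callable to the respond(history) contract.

    `complete` is a hypothetical function (prompt -> reply text); wrap your
    provider's client call in it.
    """

    def __init__(self, complete: Callable[[str], str]) -> None:
        self._complete = complete

    def respond(self, history: list[Any]) -> dict[str, Any]:
        prompt = "\n".join(str(turn) for turn in history)
        reply = self._complete(prompt)
        try:
            move = json.loads(reply)
        except json.JSONDecodeError:
            # Treat unparseable output as a final answer rather than crash.
            return {"kind": "finish", "answer": reply.strip()}
        if isinstance(move, dict) and (
            move.get("kind") == "finish" or ("thought" in move and "action" in move)
        ):
            return move
        # Valid JSON in the wrong shape is also returned as a final answer.
        return {"kind": "finish", "answer": reply.strip()}


def fake_complete(prompt: str) -> str:
    # Stand-in for a real API call, so the sketch runs offline.
    return '{"kind": "finish", "answer": "138.0"}'


driver = JsonLLMDriver(fake_complete)
print(driver.respond(["user: What is 120 plus 15% tax?"]))
# {'kind': 'finish', 'answer': '138.0'}
```

Validating the reply inside the driver keeps malformed model output from reaching the loop, which only has to handle the two documented shapes.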

Where are the tool definitions stored and how are they executed?

Tools are registered in the ToolRegistry class, which maintains a mapping from string names to Python callables. When the LLM outputs an action (e.g., calculator), the AgentLoop constructs a ToolCall object and passes it to ToolRegistry.dispatch, which looks up the function, executes it with the provided arguments, and returns the result as an observation string. This dispatch mechanism isolates tool execution logic from the agent's control flow, which gives a single place to add sandboxing and error handling.
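
The sketch below shows one way error handling could sit at the dispatch boundary, so that a failing tool produces an observation instead of aborting the loop. The ToolCall fields, the free-standing dispatch function, and the error message format are assumptions for illustration and may not match the repository.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolCall:
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def dispatch(tools: dict[str, Callable[..., Any]], call: ToolCall) -> str:
    """Always return an observation string, even when the tool fails."""
    fn = tools.get(call.name)
    if fn is None:
        return f"error: unknown tool {call.name!r}"
    try:
        return str(fn(**call.args))
    except Exception as exc:
        # Surface the failure to the LLM instead of aborting the loop.
        return f"error: {type(exc).__name__}: {exc}"


def divide(a: float, b: float) -> float:
    return a / b


tools = {"divide": divide}
print(dispatch(tools, ToolCall("divide", {"a": 6, "b": 3})))  # 2.0
print(dispatch(tools, ToolCall("divide", {"a": 1, "b": 0})))  # an error observation
print(dispatch(tools, ToolCall("search")))                    # error: unknown tool 'search'
```

Returning errors as observations lets the model see what went wrong and choose a different action on its next turn.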
