How PageAgentCore's Reflection-Before-Action Agent Loop Works Internally

March 9, 2026 alibaba/page-agent

PageAgentCore implements a Re-act (Reflect-Think-Act) loop that forces the LLM to evaluate previous steps, maintain short-term memory, and define the next goal before executing any browser action, using a unified MacroTool that bundles reflection fields with tool selection.

The alibaba/page-agent repository provides a browser automation framework built around an LLM-driven agent. At its core is PageAgentCore, which implements a reflection-before-action pattern: before every browser interaction, the LLM must assess its progress and update its strategy. This design discourages aimless action sequences and helps the agent adapt in dynamic web environments.

The Four-Phase Execution Cycle

The execute method in packages/core/src/PageAgentCore.ts drives the main loop (lines 31‑44). Each iteration follows a strict sequence: observe the environment, assemble context, invoke the LLM with a structured MacroTool, and dispatch the resulting action.

1. Observation Collection

Before any reasoning occurs, the agent captures the current browser state. The #handleObservations method (lines 10‑46) collects DOM snapshots, navigation warnings, and custom observations, flushing them into an internal #observations buffer. These observations provide the factual grounding for the LLM's subsequent reflection.
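The collect-then-flush behavior described here can be pictured with a small buffer. This is a minimal sketch, not the library's code: ObservationBuffer, its method names, and the three observation kinds are assumptions chosen to mirror the description above.

```typescript
// Hypothetical sketch: ObservationBuffer and its method names are illustrative only.
type ObservationKind = 'dom' | 'navigation' | 'custom';

interface Observation {
  kind: ObservationKind;
  content: string;
}

class ObservationBuffer {
  private pending: Observation[] = [];

  // Queue one observation gathered while inspecting the page.
  push(kind: ObservationKind, content: string): void {
    this.pending.push({ kind, content });
  }

  // Hand back everything queued so far and empty the buffer,
  // so each observation is reported exactly once.
  flush(): Observation[] {
    const drained = this.pending;
    this.pending = [];
    return drained;
  }
}

const buffer = new ObservationBuffer();
buffer.push('dom', '[0] <button>Buy now</button>');
buffer.push('navigation', 'The page URL changed since the previous step.');

const firstFlush = buffer.flush();  // both queued observations
const secondFlush = buffer.flush(); // empty: nothing new was observed
```

Draining the buffer on every flush keeps a stale warning from being repeated to the model on later steps.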

2. Prompt Assembly and LLM Invocation

With observations gathered, the agent constructs the message payload:

const messages = [
  { role: 'system', content: this.#getSystemPrompt() },
  { role: 'user',   content: await this.#assembleUserPrompt() }
];

The LLM is then invoked with a forced tool call to the MacroTool (lines 54‑63). Unlike open-ended generation, this pattern requires the model to output a structured object containing both its internal reasoning and the selected action.
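To make the idea of a forced tool call concrete, here is a sketch of what such a request body might look like. It assumes an OpenAI-compatible chat completions API; buildForcedToolRequest, the tool name 'macro_tool', and the model id are hypothetical and not taken from the repository.

```typescript
// Assumption: an OpenAI-compatible chat completions request body.
// buildForcedToolRequest, 'macro_tool', and the model id are hypothetical.
interface ChatMessage {
  role: 'system' | 'user';
  content: string;
}

function buildForcedToolRequest(
  model: string,
  messages: ChatMessage[],
  parameters: Record<string, unknown>, // JSON Schema describing the tool input
) {
  const toolName = 'macro_tool';
  return {
    model,
    messages,
    tools: [
      {
        type: 'function' as const,
        function: { name: toolName, description: 'Reflect, then pick one action.', parameters },
      },
    ],
    // Naming the function rules out a free-text reply; with a single
    // registered tool, tool_choice: 'required' has the same effect.
    tool_choice: { type: 'function' as const, function: { name: toolName } },
  };
}

const request = buildForcedToolRequest(
  'example-model',
  [
    { role: 'system', content: 'You are a browser agent.' },
    { role: 'user', content: 'Task: open the pricing page.' },
  ],
  { type: 'object', properties: { action: { type: 'object' } }, required: ['action'] },
);
```

Because only one tool is offered and its use is mandatory, every model turn arrives as a structured object rather than prose that would need parsing.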

3. Reflection Extraction

The LLM response is parsed into a MacroToolResult containing the reflection data and action choice. The agent extracts three key reflection fields (lines 68‑74):

const reflection: Partial<AgentReflection> = {
  evaluation_previous_goal: input.evaluation_previous_goal,
  memory: input.memory,
  next_goal: input.next_goal,
};

Simultaneously, it identifies the concrete action to execute by extracting the first key from the action object (lines 73‑78):

const actionName = Object.keys(input.action)[0];
const action = { 
  name: actionName, 
  input: input.action[actionName], 
  output: output 
};

4. Action Execution and Loop Control

The step is recorded in this.history with full context, creating a persistent memory trail. If actionName equals 'done', the loop terminates and returns the final ExecutionResult (lines 99‑105). Otherwise, the step counter increments and the cycle repeats until the task is complete or the maximum step limit is reached.
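The two exit conditions can be sketched in a few lines. This is a simplified, synchronous stand-in for the real asynchronous loop: runLoop, Decision, Step, and RunResult are hypothetical names, and the decide callback replaces the LLM call.

```typescript
// Hypothetical sketch of the loop's two exit conditions; the real loop is async.
type Decision = { next_goal: string; action: Record<string, unknown> };

interface Step {
  goal: string;
  actionName: string;
}

interface RunResult {
  success: boolean;
  history: Step[];
}

function runLoop(
  decide: (step: number, history: Step[]) => Decision, // stand-in for the LLM call
  maxSteps: number,
): RunResult {
  const history: Step[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const decision = decide(step, history);
    const actionName = Object.keys(decision.action)[0];
    history.push({ goal: decision.next_goal, actionName }); // record before checking for exit
    if (actionName === 'done') {
      return { success: true, history }; // the model declared the task finished
    }
  }
  return { success: false, history }; // step budget exhausted
}

const script: Decision[] = [
  { next_goal: 'Open the pricing page', action: { click: { index: 4 } } },
  { next_goal: 'Wait for the table to render', action: { wait: { seconds: 1 } } },
  { next_goal: 'Report the price', action: { done: { text: 'Enterprise: contact sales' } } },
];

const finished = runLoop((step) => script[step], 30);
const exhausted = runLoop((step) => script[step], 2);
```

With a budget of 30 the scripted run ends on the done action after three steps; with a budget of 2 it stops early and reports failure.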

MacroTool Schema Design

The reflection-before-action constraint is enforced structurally through the #packMacroTool method (lines 62‑77). This factory creates a Zod schema that the LLM must populate on every turn.

Reflection Fields Structure

The MacroTool's input schema declares three optional reflection fields plus a mandatory action (lines 68‑73):

evaluation_previous_goal – LLM's critique of whether the previous step succeeded
memory – Short-term notes to retain across steps
next_goal – Specific objective for the immediate next action
action – A union of single-key objects, one per available tool schema (click, input_text, wait, etc.)

This design forces the model to narrate its reasoning before accessing effectful browser tools.
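As an illustration of the fields listed above, a populated MacroTool input might look like the following. The MacroToolInput interface is a hand-written approximation of the schema, and the click input shape ({ index: 12 }) is an assumption rather than the tool's documented signature.

```typescript
// Hypothetical approximation of the MacroTool input; not generated from the real schema.
interface MacroToolInput {
  evaluation_previous_goal?: string;
  memory?: string;
  next_goal?: string;
  action: Record<string, unknown>; // exactly one key: the chosen tool name
}

const exampleInput: MacroToolInput = {
  evaluation_previous_goal: 'Success: the pricing link was visible in the header.',
  memory: 'The pricing page is linked from the top navigation.',
  next_goal: 'Open the pricing page.',
  action: { click: { index: 12 } }, // assumed input shape for a click tool
};

const fieldOrder = Object.keys(exampleInput);
const serialized = JSON.stringify(exampleInput);
```

Listing the reflection fields ahead of action means a model that emits keys in schema order writes its evaluation and goal before it commits to a tool.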

Dynamic Action Union

The schema dynamically aggregates available tools using Zod unions:

const actionSchemas = Array.from(tools.entries()).map(([toolName, tool]) => {
  return z.object({ [toolName]: tool.inputSchema }).describe(tool.description);
});
const actionSchema = z.union(actionSchemas as [...]);

This ensures the LLM can only select from registered, valid actions while maintaining type safety.
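The effect of the union can be approximated without Zod. The sketch below is a dependency-free stand-in, not the repository's validation: makeActionParser and the Validator type are hypothetical, and the exactly-one-key check is an assumption that may be stricter than what the Zod union enforces.

```typescript
// Hypothetical, dependency-free stand-in for the Zod union described above.
type Validator = (input: unknown) => boolean;

function makeActionParser(tools: Map<string, Validator>) {
  return (action: unknown): { name: string; input: unknown } => {
    if (typeof action !== 'object' || action === null || Array.isArray(action)) {
      throw new Error('action must be an object');
    }
    const keys = Object.keys(action);
    if (keys.length !== 1) {
      throw new Error('action must contain exactly one tool name');
    }
    const name = keys[0];
    const validate = tools.get(name);
    if (!validate) {
      throw new Error(`unknown action: ${name}`); // not in the registry
    }
    const input = (action as Record<string, unknown>)[name];
    if (!validate(input)) {
      throw new Error(`invalid input for action: ${name}`);
    }
    return { name, input };
  };
}

const registry = new Map<string, Validator>([
  ['click', (input) => typeof input === 'object' && input !== null
    && typeof (input as { index?: unknown }).index === 'number'],
  ['done', (input) => typeof input === 'object' && input !== null],
]);

const parseAction = makeActionParser(registry);
const parsed = parseAction({ click: { index: 7 } });

let rejectedUnknown = false;
try {
  parseAction({ scroll: { pixels: 300 } }); // 'scroll' is not registered here
} catch (error) {
  rejectedUnknown = true;
}
```

Building the parser from the registry means that registering or removing a tool changes what the model is allowed to select, with no separate list to keep in sync.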

Tool Execution and Event Emission

When the MacroTool's execute function runs, it performs the actual browser automation. The process (lines 84‑90) involves:

Tool Lookup – Retrieving the concrete implementation from the tools registry
Context Binding – Executing the tool with tool.execute.bind(this), granting access to this.pageController
Activity Emission – Firing executing and executed events for observability (lines 98‑106)

The tool's output is captured and returned as part of the MacroToolResult, which populates action.output in the history entry for the next iteration's context.
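The lookup, binding, and emission steps above can be sketched as follows. MiniAgent, its fake pageController, and the Activity event shape are all hypothetical, and execution is synchronous here for brevity where real browser tools would be asynchronous.

```typescript
// Hypothetical sketch of lookup -> bind -> emit; not the library's classes.
type Activity = { type: 'executing' | 'executed'; tool: string };

interface ToolContext {
  pageController: { click(index: number): string };
}

interface Tool {
  description: string;
  execute(this: ToolContext, input: any): string;
}

class MiniAgent implements ToolContext {
  private listeners: Array<(activity: Activity) => void> = [];

  // Stand-in for a real page controller.
  pageController = { click: (index: number) => `clicked element ${index}` };

  tools = new Map<string, Tool>([
    ['click', {
      description: 'Click an element by its index',
      execute(this: ToolContext, input: { index: number }): string {
        return this.pageController.click(input.index); // `this` is the bound agent
      },
    }],
  ]);

  onActivity(listener: (activity: Activity) => void): void {
    this.listeners.push(listener);
  }

  runTool(name: string, input: unknown): string {
    const tool = this.tools.get(name); // 1. lookup
    if (!tool) throw new Error(`Tool not found: ${name}`);
    this.emit({ type: 'executing', tool: name });
    const output = tool.execute.bind(this)(input); // 2. context binding
    this.emit({ type: 'executed', tool: name }); // 3. activity emission
    return output;
  }

  private emit(activity: Activity): void {
    for (const listener of this.listeners) listener(activity);
  }
}

const agent = new MiniAgent();
const seen: string[] = [];
agent.onActivity((activity) => seen.push(`${activity.type}:${activity.tool}`));
const toolOutput = agent.runTool('click', { index: 3 });
```

Binding at call time lets tool definitions stay plain objects in a registry while still reaching the agent's controller through this.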

State Management and History Tracking

All observations collected during #handleObservations are flushed into this.history after each step (lines 38‑45). This history array serves as the agent's long-term memory, providing the LLM with a complete record of previous reflections, actions, and outputs when assembling subsequent prompts. The system also tracks remaining steps to prevent infinite loops, emitting warnings as the agent approaches its configured limit.
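One way to picture how the history and the step budget feed the next prompt is shown below. This is an assumption-laden sketch: renderHistory, the line format, and the warnAt threshold of 5 are invented for illustration and do not reflect the repository's actual prompt template.

```typescript
// Hypothetical sketch: the prompt format and warning threshold are assumptions.
interface HistoryStep {
  reflection: { evaluation_previous_goal?: string; memory?: string; next_goal?: string };
  action: { name: string; input: unknown; output: string };
}

function renderHistory(history: HistoryStep[], maxSteps: number, warnAt = 5): string {
  const lines = history.map((step, i) =>
    `Step ${i + 1}: goal=${step.reflection.next_goal ?? '(none)'}`
    + ` | action=${step.action.name} | output=${step.action.output}`);

  // Tell the model when the budget is nearly spent so it can wrap up.
  const remaining = maxSteps - history.length;
  if (remaining <= warnAt) {
    lines.push(`Warning: only ${remaining} step(s) remaining.`);
  }
  return lines.join('\n');
}

const pastSteps: HistoryStep[] = [
  {
    reflection: { next_goal: 'Open pricing' },
    action: { name: 'click', input: { index: 4 }, output: 'ok' },
  },
];

const relaxedPrompt = renderHistory(pastSteps, 30); // plenty of steps left
const urgentPrompt = renderHistory(pastSteps, 3);   // two steps left
```

Surfacing the remaining budget as text in the prompt gives the model a chance to finish with a done action instead of being cut off mid-task.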

Implementation Example

To utilize the reflection-before-action loop in your own project:

import { PageAgentCore } from '@page-agent/core';
import { PageController } from '@page-agent/page-controller';

// Initialize the browser controller
const controller = new PageController({ 
  headless: false,
  viewport: { width: 1280, height: 720 }
});

// Configure the agent
const agent = new PageAgentCore({
  pageController: controller,
  maxSteps: 30,           // Safety limit for the loop
  language: 'en',         // Prompt localization
});

// Execute with a natural language goal
agent.execute('Find the pricing page and extract the Enterprise plan cost')
  .then((result) => {
    console.log(`Success: ${result.success}`);
    console.log(`Steps taken: ${result.history.length}`);
    result.history.forEach((step, i) => {
      console.log(`${i + 1}. ${step.reflection.next_goal} → ${step.action.name}`);
    });
  })
  .catch(console.error);

This example binds a PageAgentCore instance to a PageController, then runs the reflection-before-action cycle until the task completes or the step limit is reached.

Summary

PageAgentCore implements a strict Re-act loop where reflection precedes every browser action
The MacroTool schema enforces structured output containing evaluation_previous_goal, memory, next_goal, and the selected action
Observations are collected via #handleObservations and persisted in a history array that provides long-term context
Actions are dispatched through a bound tool registry with full access to the PageController instance
The loop terminates when the LLM selects the done tool or when the maxSteps limit is exceeded

Frequently Asked Questions

What is the reflection-before-action pattern in PageAgentCore?

The reflection-before-action pattern requires the LLM to output a structured reflection containing an evaluation of the previous step, short-term memory notes, and the next goal before selecting any browser action. This is enforced by the MacroTool schema in packages/core/src/PageAgentCore.ts, ensuring the agent reasons about its progress rather than reacting blindly to the DOM.

How does the MacroTool enforce structured output from the LLM?

The MacroTool uses a Zod schema defined in #packMacroTool (lines 62‑77) that combines reflection fields with a union of the available action schemas. When invoking the LLM, PageAgentCore passes this tool definition with tool_choice: 'required', forcing the model to return a JSON object matching the schema rather than freeform text.

What happens when the agent reaches the maximum step limit?

If the internal step counter exceeds the maxSteps configuration (default or user-provided), the agent halts execution and returns an ExecutionResult indicating failure due to step exhaustion. During execution, the system emits observations warning that the remaining steps are low, allowing the LLM to prioritize completing the task within the constraint.

How does PageAgentCore maintain context across multiple steps?

Context persistence is achieved through the this.history array, which stores every step's reflection data, action details, and output (lines 38‑45). When assembling the user prompt for subsequent iterations, the agent includes this history, giving the LLM full visibility into previous evaluations and actions taken.
