How to Implement Custom Tools to Extend PageAgent Capabilities
To extend PageAgent, define a PageAgentTool with a Zod schema and pass it in the customTools configuration object when you instantiate the agent. The constructor merges your definitions into the internal tool map.
The alibaba/page-agent framework executes browser automation through a modular tool system that the LLM invokes via a reflection-before-action model. To implement custom tools that go beyond built-in actions like click_element_by_index or execute_javascript, use the customTools configuration option. It lets you add domain-specific functions, override existing behaviors, or remove unnecessary tools without modifying the core agent loop in PageAgentCore.ts.
Understanding PageAgent's Tool Architecture
The Tool Interface
Every tool in PageAgent conforms to the PageAgentTool<TParams> interface defined in packages/core/src/tools/index.ts (lines 13-18). This structure requires three properties:
- description: A natural language explanation of what the tool does
- inputSchema: A Zod schema that validates and types the tool's parameters
- execute: An async method bound to the PageAgent instance that performs the actual work
The tool helper function exported from page-agent (re-exported from core) provides a type-safe way to create these definitions.
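The three-property shape can be modeled without any dependencies. The sketch below is a simplified stand-in, not the library's source: SchemaLike, ToolSketch, and defineTool are hypothetical names, and a hand-written parse function takes the place of a Zod schema.

```typescript
// Simplified, dependency-free model of the tool shape described above.
// SchemaLike, ToolSketch, and defineTool are hypothetical stand-ins,
// not exports of page-agent.
interface SchemaLike<T> {
  // Validates unknown input and returns typed params, or throws.
  parse: (input: unknown) => T
}

interface ToolSketch<TParams> {
  description: string
  inputSchema: SchemaLike<TParams>
  execute(params: TParams): Promise<string>
}

// Identity helper: it exists only so TypeScript can infer TParams from the schema.
function defineTool<TParams>(def: ToolSketch<TParams>): ToolSketch<TParams> {
  return def
}

const echoTool = defineTool({
  description: 'Echo a message back to the caller.',
  inputSchema: {
    parse: (input: unknown): { message: string } => {
      const message = (input as { message?: unknown } | null)?.message
      if (typeof message !== 'string') {
        throw new Error('message must be a string')
      }
      return { message }
    },
  },
  // `message` is typed as string because TParams was inferred from parse.
  async execute({ message }) {
    return `echo: ${message}`
  },
})

console.log(echoTool.inputSchema.parse({ message: 'hi' }))
```

The identity-function pattern is a common way to get parameter inference in an object literal; whether the real tool helper does anything beyond that is not something this sketch claims.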
Tool Registration and the Core Map
Inside packages/core/src/PageAgentCore.ts, the constructor initializes a Map<string, PageAgentTool> called tools (lines 28-30) that holds all available actions. During initialization, the agent clones this map and merges any customTools passed via the configuration. The merging logic at lines 31-40 handles both additions and removals:
// packages/core/src/PageAgentCore.ts
if (this.config.customTools) {
  for (const [name, tool] of Object.entries(this.config.customTools)) {
    if (tool === null) {
      this.tools.delete(name) // remove built-in tool
      continue
    }
    this.tools.set(name, tool) // add or override tool
  }
}
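To see the three outcomes (add, override, remove) in isolation, the same loop can be run against a plain Map. This is a standalone sketch: ToolLike and mergeTools are hypothetical names, and the tool objects are reduced to a description so the example runs without the library.

```typescript
// Standalone sketch of the merge semantics shown above.
// ToolLike and mergeTools are hypothetical names for illustration only.
type ToolLike = { description: string }

function mergeTools(
  builtIn: Map<string, ToolLike>,
  customTools: Record<string, ToolLike | null> = {},
): Map<string, ToolLike> {
  // Clone so the built-in map is never mutated.
  const merged = new Map(builtIn)
  for (const [name, tool] of Object.entries(customTools)) {
    if (tool === null) {
      merged.delete(name) // null removes the tool
    } else {
      merged.set(name, tool) // anything else adds or overrides
    }
  }
  return merged
}

const builtIn = new Map<string, ToolLike>([
  ['wait', { description: 'built-in wait' }],
  ['ask_user', { description: 'built-in ask_user' }],
])

const merged = mergeTools(builtIn, {
  ask_user: null, // removed
  wait: { description: 'custom wait' }, // overridden
  fetch_and_summarize: { description: 'custom fetch' }, // added
})

console.log(Array.from(merged.keys())) // [ 'wait', 'fetch_and_summarize' ]
```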
Creating a Custom Tool Definition
To implement a custom tool, import the tool helper and define your function using a Zod schema for input validation. The execute method receives the validated parameters and has access to the agent instance via this.
// src/customTools.ts
import { tool } from 'page-agent'
import { z } from 'zod/v4'
import type { PageAgent } from 'page-agent'

export const customTools = {
  fetch_and_summarize: tool({
    description:
      'Fetch a JSON endpoint and return a brief summary of the data.',
    inputSchema: z.object({
      url: z.string().url(),
      maxItems: z.number().int().min(1).max(20).default(5),
    }),
    // `this` is bound to the PageAgent instance
    async execute(this: PageAgent, { url, maxItems }) {
      const response = await fetch(url)
      // Report HTTP failures to the LLM instead of failing while parsing an error body
      if (!response.ok) {
        return `❌ Request to ${url} failed with status ${response.status}.`
      }
      const data = await response.json()
      const items = Array.isArray(data) ? data.slice(0, maxItems) : [data]
      return `✅ Fetched ${items.length} item(s) from ${url}.`
    },
  }),
}
This definition follows the PageAgentTool type from packages/core/src/tools/index.ts and satisfies the customTools property declared in packages/core/src/types.ts (lines 20-49).
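The execute body above reports only how many items were fetched. If you want the returned string to carry an actual summary, one option is to factor that step into a pure function that can be tested without network access. In this sketch, summarizeItems is a hypothetical helper, and it assumes items may carry a string title field (as the jsonplaceholder posts do).

```typescript
// Hypothetical helper: turns fetched JSON into a short, LLM-friendly summary.
// Assumes items may carry a string `title`; anything else is labeled untitled.
function summarizeItems(data: unknown, maxItems: number): string {
  const items: unknown[] = Array.isArray(data) ? data.slice(0, maxItems) : [data]
  const lines = items.map((item, i) => {
    const title = (item as { title?: unknown } | null)?.title
    return `${i + 1}. ${typeof title === 'string' ? title : '(untitled)'}`
  })
  return `Fetched ${items.length} item(s):\n${lines.join('\n')}`
}

const sample = [{ title: 'First' }, { title: 'Second' }, { id: 3 }]
console.log(summarizeItems(sample, 2))
// Fetched 2 item(s):
// 1. First
// 2. Second
```

Inside the tool, execute could then return summarizeItems(data, maxItems) instead of the count-only string.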
Registering Custom Tools via Configuration
Pass your custom tool definitions through the AgentConfig interface when constructing a PageAgent or PageAgentCore instance. The high-level PageAgent class in packages/page-agent/src/PageAgent.ts extends PageAgentCore and accepts the same configuration options.
// src/runAgent.ts
import { PageAgent } from 'page-agent'
import { customTools } from './customTools'

const agent = new PageAgent({
  llmConfig: {
    model: 'gpt-4o-mini',
    apiKey: process.env.OPENAI_API_KEY!,
  },
  // Register the custom tools
  customTools,
  // Optional: enable experimental features if needed
  experimentalScriptExecutionTool: true,
})

// Execute a task that uses the new tool
await agent.execute(`
  Please fetch the latest 3 posts from https://jsonplaceholder.typicode.com/posts
  and summarize their titles.
`)
Removing or Overriding Built-in Tools
The customTools object supports two additional operations beyond adding new capabilities. To remove a built-in tool, assign null to its name. To override a built-in tool, provide a new definition using the same key.
export const customTools = {
  // Add new capability
  fetch_and_summarize: tool({ /* ... */ }),
  // Remove built-in ask_user tool
  ask_user: null,
  // Override the default wait behavior
  wait: tool({
    description: 'Wait for a specified duration with custom logging',
    inputSchema: z.object({ seconds: z.number() }),
    async execute(this: PageAgent, { seconds }) {
      console.log(`Custom wait: ${seconds}s`)
      await new Promise(r => setTimeout(r, seconds * 1000))
      return `Waited ${seconds} seconds`
    },
  }),
}
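When several tools need to be switched off at once, writing each null entry by hand gets repetitive. A small helper can build the removal entries from a list of names. This is only a sketch: disableTools is a hypothetical helper, not part of page-agent.

```typescript
// Hypothetical helper: builds a customTools fragment that removes tools by name.
function disableTools(names: readonly string[]): Record<string, null> {
  const removals: Record<string, null> = {}
  for (const name of names) {
    removals[name] = null // null tells the merge loop to delete this tool
  }
  return removals
}

const customTools = {
  ...disableTools(['ask_user', 'execute_javascript']),
  // real tool definitions would be listed alongside the removals
}

console.log(customTools) // { ask_user: null, execute_javascript: null }
```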
Runtime Execution and LLM Integration
During each step, the LLM calls a macro-tool (AgentOutput) that selects one action from the final merged tools map. The #packMacroTool method (lines 80-86 in PageAgentCore.ts) reads the tool name from the action object and dispatches to that tool's execute method.
When the LLM decides to use your custom tool, it outputs JSON like:
{
  "action": {
    "fetch_and_summarize": {
      "url": "https://jsonplaceholder.typicode.com/posts",
      "maxItems": 3
    }
  },
  "evaluation_previous_goal": "Previous step completed",
  "memory": "Need to fetch posts",
  "next_goal": "Analyze the fetched data"
}
The core extracts fetch_and_summarize, validates inputs against your Zod schema, and executes your method, returning the string result to the LLM context.
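The extract, validate, execute sequence can be sketched without the library. Everything named here is an assumption made for illustration: MiniTool and resolveAction are hypothetical, the exactly-one-key rule is a guess at how the action object is read, and a hand-written parse stands in for Zod validation. The real logic lives in PageAgentCore.ts.

```typescript
// Dependency-free sketch of the dispatch step described above.
// MiniTool, resolveAction, and the exactly-one-key rule are assumptions.
interface MiniTool {
  parse(input: unknown): unknown // stand-in for Zod schema validation
  execute(params: unknown): Promise<string>
}

function resolveAction(
  action: Record<string, unknown>,
  tools: Map<string, MiniTool>,
): { name: string; tool: MiniTool; params: unknown } {
  const names = Object.keys(action)
  if (names.length !== 1) {
    throw new Error(`Expected exactly one tool in action, got ${names.length}`)
  }
  const name = names[0]
  const tool = tools.get(name)
  if (!tool) {
    throw new Error(`Unknown tool: ${name}`)
  }
  // Validate before execute ever sees the input.
  return { name, tool, params: tool.parse(action[name]) }
}

async function dispatch(
  action: Record<string, unknown>,
  tools: Map<string, MiniTool>,
): Promise<string> {
  const { tool, params } = resolveAction(action, tools)
  return tool.execute(params) // the string result goes back into the LLM context
}

const tools = new Map<string, MiniTool>([
  [
    'fetch_and_summarize',
    {
      parse(input: unknown) {
        const { url, maxItems = 5 } = (input ?? {}) as { url?: unknown; maxItems?: unknown }
        if (typeof url !== 'string') throw new Error('url must be a string')
        if (typeof maxItems !== 'number') throw new Error('maxItems must be a number')
        return { url, maxItems }
      },
      async execute(params: unknown) {
        return `Would fetch with ${JSON.stringify(params)}`
      },
    },
  ],
])

const resolved = resolveAction(
  { fetch_and_summarize: { url: 'https://jsonplaceholder.typicode.com/posts', maxItems: 3 } },
  tools,
)
console.log(resolved.name, resolved.params)
```

Splitting resolution from execution keeps the validation path synchronous and easy to test, while dispatch shows where the awaited result would be handed back.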
Summary
- Tool Definition: Create PageAgentTool objects using the tool helper with Zod schemas in packages/core/src/tools/index.ts
- Configuration: Pass tools via customTools in AgentConfig when instantiating PageAgent or PageAgentCore
- Merging Logic: The constructor in packages/core/src/PageAgentCore.ts (lines 31-40) clones the internal map and merges your definitions, supporting addition, override, and deletion via null
- Execution: Custom tools are immediately available to the LLM through the AgentOutput macro-tool dispatch system
- Context Access: Tool execute methods are bound to the agent instance, providing access to the browser page and agent state via this
Frequently Asked Questions
Can I override existing built-in tools like click_element_by_index?
Yes. If you provide a tool definition in customTools using the same name as a built-in tool, your definition will replace the original in the internal tools Map. This occurs during the merge loop in PageAgentCore.ts where this.tools.set(name, tool) overwrites any existing entry.
What schema validation library does PageAgent require for tool inputs?
PageAgent uses Zod (specifically version 4 as zod/v4) for input validation. The inputSchema property of PageAgentTool expects a Zod schema object. The agent validates LLM outputs against this schema before passing them to your execute method.
Do custom tools have access to the browser page and agent state?
Yes. The execute function is bound to the PageAgent instance at runtime, so this refers to the agent itself. Through this, your tool can reach the page-interaction helpers and state that the agent instance exposes. The exact property names depend on the version of page-agent you are using, so check the PageAgentCore source before relying on a specific one.
How do I remove a tool so the LLM cannot use it?
Set the tool name to null in your customTools configuration object. During construction, PageAgentCore checks for null values and calls this.tools.delete(name), effectively removing that capability from the agent's available actions before any task execution begins.