How to Migrate from browser-use to PageAgent for Client-Side Automation

Migrate from browser-use to PageAgent by replacing your import with @page-agent/page-agent, instantiating PageAgent with the same configuration keys, and using the identical tool API that now delegates to the separated PageController layer.

The alibaba/page-agent repository provides PageAgent as a modern replacement for the legacy browser-use library, introducing a cleaner architecture that isolates LLM-driven logic from DOM manipulation. When you migrate from browser-use to PageAgent, you gain a modular core, an optional UI panel, and the same high-level tool API without breaking existing automation workflows.

Architecture Overview

PageAgent restructures browser-use into three distinct layers that communicate through async interfaces. This separation allows you to run headless automation or attach the UI panel as needed.

Core Layer (PageAgentCore)

The Core (@page-agent/core) manages the LLM-orchestrated loop, tool registration, and the high-level run API. In packages/core/src/tools/index.ts, you will find the tool definitions that forward requests to the Page-Controller layer. This layer handles the agent's decision-making without direct DOM manipulation.
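The following minimal, self-contained sketch makes the layering concrete: the core depends on the controller only through an async interface. The names `DomController`, `ActionResult`, `MiniCore`, and `FakeController` are hypothetical and are not the actual types in `packages/core/src`. The point is that the core never touches the DOM, so a fake controller can stand in during tests.

```typescript
// Hypothetical result shape for an index-based action (not the library's real type).
interface ActionResult {
  success: boolean;
  message: string;
}

// Hypothetical async interface the core depends on; the real PageController may differ.
interface DomController {
  updateTree(): Promise<void>;
  clickElement(index: number): Promise<ActionResult>;
}

// A toy core: it decides what to do but delegates every DOM operation to the controller.
class MiniCore {
  private readonly controller: DomController;

  constructor(controller: DomController) {
    this.controller = controller;
  }

  async clickByIndex(index: number): Promise<ActionResult> {
    // Re-index first so the index refers to the current DOM.
    await this.controller.updateTree();
    return this.controller.clickElement(index);
  }
}

// A fake controller that records calls instead of touching a real DOM (useful in tests).
class FakeController implements DomController {
  calls: string[] = [];

  async updateTree(): Promise<void> {
    this.calls.push("updateTree");
  }

  async clickElement(index: number): Promise<ActionResult> {
    this.calls.push(`click:${index}`);
    return { success: true, message: `Clicked element ${index}` };
  }
}
```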

Page-Controller Layer

The Page-Controller (@page-agent/page-controller) handles pure DOM extraction, indexing, and element actions. The PageController class in packages/page-controller/src/PageController.ts provides methods like clickElement(), inputText(), and scroll(). Each call to updateTree() indexes the page, builds a simplified HTML string, and stores the interactive elements in a selectorMap keyed by index. All actions are async and index-based, as in browser-use.
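The indexing idea can be illustrated without a browser. This sketch is hypothetical: `InteractiveElement`, `buildIndex`, `resolveIndex`, and the `[index]<tag>text</tag>` output format were chosen for illustration and are not the actual format produced by `updateTree()`. It shows why actions are index-based. The LLM sees only numbered elements in the simplified HTML, and the controller maps each number back to a stored element through the selector map.

```typescript
// Hypothetical description of an interactive element found during a DOM walk.
interface InteractiveElement {
  tag: string;
  text: string;
}

interface PageIndex {
  simplifiedHtml: string;
  selectorMap: Map<number, InteractiveElement>;
}

// Assign sequential indices to interactive elements and render a compact text view.
function buildIndex(elements: InteractiveElement[]): PageIndex {
  const selectorMap = new Map<number, InteractiveElement>();
  const lines: string[] = [];
  elements.forEach((el, i) => {
    selectorMap.set(i, el);
    lines.push(`[${i}]<${el.tag}>${el.text}</${el.tag}>`);
  });
  return { simplifiedHtml: lines.join("\n"), selectorMap };
}

// Resolve an index chosen by the LLM back to the stored element.
function resolveIndex(index: PageIndex, i: number): InteractiveElement {
  const el = index.selectorMap.get(i);
  if (!el) {
    throw new Error(`No interactive element at index ${i} (stale or out-of-range index)`);
  }
  return el;
}
```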

UI Panel Layer

The UI (@page-agent/ui) provides an optional floating panel that displays the LLM's plan, logs, and a stop button. The Panel class in packages/ui/src/panel/Panel.ts integrates with the core. When instantiating PageAgent from packages/page-agent/src/PageAgent.ts, the panel is created automatically and can be shown via agent.panel.show().
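One plausible way for an optional panel to stop a running loop is a shared `AbortController`. The UI aborts it, and the core checks the signal between steps. This is a hypothetical sketch, not a description of how `Panel.ts` is actually wired. `runSteps` and the step callbacks are invented for illustration. It only shows how a UI layer can influence the core without the core depending on the UI.

```typescript
// Run async steps in order, checking the abort signal before each one.
// A panel's stop button would call abort() on the controller that owns this signal.
async function runSteps(
  steps: Array<() => Promise<string>>,
  signal: AbortSignal,
): Promise<string[]> {
  const log: string[] = [];
  for (const step of steps) {
    if (signal.aborted) {
      log.push("stopped by user");
      break;
    }
    log.push(await step());
  }
  return log;
}
```

In this design the panel holds the `AbortController` and the core only ever sees the `AbortSignal`, so removing the panel requires no change to the loop.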

Step-by-Step Migration Guide

Follow these steps to convert your existing browser-use scripts to PageAgent while maintaining the same automation behavior.

1. Update Dependencies and Imports

Replace the browser-use package with the PageAgent monorepo packages. The main entry point is @page-agent/page-agent for full UI support, or @page-agent/core for headless operation.

npm install @page-agent/page-agent @page-agent/core @page-agent/page-controller @page-agent/ui

Update your import statements:

// Before
import BrowserAgent from 'browser-use'

// After - with UI panel
import { PageAgent } from '@page-agent/page-agent'

// Or headless only
import { PageAgentCore } from '@page-agent/core'

2. Instantiate the Agent with Existing Configuration

Pass the same configuration fields to the new constructor. The model, baseURL, apiKey, language, and optional enableMask parameters remain compatible.

const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://my-llm-endpoint',
  apiKey: process.env.API_KEY,  // Never hard-code credentials
  language: 'en-US',
  enableMask: true,  // Optional visual mask during automation
})

// Optional: Show the UI panel
agent.panel.show()

3. Replace Direct DOM Access

Browser-use exposed internal tree objects directly. PageAgent encapsulates DOM state through getBrowserState(). Update any scripts that accessed window.browserUse or internal trees:

// Refresh the DOM tree before index-based actions
await agent.pageController.updateTree()

// Access current state instead of direct tree manipulation
const state = await agent.pageController.getBrowserState()
console.log(state.header)   // Title and page info
console.log(state.content)  // Simplified interactive HTML
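Indices are only meaningful for the most recent `updateTree()` call, so a migration can add a small guard that refuses index-based actions on a stale tree. The sketch below is hypothetical: `TreeGuard` and its methods are not part of PageAgent. It uses a simple freshness flag that any DOM-changing action clears.

```typescript
// Hypothetical guard that tracks whether the indexed tree is still current.
class TreeGuard {
  private fresh = false;

  // Call after a successful updateTree().
  markIndexed(): void {
    this.fresh = true;
  }

  // Call after any action that may change the DOM (click, input, scroll).
  markDirty(): void {
    this.fresh = false;
  }

  // Throw before an index-based action if the tree has not been refreshed.
  assertFresh(action: string): void {
    if (!this.fresh) {
      throw new Error(`${action}: DOM tree is stale; call updateTree() first`);
    }
  }
}
```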

4. Verify Tool API Compatibility

The tool names remain identical to browser-use. The tools/index.ts file registers methods like click_element_by_index, input_text, scroll, and execute_javascript that forward to PageController methods. No changes are required to your high-level tool calls, though internal implementations now route through packages/page-controller/src/actions.ts.
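The forwarding described above amounts to a lookup from tool name to controller method. This is a hypothetical sketch. The tool names come from the text, but the `Controller` interface and the argument shapes (`index`, `text`, `down`, `numPages`) are assumptions. `execute_javascript` is left out to keep it short.

```typescript
// Hypothetical controller surface; argument shapes are assumptions.
interface Controller {
  clickElement(index: number): Promise<string>;
  inputText(index: number, text: string): Promise<string>;
  scroll(opts: { down: boolean; numPages: number }): Promise<string>;
}

type ToolArgs = Record<string, unknown>;
type ToolHandler = (c: Controller, args: ToolArgs) => Promise<string>;

// Map LLM-facing tool names to controller calls.
const tools = new Map<string, ToolHandler>([
  ["click_element_by_index", (c, a) => c.clickElement(Number(a.index))],
  ["input_text", (c, a) => c.inputText(Number(a.index), String(a.text))],
  ["scroll", (c, a) => c.scroll({ down: Boolean(a.down), numPages: Number(a.numPages ?? 1) })],
]);

// Dispatch a tool call by name, failing loudly on unknown tools.
async function dispatch(c: Controller, name: string, args: ToolArgs): Promise<string> {
  const handler = tools.get(name);
  if (!handler) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return handler(c, args);
}
```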

Practical Code Examples

These examples demonstrate equivalent implementations between the old and new libraries.

Basic Click and Input Flow

This pattern replaces direct browser-use element interactions with PageAgent's async controller methods.

import { PageAgent } from '@page-agent/page-agent'

const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://my-llm.com',
  apiKey: process.env.API_KEY,
  language: 'en-US',
})

agent.panel.show()

// Required: Index the page before element operations
await agent.pageController.updateTree()

// Click the interactive element at index 0
const clickResult = await agent.pageController.clickElement(0)
console.log(clickResult.message)

// Re-index, since the click may have changed the DOM and shifted indices
await agent.pageController.updateTree()

// Type into the element at index 1 (assumed here to be an input field)
const inputResult = await agent.pageController.inputText(1, 'Hello PageAgent')
console.log(inputResult.message)

// Scroll down one page
await agent.pageController.scroll({ down: true, numPages: 1 })
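In a migrated script it helps to check each result before continuing, because a failed click usually means later indices no longer match the page. The sketch below assumes each action resolves to an object with `success` and `message` fields. The `message` field appears in the example above, but `success` is an assumption, and `runChecked` is a hypothetical helper.

```typescript
// Assumed result shape; only `message` is confirmed by the example above.
interface StepResult {
  success: boolean;
  message: string;
}

// Run labelled async steps in order and stop at the first failure.
async function runChecked(
  steps: Array<[label: string, run: () => Promise<StepResult>]>,
): Promise<{ completed: string[]; failure?: string }> {
  const completed: string[] = [];
  for (const [label, run] of steps) {
    const result = await run();
    if (!result.success) {
      return { completed, failure: `${label}: ${result.message}` };
    }
    completed.push(label);
  }
  return { completed };
}
```

Against a real agent, each step would wrap a call such as `() => agent.pageController.clickElement(0)`, provided its result has the assumed shape.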

High-Level LLM Loop

Replace browserUse.runPrompt() with agent.runPrompt(). The method internally uses the same registered tools mapped in packages/core/src/tools/index.ts.

import { PageAgent } from '@page-agent/page-agent'

const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://my-llm.com',
  apiKey: process.env.API_KEY,
  language: 'en-US',
})

await agent.runPrompt(`
  Find the search box on the page, type "Page Agent", press Enter,
  then click the first result.
`)
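Conceptually, a prompt-driven run repeats an observe, decide, act cycle over the registered tools. It stops when the model reports completion or the step budget runs out. The sketch below is a simplified, hypothetical model of that loop and not the actual implementation behind `runPrompt()`. `ToolCall`, `Decide`, `Act`, `agentLoop`, and the `done` tool are invented for illustration, and a scripted stub replaces the LLM so the code runs offline.

```typescript
// A tool call as an LLM might emit it (hypothetical shape).
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// The "LLM": given the task, current page state, and history, pick the next tool call.
type Decide = (task: string, state: string, history: string[]) => Promise<ToolCall>;

// Executes one tool call and returns an observation string.
type Act = (call: ToolCall) => Promise<string>;

// Observe, decide, act until the model calls "done" or the step budget is exhausted.
async function agentLoop(
  task: string,
  observe: () => Promise<string>,
  decide: Decide,
  act: Act,
  maxSteps = 10,
): Promise<string[]> {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const state = await observe(); // e.g. refresh the tree and read the simplified HTML
    const call = await decide(task, state, history);
    if (call.tool === "done") {
      history.push("done");
      return history;
    }
    history.push(`${call.tool}: ${await act(call)}`);
  }
  history.push("max steps reached");
  return history;
}
```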

Headless CI/CD Automation

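For automated runs that do not need the panel, such as a test browser driven by a CI job, use PageAgentCore from @page-agent/core (source in packages/core/src). PageAgentCore still acts on the live DOM through PageController, so it needs a page context. It is not a way to automate pages from plain Node.js.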

import { PageAgentCore } from '@page-agent/core'

const core = new PageAgentCore({
  model: 'qwen3.5-plus',
  baseURL: 'https://my-llm.com',
  apiKey: process.env.API_KEY,
})

await core.runPrompt('Navigate to the login page and fill the credentials.')
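In CI a run that never finishes is worse than one that fails, so it is common to bound each prompt with a timeout. The `withTimeout` helper below is a generic, hypothetical wrapper around any promise, such as the one returned by `core.runPrompt(...)`. It does not cancel the underlying run. It only keeps the pipeline from waiting forever.

```typescript
// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label = "task"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer whichever promise settles first so the process can exit promptly.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Usage: `await withTimeout(core.runPrompt('Navigate to the login page and fill the credentials.'), 120_000, 'login flow')`.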

Summary

  • Install the packages (page-agent, core, page-controller, ui) from the alibaba/page-agent monorepo.
  • Import PageAgent for UI-enabled automation or PageAgentCore for headless operation.
  • Initialize with the same configuration object (model, baseURL, apiKey, language).
  • Update DOM access patterns to use pageController.updateTree() and getBrowserState().
  • Preserve existing tool names (click_element_by_index, input_text, scroll, execute_javascript) as they map 1:1 to the new architecture.
  • Reference key source files including packages/page-controller/src/PageController.ts for DOM actions and packages/core/src/tools/index.ts for tool registration.

Frequently Asked Questions

Is the PageAgent API backward compatible with browser-use?

Yes. PageAgent deliberately maintains the same tool names (click_element_by_index, input_text, scroll, execute_javascript) and high-level methods like runPrompt(). The main differences are architectural: PageAgent separates the LLM core from DOM manipulation via PageController, whereas browser-use combined these concerns. You can migrate by changing imports and adding one updateTree() call before index-based actions.

Can I run PageAgent without the UI panel?

Absolutely. Import PageAgentCore from @page-agent/core instead of PageAgent from @page-agent/page-agent. This headless mode excludes the Panel class defined in packages/ui/src/panel/Panel.ts. It suits automated runs where no visual feedback is required, such as CI/CD pipelines driving a test browser.

How do I access the DOM state in PageAgent?

Call await agent.pageController.getBrowserState() after invoking updateTree(). This returns an object containing header (page title and URL) and content (simplified interactive HTML). This replaces browser-use's direct tree object exposure with a cleaner, serializable state snapshot.
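The state is a plain object, so it can be logged, stored as a CI artifact, or compared between steps. The sketch assumes that `header` and `content` are both strings, since the text names the fields but not their types. `snapshotToJson` and `contentChanged` are hypothetical helpers.

```typescript
// Assumed shape of the object returned by getBrowserState().
interface BrowserState {
  header: string;
  content: string;
}

// Serialize a snapshot with a timestamp for logs or test artifacts.
function snapshotToJson(state: BrowserState, takenAt: Date): string {
  return JSON.stringify({ takenAt: takenAt.toISOString(), ...state });
}

// Detect whether an action changed the interactive content of the page.
function contentChanged(before: BrowserState, after: BrowserState): boolean {
  return before.content !== after.content;
}
```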

What configuration options are required for migration?

You must provide model, baseURL, and apiKey. Optionally, specify language (defaults to browser locale) and enableMask (boolean for visual element highlighting). These match browser-use's configuration schema. Refer to packages/page-agent/src/demo.ts for auto-initialization examples and advanced configuration parsing.
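A small validation step catches missing required fields before the first LLM call. The sketch below encodes the requirements stated above: `model`, `baseURL`, and `apiKey` are required, while `language` and `enableMask` are optional. The `AgentConfig` type, the `resolveConfig` function, and the `enableMask: false` default are assumptions for illustration. The locale fallback is passed in explicitly rather than read from the browser.

```typescript
// Hypothetical config type mirroring the fields described above.
interface AgentConfig {
  model: string;
  baseURL: string;
  apiKey: string;
  language?: string;
  enableMask?: boolean;
}

type ResolvedConfig = Required<AgentConfig>;

// Check required fields and fill optional ones with defaults.
function resolveConfig(input: Partial<AgentConfig>, fallbackLocale: string): ResolvedConfig {
  const missing = (["model", "baseURL", "apiKey"] as const).filter((key) => !input[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required config: ${missing.join(", ")}`);
  }
  return {
    model: input.model!,
    baseURL: input.baseURL!,
    apiKey: input.apiKey!,
    language: input.language ?? fallbackLocale,
    enableMask: input.enableMask ?? false,
  };
}
```

In a browser the fallback could be `navigator.language`.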
