How the Page-Agent Chrome Extension Handles Multi-Page and Multi-Tab Coordination

The Page-Agent Chrome extension lets autonomous agents work across multiple browser tabs by coordinating three core components: TabsController for tab lifecycle management, RemotePageController for cross-tab DOM manipulation, and MultiPageAgent for orchestration. All three communicate through a service-worker using TAB_CONTROL and PAGE_CONTROL messages and share state via chrome.storage.local.

The alibaba/page-agent repository provides a Chrome extension architecture that lets AI agents operate across multiple web pages and browser tabs. It combines Chrome's extension APIs with a message-passing design, so a single agent workflow can span several tabs and sites.

The Three Core Components of Multi-Page Coordination

The extension wires together three specialized controllers to manage distributed browser state.

TabsController: Managing Tab Lifecycle and Groups

Located in packages/extension/src/agent/TabsController.ts, the TabsController manages the Chrome tab lifecycle and maintains shared coordination state. It creates a dedicated PageAgent tab group with a random accent color via randomColor, grouping all agent-managed tabs visually for the user.

The controller stores its coordination state in chrome.storage.local, including currentTabId, the managed tab list, and tab summaries. All tab operations (open, switch, close, and update) execute by sending TAB_CONTROL messages from the agent to the service-worker background script, which then invokes the Chrome tab APIs. The background message handler lives in packages/extension/src/agent/TabsController.background.ts.

RemotePageController: Cross-Tab DOM Manipulation

The RemotePageController (packages/extension/src/agent/RemotePageController.ts) provides the same DOM-manipulation API that PageAgentCore expects, but forwards each request to a content script running in the target tab. Before routing, it validates that the target URL permits content scripts using isContentScriptAllowed, blocking restricted schemes like chrome://, chrome-extension://, and file://.

When the agent calls methods like clickElement or getBrowserState, the controller sends a PAGE_CONTROL message to the background script (packages/extension/src/agent/RemotePageController.background.ts), which routes the request to the correct tab's content script. The content script performs the actual DOM work and returns the result to the agent.
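The forwarding pattern can be sketched as below. This is a minimal illustration under stated assumptions, not the extension's actual code: the message fields (type, action, targetTabId, payload), the click_element action name, and the createRemoteController helper are hypothetical. The transport is injected so the sketch runs outside a browser; in an extension it would typically wrap chrome.runtime.sendMessage and return a Promise.

```typescript
// Hypothetical PAGE_CONTROL envelope; the real field names may differ.
interface PageControlMessage {
  type: 'PAGE_CONTROL'
  action: string
  targetTabId: number
  payload?: unknown
}

// Generic over the transport's return type, so the same sketch works with
// a Promise-returning sender in an extension or a synchronous fake in tests.
type Transport<R> = (message: PageControlMessage) => R

function createRemoteController<R>(targetTabId: number, send: Transport<R>) {
  // Every method wraps its arguments in a message addressed to one tab.
  const forward = (action: string, payload?: unknown): R =>
    send({ type: 'PAGE_CONTROL', action, targetTabId, payload })

  return {
    getBrowserState: (): R => forward('get_browser_state'),
    // 'click_element' is an assumed action name for illustration.
    clickElement: (index: number): R => forward('click_element', { index }),
  }
}

// Usage with a fake transport that records messages instead of sending them.
const recorded: PageControlMessage[] = []
const controller = createRemoteController(42, (message) => {
  recorded.push(message)
  return 'ok'
})
controller.clickElement(3)
console.log(recorded[0])
```

Because the caller only sees method calls, the agent core does not need to know that the DOM work happens in another tab.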

MultiPageAgent: The Orchestration Layer

The MultiPageAgent class in packages/extension/src/agent/MultiPageAgent.ts extends PageAgentCore and integrates the above controllers. During onBeforeTask, it initializes TabsController (creating the tab group and optionally including the active tab), starts a heartbeat stored in chrome.storage.local, and sets isAgentRunning to true.

The onBeforeStep lifecycle hook guarantees the current tab is fully loaded before each LLM-driven step. When the agent needs to interact with a page, it invokes methods on RemotePageController (accessed via agent.pageController), which are transparently routed through the background service-worker to the appropriate content script.

Message-Based Coordination Flow

The coordination relies on two distinct message types processed by the service-worker in packages/extension/src/entrypoints/background.ts:

  • TAB_CONTROL: Handles tab lifecycle operations (open, switch, close, group management)
  • PAGE_CONTROL: Routes DOM manipulation requests to specific tab content scripts
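One way to picture the dispatch is a discriminated union routed by its type field. The sketch below is an assumption-laden illustration: only the two type names come from the extension, while the action and payload fields, the Handlers interface, and routeMessage are hypothetical names introduced here.

```typescript
// Hypothetical message shapes; only the two `type` values are taken from the article.
type ExtensionMessage =
  | { type: 'TAB_CONTROL'; action: string; payload?: unknown }
  | { type: 'PAGE_CONTROL'; action: string; targetTabId: number; payload?: unknown }

// Handlers are injected so the router can be exercised without Chrome APIs.
interface Handlers<R> {
  tabControl(action: string, payload: unknown): R
  pageControl(targetTabId: number, action: string, payload: unknown): R
}

function routeMessage<R>(message: ExtensionMessage, handlers: Handlers<R>): R {
  switch (message.type) {
    case 'TAB_CONTROL':
      // Tab lifecycle work stays in the background context.
      return handlers.tabControl(message.action, message.payload)
    case 'PAGE_CONTROL':
      // DOM work is forwarded to the content script of the target tab.
      return handlers.pageControl(message.targetTabId, message.action, message.payload)
    default: {
      // Compile-time exhaustiveness check plus a runtime guard for malformed input.
      const unreachable: never = message
      throw new Error('Unknown message: ' + JSON.stringify(unreachable))
    }
  }
}
```

In a background script, a router like this would usually be called from a chrome.runtime.onMessage listener, with pageControl relaying the request to the target tab.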

The coordination flow follows these steps:

  1. Agent calls tabsController.init(task) → sends a TAB_CONTROL message (get_active_tab)
  2. Background SW returns the active tab ID, creates a new tab group via create_tab_group, and stores the group ID
  3. Agent calls openNewTab(url) → sends a TAB_CONTROL message (open_new_tab)
  4. Background SW opens the Chrome tab, adds it to the PageAgent group, and resolves with tabId
  5. Agent calls remotePageController.getBrowserState() → sends a PAGE_CONTROL message (get_browser_state) with targetTabId
  6. Background SW routes the PAGE_CONTROL message to the content script of the target tab
  7. Content script extracts the DOM tree and returns BrowserState
  8. Agent uses the returned data to drive LLM decisions, calling switchToTab or clickElement as needed
  9. Heartbeat writes {agentHeartbeat: Date.now()} to chrome.storage.local every second so the side-panel can detect agent health

Because all state lives in chrome.storage.local and all communication routes through the service-worker, any extension component (side-panel, content script, or background) can read the current tab ID, managed tab list, and agent status, ensuring consistent coordination even if the user switches windows.

Key Implementation Details

Visual Tab Grouping for User Transparency

The TabsController.openNewTab method creates a Chrome tab group named PageAgent(${task}) with a randomly selected accent color. All subsequent tabs join this same group, making it easy for users to visually distinguish the agent's workspace from regular browsing tabs.

Content Script Security Gating

Before injecting scripts, RemotePageController.isContentScriptAllowed validates URLs to prevent injection into restricted Chrome pages. This security check blocks chrome://, chrome-extension://, and file:// schemes, ensuring the agent only operates on standard web pages where content scripts are permitted.
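A scheme check of this kind might look like the sketch below. It is a simplified stand-in, not the repository's implementation: it blocks only the three schemes named above, and rejecting inputs that have no scheme at all is an assumption made here.

```typescript
// Schemes listed as blocked; the real check may cover additional cases.
const BLOCKED_SCHEMES = ['chrome:', 'chrome-extension:', 'file:']

function isContentScriptAllowed(url: string): boolean {
  const normalized = url.trim().toLowerCase()
  const colon = normalized.indexOf(':')
  // Assumption: a value without a scheme is not an injectable page.
  if (colon <= 0) return false
  const scheme = normalized.slice(0, colon + 1)
  return BLOCKED_SCHEMES.indexOf(scheme) === -1
}

console.log(isContentScriptAllowed('https://example.com/'))
console.log(isContentScriptAllowed('chrome://extensions'))
```

A stricter variant would allow only http: and https: instead of listing blocked schemes, which avoids having to enumerate every special-purpose scheme.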

Global Context for LLM Decision Making

The TabsController.summarizeTabs method constructs a markdown table containing tab IDs, URLs, titles, and active status. The table is prepended to BrowserState.header, giving the LLM a global view of all managed tabs to inform multi-tab decisions.
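To make the idea concrete, here is a hedged sketch of how such a table could be built and prepended. The TabSummary fields, the column labels, and the escapeCell and withTabSummary helpers are assumptions for illustration; the actual output format of summarizeTabs may differ.

```typescript
// Hypothetical tab record; field names are assumptions.
interface TabSummary {
  id: number
  url: string
  title: string
  active: boolean
}

// Escape pipes and flatten newlines so a page title cannot break the table.
function escapeCell(text: string): string {
  return text.replace(/\|/g, '\\|').replace(/\r?\n/g, ' ')
}

function summarizeTabs(tabs: TabSummary[]): string {
  const header = '| Tab ID | URL | Title | Active |\n| --- | --- | --- | --- |'
  const rows = tabs.map(
    (t) => `| ${t.id} | ${escapeCell(t.url)} | ${escapeCell(t.title)} | ${t.active ? 'yes' : 'no'} |`
  )
  return [header].concat(rows).join('\n')
}

// Prepend the table to the header text of the current page's state.
function withTabSummary(pageHeader: string, tabs: TabSummary[]): string {
  return summarizeTabs(tabs) + '\n\n' + pageHeader
}

console.log(
  withTabSummary('Current page: Example', [
    { id: 1, url: 'https://example.com', title: 'Example', active: true },
    { id: 2, url: 'https://developer.mozilla.org', title: 'MDN', active: false },
  ])
)
```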

Lifecycle Management and Heartbeat Monitoring

MultiPageAgent registers lifecycle hooks (onBeforeTask, onAfterTask, onBeforeStep, onDispose) to manage coordination state. The onBeforeStep hook guarantees the current tab is fully loaded before executing DOM operations. A heartbeat mechanism writes timestamps to chrome.storage.local every second, enabling the side-panel to detect agent crashes by checking isAgentRunning and agentHeartbeat status.
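The writer side of the heartbeat can be sketched as follows. This is an illustration under assumptions rather than the extension's code: HeartbeatStore and startHeartbeat are hypothetical names, the immediate first write is a choice made here, and the store is injected so the sketch runs without Chrome APIs (chrome.storage.local exposes a set method of compatible shape).

```typescript
// Minimal subset of a key-value store needed by the heartbeat.
interface HeartbeatStore {
  set(items: { [key: string]: unknown }): unknown
}

// Starts writing timestamps at a fixed interval and returns a stop function.
function startHeartbeat(
  store: HeartbeatStore,
  intervalMs: number = 1000,
  now: () => number = Date.now
): () => void {
  const beat = () => {
    store.set({ agentHeartbeat: now() })
  }
  beat() // Assumption: write once immediately so readers never see a missing value.
  const timer = setInterval(beat, intervalMs)
  return () => clearInterval(timer)
}
```

Returning a stop function keeps cleanup in one place, so a hook such as onAfterTask or onDispose can end the heartbeat without tracking the timer itself.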

Practical Implementation Examples

Creating a multi-page agent:

import { MultiPageAgent } from '@/agent/MultiPageAgent'

const agent = new MultiPageAgent({
  llmConfig: { model: 'gpt-4o', apiKey: 'YOUR_KEY' },
  includeInitialTab: true,
})

await agent.runTask(`Search for the latest JavaScript frameworks, open a new tab for each, and summarize the findings.`)

Opening a new tab programmatically:

// openNewTab is a TabsController method, so it is called on agent.tabsController
await agent.tabsController.openNewTab('https://developer.mozilla.org/en-US/docs/Web/JavaScript')

Switching between managed tabs:

await agent.tabsController.switchToTab(5)

Retrieving global browser state with tab summaries:

const state = await agent.pageController.getBrowserState()
// state.header contains a markdown table of all managed tabs

Monitoring agent health from the side-panel UI:

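// Poll every 2 seconds. The agent counts as inactive when the running flag
// is off or the last heartbeat is more than 5 seconds old.
setInterval(async () => {
  const { isAgentRunning, agentHeartbeat } = await chrome.storage.local.get(['isAgentRunning', 'agentHeartbeat'])
  if (!isAgentRunning || Date.now() - agentHeartbeat > 5000) {
    console.log('Agent not active')
  }
}, 2000)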

Summary

  • Three-component architecture: MultiPageAgent orchestrates TabsController (tab lifecycle) and RemotePageController (DOM operations) to enable seamless multi-page and multi-tab coordination.
  • Message-passing protocol: TAB_CONTROL messages manage tabs via the background service-worker, while PAGE_CONTROL messages route DOM requests to specific content scripts in target tabs.
  • Shared state storage: All coordination state persists in chrome.storage.local, including currentTabId, managed tab lists, and heartbeat timestamps, ensuring consistency across extension components.
  • Security and grouping: The extension validates URLs before script injection via isContentScriptAllowed and visually groups managed tabs with random accent colors for user clarity.
  • LLM-aware context: Tab summaries provide the agent with global context across all open pages through BrowserState.header, enabling informed multi-tab decision making.

Frequently Asked Questions

How does Page-Agent prevent the agent from accessing restricted Chrome pages?

The RemotePageController class implements isContentScriptAllowed to block URLs matching patterns like chrome://, chrome-extension://, and file://. This validation occurs in packages/extension/src/agent/RemotePageController.ts before any PAGE_CONTROL message routes DOM manipulation requests to the target tab, ensuring the extension only operates on standard web pages.

What happens if a user manually closes a tab while the agent is running?

Because TabsController maintains the canonical tab list in chrome.storage.local and validates tab IDs before operations, the agent detects missing tabs during the next coordination cycle. The heartbeat mechanism and onBeforeStep lifecycle hooks ensure the agent checks tab existence before attempting DOM interactions, allowing graceful handling of manual tab closures without crashing the agent loop.

Can multiple Page-Agent instances run simultaneously in different windows?

The current architecture uses shared state in chrome.storage.local keyed by specific agent identifiers. While the extension supports multiple tabs within one agent instance, running completely separate agent instances simultaneously would require careful management of storage keys to prevent state collisions between different agent sessions accessing the same chrome.storage.local namespace.

How does the agent decide which tab to interact with next?

The MultiPageAgent prepends a markdown summary table of all managed tabs to the BrowserState.header field via TabsController.summarizeTabs. This gives the LLM visibility into every open tab's URL, title, and ID, allowing the language model to explicitly specify tab switches through tabsController.switchToTab() or target specific tabs via remotePageController method calls based on the task requirements.
