How the Page-Agent Chrome Extension Handles Multi-Page and Multi-Tab Coordination
The Page-Agent Chrome extension lets autonomous agents work across multiple browser tabs by coordinating three core components: TabsController for tab lifecycle management, RemotePageController for cross-tab DOM manipulation, and MultiPageAgent for orchestration. The components communicate through a service-worker using TAB_CONTROL and PAGE_CONTROL messages and share state via chrome.storage.local.
The alibaba/page-agent repository provides a Chrome extension architecture that lets AI agents operate across multiple web pages and browser tabs. It combines Chrome's extension APIs with a message-passing design so that a single autonomous workflow can span several sites and tabs.
The Three Core Components of Multi-Page Coordination
The extension wires together three specialized controllers to manage distributed browser state.
TabsController: Managing Tab Lifecycle and Groups
Located in packages/extension/src/agent/TabsController.ts, the TabsController manages the Chrome tab lifecycle and maintains shared coordination state. It creates a dedicated PageAgent tab group with a random accent color via randomColor, grouping all agent-managed tabs visually for the user.
The controller stores critical state in chrome.storage.local, including currentTabId, the managed tab list, and tab summaries. All tab operations (open, switch, close, and update) are performed by sending TAB_CONTROL messages from the agent to the service-worker background script, which then invokes the Chrome tab APIs. The background message handler lives in packages/extension/src/agent/TabsController.background.ts.
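As a rough sketch of this pattern, the background handler can be pictured as a dispatcher over a discriminated union of TAB_CONTROL actions. The action names get_active_tab and open_new_tab appear in this article; switch_to_tab, close_tab, the payload fields, and the FakeTabs class are assumptions made for illustration, with an in-memory store standing in for the Chrome tab APIs.

```typescript
// Hypothetical message shapes. Only the TAB_CONTROL type and the
// get_active_tab / open_new_tab action names come from the article.
type TabControlMessage =
  | { type: "TAB_CONTROL"; action: "get_active_tab" }
  | { type: "TAB_CONTROL"; action: "open_new_tab"; url: string }
  | { type: "TAB_CONTROL"; action: "switch_to_tab"; tabId: number }
  | { type: "TAB_CONTROL"; action: "close_tab"; tabId: number };

type TabControlResult =
  | { ok: true; tabId: number | null }
  | { ok: false; error: string };

interface TabRecord {
  id: number;
  url: string;
  active: boolean;
}

// In-memory stand-in for the Chrome tab APIs so the sketch runs anywhere.
class FakeTabs {
  private nextId = 1;
  tabs: TabRecord[] = [];

  open(url: string): number {
    this.tabs.forEach((t) => { t.active = false; });
    const id = this.nextId++;
    this.tabs.push({ id: id, url: url, active: true });
    return id;
  }

  activate(id: number): boolean {
    if (!this.has(id)) return false;
    this.tabs.forEach((t) => { t.active = t.id === id; });
    return true;
  }

  close(id: number): boolean {
    if (!this.has(id)) return false;
    this.tabs = this.tabs.filter((t) => t.id !== id);
    return true;
  }

  activeId(): number | null {
    const active = this.tabs.filter((t) => t.active);
    return active.length > 0 ? active[0].id : null;
  }

  private has(id: number): boolean {
    return this.tabs.some((t) => t.id === id);
  }
}

// Dispatch one TAB_CONTROL message against the tab store.
function handleTabControl(msg: TabControlMessage, tabs: FakeTabs): TabControlResult {
  switch (msg.action) {
    case "get_active_tab":
      return { ok: true, tabId: tabs.activeId() };
    case "open_new_tab":
      return { ok: true, tabId: tabs.open(msg.url) };
    case "switch_to_tab":
      return tabs.activate(msg.tabId)
        ? { ok: true, tabId: msg.tabId }
        : { ok: false, error: `unknown tab ${msg.tabId}` };
    case "close_tab":
      return tabs.close(msg.tabId)
        ? { ok: true, tabId: tabs.activeId() }
        : { ok: false, error: `unknown tab ${msg.tabId}` };
  }
}
```

Modeling each action as its own union member lets the compiler check that every action is handled and that each handler only reads the fields its action carries.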
RemotePageController: Cross-Tab DOM Manipulation
The RemotePageController (packages/extension/src/agent/RemotePageController.ts) provides the same DOM-manipulation API that PageAgentCore expects, but forwards each request to a content script running in the target tab. Before routing, it validates that the target URL permits content scripts using isContentScriptAllowed, blocking restricted schemes like chrome://, chrome-extension://, and file://.
When the agent calls methods like clickElement or getBrowserState, the controller sends a PAGE_CONTROL message to the background script (packages/extension/src/agent/RemotePageController.background.ts), which routes the request to the correct tab's content script. The content script performs the actual DOM work and returns the result to the agent.
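The forwarding described above can be sketched with a small router that maps a target tab ID to that tab's content-script handler. This is only an illustration under stated assumptions: the envelope fields other than targetTabId, the click_element action name, the clickElement(index) signature, and the PageControlRouter and RemoteControllerSketch classes are hypothetical, and the calls are synchronous here even though real extension messaging is asynchronous.

```typescript
// Hypothetical PAGE_CONTROL envelope. targetTabId and get_browser_state
// are named in the article; the remaining field names are assumptions.
interface PageControlMessage {
  type: "PAGE_CONTROL";
  action: string;
  targetTabId: number;
  payload?: unknown;
}

type PageControlResult =
  | { ok: true; result: unknown }
  | { ok: false; error: string };

// A content script is modeled as a function that performs the DOM work.
type ContentScriptHandler = (action: string, payload: unknown) => unknown;

// Synchronous stand-in for the background router.
class PageControlRouter {
  private handlers: { [tabId: number]: ContentScriptHandler } = {};

  register(tabId: number, handler: ContentScriptHandler): void {
    this.handlers[tabId] = handler;
  }

  route(msg: PageControlMessage): PageControlResult {
    const handler = this.handlers[msg.targetTabId];
    if (!handler) {
      return { ok: false, error: `no content script in tab ${msg.targetTabId}` };
    }
    try {
      return { ok: true, result: handler(msg.action, msg.payload) };
    } catch (err) {
      return { ok: false, error: String(err) };
    }
  }
}

// Agent-side proxy: exposes controller-style methods and forwards each call.
class RemoteControllerSketch {
  private router: PageControlRouter;
  private tabId: number;

  constructor(router: PageControlRouter, tabId: number) {
    this.router = router;
    this.tabId = tabId;
  }

  getBrowserState(): PageControlResult {
    return this.send("get_browser_state", undefined);
  }

  clickElement(index: number): PageControlResult {
    return this.send("click_element", { index: index });
  }

  private send(action: string, payload: unknown): PageControlResult {
    return this.router.route({
      type: "PAGE_CONTROL",
      action: action,
      targetTabId: this.tabId,
      payload: payload,
    });
  }
}
```

Because the proxy exposes the same method names the agent already calls, the agent code does not need to know whether the DOM work happens in its own tab or in another one.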
MultiPageAgent: The Orchestration Layer
The MultiPageAgent class in packages/extension/src/agent/MultiPageAgent.ts extends PageAgentCore and integrates the above controllers. During onBeforeTask, it initializes TabsController (creating the tab group and optionally including the active tab), starts a heartbeat stored in chrome.storage.local, and sets isAgentRunning to true.
The onBeforeStep lifecycle hook guarantees the current tab is fully loaded before each LLM-driven step. When the agent needs to interact with a page, it invokes methods on RemotePageController (accessed via agent.pageController), which are transparently routed through the background service-worker to the appropriate content script.
Message-Based Coordination Flow
The coordination relies on two distinct message types processed by the service-worker in packages/extension/src/entrypoints/background.ts:
- TAB_CONTROL: Handles tab lifecycle operations (open, switch, close, group management)
- PAGE_CONTROL: Routes DOM manipulation requests to specific tab content scripts
The coordination flow follows these steps:
- Agent calls tabsController.init(task) → sends TAB_CONTROL → get_active_tab
- Background SW returns active tab ID, creates a new tab group via create_tab_group, and stores the group ID
- Agent calls openNewTab(url) → sends TAB_CONTROL → open_new_tab
- Background SW opens the Chrome tab, adds it to the PageAgent group, and resolves with tabId
- Agent calls remotePageController.getBrowserState() → sends PAGE_CONTROL → get_browser_state with targetTabId
- Background SW routes the PAGE_CONTROL message to the content script of the target tab
- Content script extracts the DOM tree and returns BrowserState
- Agent uses the returned data to drive LLM decisions, calling switchToTab or clickElement as needed
- Heartbeat writes {agentHeartbeat: Date.now()} to chrome.storage.local every second so the side-panel can detect agent health
Because all state lives in chrome.storage.local and all communication routes through the service-worker, any extension component (side-panel, content script, or background) can read the current tab ID, managed tab list, and agent status, ensuring consistent coordination even if the user switches windows.
Key Implementation Details
Visual Tab Grouping for User Transparency
The TabsController.openNewTab method creates a Chrome tab group named PageAgent(${task}) with a randomly selected accent color. All subsequent tabs join this same group, making it easy for users to visually distinguish the agent's workspace from regular browsing tabs.
Content Script Security Gating
Before injecting scripts, RemotePageController.isContentScriptAllowed validates URLs to prevent injection into restricted Chrome pages. This security check blocks chrome://, chrome-extension://, and file:// schemes, ensuring the agent only operates on standard web pages where content scripts are permitted.
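A minimal sketch of such a gate, assuming it works by inspecting the URL scheme, might look like the following. The function name isContentScriptAllowedSketch and the BLOCKED_SCHEMES list are illustrative; the list contains only the three schemes this article names, and the real check may cover more cases.

```typescript
// Schemes the article lists as blocked. The real check may cover more.
const BLOCKED_SCHEMES = ["chrome", "chrome-extension", "file"];

// Returns true when the URL has a scheme and that scheme is not blocked.
// Strings without a recognizable scheme are rejected.
function isContentScriptAllowedSketch(url: string): boolean {
  const match = /^([a-z][a-z0-9+.-]*):/i.exec(url.trim());
  if (!match) return false;
  const scheme = (match[1] || "").toLowerCase();
  return BLOCKED_SCHEMES.indexOf(scheme) === -1;
}
```

Running the check before sending a PAGE_CONTROL message lets the agent fail fast with a clear error instead of waiting on a content script that can never respond.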
Global Context for LLM Decision Making
The TabsController.summarizeTabs method constructs a markdown table containing tab IDs, URLs, titles, and active status. This table is prepended to BrowserState.header, giving the LLM a global view of all open pages to inform multi-tab decisions.
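To make the shape of that summary concrete, here is a hedged sketch of how such a table could be built. The TabSummary interface, the column headings, and the summarizeTabsSketch name are assumptions; the article only states that the table contains tab IDs, URLs, titles, and active status.

```typescript
// Hypothetical per-tab record; field names are assumptions.
interface TabSummary {
  id: number;
  url: string;
  title: string;
  active: boolean;
}

// Escape pipes and collapse newlines so a title cannot break the table layout.
function escapeCell(text: string): string {
  return text.replace(/\r?\n/g, " ").replace(/\|/g, "\\|");
}

// Build a markdown table with one row per managed tab.
function summarizeTabsSketch(tabs: TabSummary[]): string {
  const header = ["| Tab ID | URL | Title | Active |", "| --- | --- | --- | --- |"];
  const rows = tabs.map(
    (t) => `| ${t.id} | ${escapeCell(t.url)} | ${escapeCell(t.title)} | ${t.active ? "yes" : "no"} |`
  );
  return header.concat(rows).join("\n");
}
```

Escaping the cell contents matters because page titles are untrusted input and a stray pipe character would otherwise shift the columns the LLM reads.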
Lifecycle Management and Heartbeat Monitoring
MultiPageAgent registers lifecycle hooks (onBeforeTask, onAfterTask, onBeforeStep, onDispose) to manage coordination state. The onBeforeStep hook guarantees the current tab is fully loaded before executing DOM operations. A heartbeat mechanism writes timestamps to chrome.storage.local every second, enabling the side-panel to detect agent crashes by checking isAgentRunning and agentHeartbeat status.
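The writer side of the heartbeat can be sketched as follows. The KeyValueStore interface and MemoryStore class are stand-ins for chrome.storage.local so the example runs outside a browser, and HeartbeatSketch and isAgentAlive are hypothetical names; only the isAgentRunning and agentHeartbeat keys and the one-second interval come from this article. The 5000 ms staleness threshold is an arbitrary illustrative value.

```typescript
// Minimal key-value interface standing in for chrome.storage.local.
interface KeyValueStore {
  set(items: { [key: string]: unknown }): void;
  get(key: string): unknown;
}

class MemoryStore implements KeyValueStore {
  private data: { [key: string]: unknown } = {};

  set(items: { [key: string]: unknown }): void {
    Object.keys(items).forEach((key) => { this.data[key] = items[key]; });
  }

  get(key: string): unknown {
    return this.data[key];
  }
}

// Writes a timestamp on start and then once per interval until stopped.
class HeartbeatSketch {
  private timer: ReturnType<typeof setInterval> | undefined;
  private store: KeyValueStore;
  private intervalMs: number;
  private now: () => number;

  constructor(store: KeyValueStore, intervalMs: number = 1000, now: () => number = () => Date.now()) {
    this.store = store;
    this.intervalMs = intervalMs;
    this.now = now;
  }

  start(): void {
    this.stop();
    this.store.set({ isAgentRunning: true, agentHeartbeat: this.now() });
    this.timer = setInterval(() => {
      this.store.set({ agentHeartbeat: this.now() });
    }, this.intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
    this.store.set({ isAgentRunning: false });
  }
}

// Reader side: the agent counts as alive only if it is flagged as running
// and its last heartbeat is a number no older than maxAgeMs.
function isAgentAlive(store: KeyValueStore, now: number, maxAgeMs: number = 5000): boolean {
  const beat = store.get("agentHeartbeat");
  return store.get("isAgentRunning") === true && typeof beat === "number" && now - beat <= maxAgeMs;
}
```

Keeping the running flag and the timestamp separate lets a reader tell a clean shutdown (flag cleared) from a crash (flag still set but timestamp stale).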
Practical Implementation Examples
Creating a multi-page agent:
import { MultiPageAgent } from '@/agent/MultiPageAgent'

const agent = new MultiPageAgent({
  llmConfig: { model: 'gpt-4o', apiKey: 'YOUR_KEY' },
  includeInitialTab: true,
})

await agent.runTask(`Search for the latest JavaScript frameworks, open a new tab for each, and summarize the findings.`)
Opening a new tab programmatically:
await agent.tabsController.openNewTab('https://developer.mozilla.org/en-US/docs/Web/JavaScript')
Switching between managed tabs:
await agent.tabsController.switchToTab(5)
Retrieving global browser state with tab summaries:
const state = await agent.pageController.getBrowserState()
// state.header contains a markdown table of all managed tabs
Monitoring agent health from the side-panel UI:
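setInterval(async () => {
  const { isAgentRunning, agentHeartbeat } = await chrome.storage.local.get(['isAgentRunning', 'agentHeartbeat'])
  // Treat a missing heartbeat as stale; subtracting undefined yields NaN, which never exceeds the threshold
  const stale = typeof agentHeartbeat !== 'number' || Date.now() - agentHeartbeat > 5000
  if (!isAgentRunning || stale) {
    console.log('Agent not active')
  }
}, 2000)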
Summary
- Three-component architecture: MultiPageAgent orchestrates TabsController (tab lifecycle) and RemotePageController (DOM operations) to enable multi-page and multi-tab coordination.
- Message-passing protocol: TAB_CONTROL messages manage tabs via the background service-worker, while PAGE_CONTROL messages route DOM requests to specific content scripts in target tabs.
- Shared state storage: All coordination state persists in chrome.storage.local, including currentTabId, managed tab lists, and heartbeat timestamps, ensuring consistency across extension components.
- Security and grouping: The extension validates URLs before script injection via isContentScriptAllowed and visually groups managed tabs with random accent colors for user clarity.
- LLM-aware context: Tab summaries provide the agent with global context across all open pages through BrowserState.header, enabling informed multi-tab decision making.
Frequently Asked Questions
How does Page-Agent prevent the agent from accessing restricted Chrome pages?
The RemotePageController class implements isContentScriptAllowed to block URLs matching patterns like chrome://, chrome-extension://, and file://. This validation occurs in packages/extension/src/agent/RemotePageController.ts before any PAGE_CONTROL message routes DOM manipulation requests to the target tab, ensuring the extension only operates on standard web pages.
What happens if a user manually closes a tab while the agent is running?
Because TabsController maintains the canonical tab list in chrome.storage.local and validates tab IDs before operations, the agent detects missing tabs during the next coordination cycle. The heartbeat mechanism and onBeforeStep lifecycle hooks ensure the agent checks tab existence before attempting DOM interactions, allowing graceful handling of manual tab closures without crashing the agent loop.
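One way to picture this validation is a reconciliation step that compares the stored tab list with the tabs that still exist. The sketch below is an assumption about how such a step could work, not a description of the repository's code: ManagedTabsState, reconcileTabs, and the fallback policy of switching to the most recently added surviving tab are all hypothetical.

```typescript
// Hypothetical shape of the stored coordination state.
interface ManagedTabsState {
  currentTabId: number | null;
  managedTabIds: number[];
}

// Drop managed tabs that no longer exist and repair currentTabId if needed.
// Returns a new state object; the input is not mutated.
function reconcileTabs(state: ManagedTabsState, openTabIds: number[]): ManagedTabsState {
  const stillOpen = state.managedTabIds.filter((id) => openTabIds.indexOf(id) !== -1);
  let current = state.currentTabId;
  if (current === null || stillOpen.indexOf(current) === -1) {
    // Assumed policy: fall back to the most recently added surviving tab.
    current = stillOpen.length > 0 ? stillOpen[stillOpen.length - 1] : null;
  }
  return { currentTabId: current, managedTabIds: stillOpen };
}
```

Running a step like this before each agent step would mean a manually closed tab surfaces as a changed currentTabId or an empty tab list rather than as a failed DOM call.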
Can multiple Page-Agent instances run simultaneously in different windows?
The current architecture keeps shared state in chrome.storage.local under keys such as currentTabId, isAgentRunning, and agentHeartbeat. The extension supports multiple tabs within one agent instance, but running separate agent instances at the same time would require namespacing those storage keys per session so that the instances do not overwrite each other's state in the same chrome.storage.local namespace.
How does the agent decide which tab to interact with next?
The MultiPageAgent prepends a markdown summary table of all managed tabs to the BrowserState.header field via TabsController.summarizeTabs. This gives the LLM visibility into every open tab's URL, title, and ID, allowing the language model to explicitly specify tab switches through tabsController.switchToTab() or target specific tabs via remotePageController method calls based on the task requirements.