# How the Page-Agent Chrome Extension Handles Multi-Page and Multi-Tab Coordination

> Discover how the Page-Agent Chrome extension coordinates multi-page and multi-tab tasks. Learn about its TabsController, RemotePageController, and MultiPageAgent components for seamless agent operation across tabs.

- Repository: [Alibaba/page-agent](https://github.com/alibaba/page-agent)
- Tags: internals
- Published: 2026-03-09

---

**The Page-Agent Chrome extension enables autonomous agents to work across multiple browser tabs by coordinating three core components—`TabsController` for tab lifecycle management, `RemotePageController` for cross-tab DOM manipulation, and `MultiPageAgent` for orchestration—all communicating through a service-worker using `TAB_CONTROL` and `PAGE_CONTROL` messages while sharing state via `chrome.storage.local`.**

The alibaba/page-agent repository provides a Chrome extension architecture that lets AI agents operate across multiple web pages and browser tabs. It combines Chrome's extension APIs with a message-passing design so that a single task can open, inspect, and act on several tabs while the agent keeps track of which tab is current.

## The Three Core Components of Multi-Page Coordination

The extension wires together three specialized controllers to manage distributed browser state.

### TabsController: Managing Tab Lifecycle and Groups

Located in [`packages/extension/src/agent/TabsController.ts`](https://github.com/alibaba/page-agent/blob/main/packages/extension/src/agent/TabsController.ts), the **TabsController** manages the Chrome tab lifecycle and maintains shared coordination state. It creates a dedicated **PageAgent** tab group with a random accent color via `randomColor`, grouping all agent-managed tabs visually for the user.

The controller stores critical state in `chrome.storage.local`, including `currentTabId`, the managed tab list, and tab summaries. All tab operations—open, switch, close, and update—execute by sending `TAB_CONTROL` messages from the agent to the service-worker background script, which then invokes the Chrome tab APIs. The background message handler lives in [`packages/extension/src/agent/TabsController.background.ts`](https://github.com/alibaba/page-agent/blob/main/packages/extension/src/agent/TabsController.background.ts).
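
As a rough illustration of this request pattern, the sketch below models a `TAB_CONTROL` message as a typed object and sends it through an injected transport. Only the `TAB_CONTROL` type and the action names (`get_active_tab`, `create_tab_group`, `open_new_tab`) come from this article; the `action` and `payload` field names, the `Transport` type, and the `buildTabControl` and `sendTabControl` helpers are assumptions made for illustration and may not match the repository.

```typescript
// Hypothetical message shape; field names are illustrative, not taken from the repository.
type TabAction = 'get_active_tab' | 'create_tab_group' | 'open_new_tab'

interface TabControlMessage {
  type: 'TAB_CONTROL'
  action: TabAction
  payload?: Record<string, unknown>
}

// The transport is injected so the sketch also runs outside a browser.
// Inside an extension it could be: (msg) => chrome.runtime.sendMessage(msg)
type Transport = (message: TabControlMessage) => Promise<unknown>

// Build a message, omitting the payload key entirely when there is none.
function buildTabControl(action: TabAction, payload?: Record<string, unknown>): TabControlMessage {
  return payload === undefined
    ? { type: 'TAB_CONTROL', action }
    : { type: 'TAB_CONTROL', action, payload }
}

// Send the message and resolve with whatever the background script replies.
async function sendTabControl(
  transport: Transport,
  action: TabAction,
  payload?: Record<string, unknown>,
): Promise<unknown> {
  return transport(buildTabControl(action, payload))
}
```

Injecting the transport keeps the controller logic testable without a live service-worker, which is one plausible reason to separate message construction from delivery.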

### RemotePageController: Cross-Tab DOM Manipulation

The **RemotePageController** ([`packages/extension/src/agent/RemotePageController.ts`](https://github.com/alibaba/page-agent/blob/main/packages/extension/src/agent/RemotePageController.ts)) provides the same DOM-manipulation API that `PageAgentCore` expects, but forwards each request to a content script running in the target tab. Before routing, it validates that the target URL permits content scripts using `isContentScriptAllowed`, blocking restricted schemes like `chrome://`, `chrome-extension://`, and `file://`.

When the agent calls methods like `clickElement` or `getBrowserState`, the controller sends a `PAGE_CONTROL` message to the background script ([`packages/extension/src/agent/RemotePageController.background.ts`](https://github.com/alibaba/page-agent/blob/main/packages/extension/src/agent/RemotePageController.background.ts)), which routes the request to the correct tab's content script. The content script performs the actual DOM work and returns the result to the agent.
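
The background-side routing can be sketched roughly as follows. The `targetTabId` field and the `get_browser_state` action appear in this article; the remaining field names, the `RouteResult` type, and the `isPageControlMessage` and `routePageControl` helpers are hypothetical. The forwarding function is injected; in an extension it would presumably wrap `chrome.tabs.sendMessage`.

```typescript
// Hypothetical message shape for DOM requests aimed at one tab.
interface PageControlMessage {
  type: 'PAGE_CONTROL'
  action: string // e.g. 'get_browser_state'
  targetTabId: number
  payload?: unknown
}

// Injected forwarder; in an extension: (id, msg) => chrome.tabs.sendMessage(id, msg)
type SendToTab = (tabId: number, message: PageControlMessage) => Promise<unknown>

type RouteResult = { ok: true; result: unknown } | { ok: false; error: string }

// Runtime guard so the background listener only routes well-formed requests.
function isPageControlMessage(value: unknown): value is PageControlMessage {
  if (typeof value !== 'object' || value === null) return false
  const m = value as Record<string, unknown>
  return (
    m['type'] === 'PAGE_CONTROL' &&
    typeof m['action'] === 'string' &&
    typeof m['targetTabId'] === 'number'
  )
}

// Forward the request to the target tab and normalise failures into a result object.
async function routePageControl(message: PageControlMessage, sendToTab: SendToTab): Promise<RouteResult> {
  try {
    const result = await sendToTab(message.targetTabId, message)
    return { ok: true, result }
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) }
  }
}
```

Returning a result object instead of throwing means a closed or unreachable tab surfaces to the agent as data it can react to.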

### MultiPageAgent: The Orchestration Layer

The **MultiPageAgent** class in [`packages/extension/src/agent/MultiPageAgent.ts`](https://github.com/alibaba/page-agent/blob/main/packages/extension/src/agent/MultiPageAgent.ts) extends `PageAgentCore` and integrates the above controllers. During `onBeforeTask`, it initializes `TabsController` (creating the tab group and optionally including the active tab), starts a heartbeat stored in `chrome.storage.local`, and sets `isAgentRunning` to true.

The `onBeforeStep` lifecycle hook guarantees the current tab is fully loaded before each LLM-driven step. When the agent needs to interact with a page, it invokes methods on `RemotePageController` (accessed via `agent.pageController`), which are transparently routed through the background service-worker to the appropriate content script.
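
One way such a "wait until loaded" check could work is simple polling with a timeout, sketched below. This is not taken from the repository: `waitForTabLoaded`, `WaitOptions`, and the default timings are hypothetical. The status values mirror the `status` field Chrome reports for a tab (`loading`, `complete`, `unloaded`), and the status getter is injected so the sketch runs outside a browser.

```typescript
type TabStatus = 'loading' | 'complete' | 'unloaded'

interface WaitOptions {
  timeoutMs?: number
  pollMs?: number
  // Injected for testability; defaults to a real timer.
  sleep?: (ms: number) => Promise<void>
}

const realSleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Resolves true once the tab reports 'complete', or false when the timeout elapses.
// In an extension, getStatus could read (await chrome.tabs.get(tabId)).status.
async function waitForTabLoaded(
  getStatus: () => Promise<TabStatus>,
  options: WaitOptions = {},
): Promise<boolean> {
  const { timeoutMs = 10_000, pollMs = 100, sleep = realSleep } = options
  let waited = 0
  while (true) {
    if ((await getStatus()) === 'complete') return true
    if (waited >= timeoutMs) return false
    await sleep(pollMs)
    waited += pollMs
  }
}
```

An event-driven variant that listens for tab update events would avoid polling; the polling form is shown only because it is easier to follow.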

## Message-Based Coordination Flow

The coordination relies on two distinct message types processed by the service-worker in [`packages/extension/src/entrypoints/background.ts`](https://github.com/alibaba/page-agent/blob/main/packages/extension/src/entrypoints/background.ts):

- **TAB_CONTROL**: Handles tab lifecycle operations (open, switch, close, group management)
- **PAGE_CONTROL**: Routes DOM manipulation requests to specific tab content scripts

The coordination flow follows these steps:

1. **Agent** calls `tabsController.init(task)` → sends `TAB_CONTROL` → `get_active_tab`
2. **Background SW** returns active tab ID, creates a new tab group via `create_tab_group`, and stores the group ID
3. **Agent** calls `openNewTab(url)` → sends `TAB_CONTROL` → `open_new_tab`
4. **Background SW** opens the Chrome tab, adds it to the PageAgent group, and resolves with `tabId`
5. **Agent** calls `remotePageController.getBrowserState()` → sends `PAGE_CONTROL` → `get_browser_state` with `targetTabId`
6. **Background SW** routes the `PAGE_CONTROL` message to the content script of the target tab
7. **Content script** extracts the DOM tree and returns `BrowserState`
8. **Agent** uses the returned data to drive LLM decisions, calling `switchToTab` or `clickElement` as needed
9. **Heartbeat** writes `{agentHeartbeat: Date.now()}` to `chrome.storage.local` every second so the side-panel can detect agent health

Because all state lives in `chrome.storage.local` and all communication routes through the service-worker, any extension component (side-panel, content script, or background) can read the current tab ID, managed tab list, and agent status, ensuring consistent coordination even if the user switches windows.

## Key Implementation Details

### Visual Tab Grouping for User Transparency

When `TabsController` initializes, it creates a Chrome tab group named `PageAgent(${task})` with a randomly selected accent color. Tabs opened afterwards through `openNewTab` join this same group, making it easy for users to visually distinguish the agent's workspace from regular browsing tabs.

### Content Script Security Gating

Before injecting scripts, `RemotePageController.isContentScriptAllowed` validates URLs to prevent injection into restricted Chrome pages. This security check blocks `chrome://`, `chrome-extension://`, and `file://` schemes, ensuring the agent only operates on standard web pages where content scripts are permitted.
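
A minimal sketch of such a scheme check is shown below. It covers only the three schemes named in this article; the actual `isContentScriptAllowed` in the repository may block more and may be structured differently. The `BLOCKED_SCHEMES` constant is an assumption for illustration.

```typescript
// Schemes this article lists as blocked; the real list may be longer.
const BLOCKED_SCHEMES: readonly string[] = ['chrome:', 'chrome-extension:', 'file:']

// Returns true only for parseable URLs whose scheme is not on the block list.
function isContentScriptAllowed(url: string | undefined): boolean {
  if (!url) return false
  let parsed: URL
  try {
    parsed = new URL(url)
  } catch {
    return false // unparseable URLs are treated as not allowed
  }
  return !BLOCKED_SCHEMES.includes(parsed.protocol)
}
```

Parsing with `URL` and comparing `protocol` avoids false matches that a plain `startsWith` check on the raw string could produce for unusual input.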

### Global Context for LLM Decision Making

The `TabsController.summarizeTabs` method constructs a markdown table containing tab IDs, URLs, titles, and active status. This table is prepended to `BrowserState.header`, giving the LLM a global view of all managed tabs to inform multi-tab decisions.
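
To make the idea concrete, here is a hedged sketch of how such a table could be built. The `ManagedTab` shape, the column labels, and the escaping rules are assumptions; the repository's `summarizeTabs` may format its output differently.

```typescript
// Hypothetical per-tab record; the real stored shape may differ.
interface ManagedTab {
  id: number
  url: string
  title: string
}

// Render managed tabs as a markdown table, marking the current tab as active.
function summarizeTabs(tabs: ManagedTab[], currentTabId: number | null): string {
  // Escape pipes and flatten newlines so a title cannot break the table layout.
  const escapeCell = (text: string): string => text.replace(/\|/g, '\\|').replace(/\r?\n/g, ' ')
  const rows = tabs.map(
    (tab) =>
      `| ${tab.id} | ${escapeCell(tab.url)} | ${escapeCell(tab.title)} | ${tab.id === currentTabId ? 'yes' : ''} |`,
  )
  return ['| Tab ID | URL | Title | Active |', '| --- | --- | --- | --- |', ...rows].join('\n')
}
```

Because the table is plain markdown, it can be placed ahead of the page-specific state without any extra parsing on the LLM side.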

### Lifecycle Management and Heartbeat Monitoring

`MultiPageAgent` registers lifecycle hooks (`onBeforeTask`, `onAfterTask`, `onBeforeStep`, `onDispose`) to manage coordination state. The `onBeforeStep` hook guarantees the current tab is fully loaded before executing DOM operations. A heartbeat mechanism writes a timestamp to `chrome.storage.local` every second, so the side-panel can detect a crashed agent: if `isAgentRunning` is still true but `agentHeartbeat` has stopped advancing, the agent is no longer alive.
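
The writer side of the heartbeat might look roughly like the sketch below. The `agentHeartbeat` key and the one-second interval come from this article; `KeyValueStore`, `startHeartbeat`, and `isHeartbeatStale` are hypothetical names, and the store is injected (in an extension, `chrome.storage.local` would presumably fill that role). The 5-second staleness threshold matches the monitoring example later in this article.

```typescript
// Minimal storage interface so the sketch runs without a browser.
interface KeyValueStore {
  set(items: Record<string, unknown>): Promise<void>
}

// Writes a timestamp immediately and then once per interval; returns a stop function.
function startHeartbeat(store: KeyValueStore, intervalMs = 1000): () => void {
  const beat = (): void => {
    store.set({ agentHeartbeat: Date.now() }).catch(() => {
      // Ignore a failed write; the next beat will try again.
    })
  }
  beat()
  const timer = setInterval(beat, intervalMs)
  return () => clearInterval(timer)
}

// A missing heartbeat, or one older than maxAgeMs, counts as stale.
function isHeartbeatStale(lastBeat: number | undefined, now: number, maxAgeMs = 5000): boolean {
  return lastBeat === undefined || now - lastBeat > maxAgeMs
}
```

The stop function would typically be called from `onAfterTask` or `onDispose` so the timer does not outlive the task.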

## Practical Implementation Examples

Creating a multi-page agent:

```typescript
import { MultiPageAgent } from '@/agent/MultiPageAgent'

const agent = new MultiPageAgent({
  llmConfig: { model: 'gpt-4o', apiKey: 'YOUR_KEY' }, // replace with your own credentials
  includeInitialTab: true, // also manage the tab that is active when the task starts
})

await agent.runTask('Search for the latest JavaScript frameworks, open a new tab for each, and summarize the findings.')
```

Opening a new tab programmatically:

```typescript
// openNewTab belongs to TabsController; the new tab joins the PageAgent group.
await agent.tabsController.openNewTab('https://developer.mozilla.org/en-US/docs/Web/JavaScript')
```

Switching between managed tabs:

```typescript
// The argument is the Chrome tab ID of a tab the agent already manages.
await agent.tabsController.switchToTab(5)
```

Retrieving global browser state with tab summaries:

```typescript
const state = await agent.pageController.getBrowserState()
// state.header contains a markdown table of all managed tabs
```

Monitoring agent health from the side-panel UI:

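```typescript
// Poll every 2 seconds; treat a heartbeat older than 5 seconds as stale.
setInterval(async () => {
  const { isAgentRunning, agentHeartbeat } = await chrome.storage.local.get(['isAgentRunning', 'agentHeartbeat'])
  // A missing heartbeat falls back to 0, so it always counts as stale.
  const heartbeatAge = Date.now() - (agentHeartbeat ?? 0)
  if (!isAgentRunning || heartbeatAge > 5000) {
    console.log('Agent not active')
  }
}, 2000)
```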

## Summary

- **Three-component architecture**: `MultiPageAgent` orchestrates `TabsController` (tab lifecycle) and `RemotePageController` (DOM operations) to enable seamless multi-page and multi-tab coordination.
- **Message-passing protocol**: `TAB_CONTROL` messages manage tabs via the background service-worker, while `PAGE_CONTROL` messages route DOM requests to specific content scripts in target tabs.
- **Shared state storage**: All coordination state persists in `chrome.storage.local`, including `currentTabId`, managed tab lists, and heartbeat timestamps, ensuring consistency across extension components.
- **Security and grouping**: The extension validates URLs before script injection via `isContentScriptAllowed` and visually groups managed tabs with random accent colors for user clarity.
- **LLM-aware context**: Tab summaries provide the agent with global context across all open pages through `BrowserState.header`, enabling informed multi-tab decision making.

## Frequently Asked Questions

### How does Page-Agent prevent the agent from accessing restricted Chrome pages?

The `RemotePageController` class implements `isContentScriptAllowed` to block URLs matching patterns like `chrome://`, `chrome-extension://`, and `file://`. This validation occurs in [`packages/extension/src/agent/RemotePageController.ts`](https://github.com/alibaba/page-agent/blob/main/packages/extension/src/agent/RemotePageController.ts) before any `PAGE_CONTROL` message routes DOM manipulation requests to the target tab, ensuring the extension only operates on standard web pages.

### What happens if a user manually closes a tab while the agent is running?

Because `TabsController` maintains the managed tab list in `chrome.storage.local` and validates tab IDs before operations, the agent can detect a missing tab on its next step. The `onBeforeStep` hook runs before every LLM-driven step to confirm the current tab is loaded, so a manually closed tab is noticed before any DOM interaction is attempted and can be handled without crashing the agent loop.
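
A reconciliation step of this kind could be sketched as follows. This is an illustration under assumptions, not the repository's logic: `TabState`, `reconcileTabs`, and the fallback rule (switch to the most recently added surviving tab) are all hypothetical. The set of open tab IDs would come from the Chrome tabs API in a real extension.

```typescript
// Hypothetical snapshot of the coordination state.
interface TabState {
  managedTabIds: number[]
  currentTabId: number | null
}

// Drop tabs that no longer exist and pick a replacement if the current tab was closed.
function reconcileTabs(state: TabState, openTabIds: ReadonlySet<number>): TabState {
  const managedTabIds = state.managedTabIds.filter((id) => openTabIds.has(id))
  const currentStillOpen = state.currentTabId !== null && openTabIds.has(state.currentTabId)
  // Assumed fallback: the last surviving managed tab, or null if none remain.
  const fallback = managedTabIds.length > 0 ? managedTabIds[managedTabIds.length - 1]! : null
  return {
    managedTabIds,
    currentTabId: currentStillOpen ? state.currentTabId : fallback,
  }
}
```

Keeping the function pure makes it straightforward to run before each step and to write the result back to storage in one update.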

### Can multiple Page-Agent instances run simultaneously in different windows?

The current architecture keeps shared state in `chrome.storage.local` under keys such as `currentTabId`, `isAgentRunning`, and `agentHeartbeat`. The extension supports multiple tabs within one agent instance, but running separate agent instances at the same time would require per-session storage keys, so that sessions sharing the same `chrome.storage.local` namespace do not overwrite each other's state.
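
One possible way to avoid such collisions is to prefix every key with a session identifier, as in the sketch below. Nothing here is part of the extension today: the `pageAgent:` prefix, `sessionKey`, and `scopeItems` are hypothetical.

```typescript
// Build a storage key that is unique per agent session.
function sessionKey(sessionId: string, key: string): string {
  return `pageAgent:${sessionId}:${key}`
}

// Rewrite a plain state object so every key carries the session prefix.
function scopeItems(sessionId: string, items: Record<string, unknown>): Record<string, unknown> {
  const scoped: Record<string, unknown> = {}
  for (const [key, value] of Object.entries(items)) {
    scoped[sessionKey(sessionId, key)] = value
  }
  return scoped
}
```

With this scheme, two sessions writing `currentTabId` would touch different entries, at the cost of every reader needing to know which session it is observing.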

### How does the agent decide which tab to interact with next?

The `MultiPageAgent` prepends a markdown summary table of all managed tabs to the `BrowserState.header` field via `TabsController.summarizeTabs`. This gives the LLM visibility into every open tab's URL, title, and ID, allowing the language model to explicitly specify tab switches through `tabsController.switchToTab()` or target specific tabs via `remotePageController` method calls based on the task requirements.