# How PageController Handles DOM Extraction and Element Indexing in Alibaba Page Agent

> Discover how PageController extracts DOM and indexes elements in Alibaba Page Agent. It creates LLM-friendly formats, assigns stable indices, and maintains a selector map for precise actions.

- Repository: [Alibaba/page-agent](https://github.com/alibaba/page-agent)
- Tags: internals
- Published: 2026-03-09

---

**PageController transforms live web pages into LLM-friendly formats by traversing the DOM to build a flat tree, assigning stable numeric indices to interactive elements, and maintaining a selector map that enables precise programmatic actions.**

The **PageController** (`packages/page-controller`) is the core orchestration component in the Alibaba `page-agent` repository. It converts complex browser DOM structures into compact, indexed representations that AI agents can understand and manipulate through stable numeric references.

## The DOM Extraction Pipeline

### Triggering the Extraction Process

The extraction sequence begins when an API consumer invokes `pageController.updateTree()` in [`packages/page-controller/src/PageController.ts`](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/PageController.ts). This method initiates a comprehensive DOM snapshot that captures the current state of the page, including all interactive elements within the configured viewport.

### Preparation and Cleanup

Before traversal begins, the controller performs two critical setup steps. First, if a visual mask is active, it is temporarily disabled to ensure unobstructed DOM inspection. Second, all previous interaction highlights are removed via `dom.cleanUpHighlights()` in [`packages/page-controller/src/dom/index.ts`](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/dom/index.ts), ensuring a clean state for the new indexing operation.

## Building the Flat DOM Tree

### Tree Generation with getFlatTree

The `dom.getFlatTree(config)` function in [`packages/page-controller/src/dom/index.ts`](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/dom/index.ts) serves as the entry point for DOM normalization. It returns a **flat DOM tree** containing every node the library tracks, filtered by visibility, interactivity, and the `viewportExpansion` parameter (which defaults to `VIEWPORT_EXPANSION` from [`src/constants.ts`](https://github.com/alibaba/page-agent/blob/main/src/constants.ts)).
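To make the `viewportExpansion` margin concrete, here is a minimal sketch of an expanded-viewport test. The `Rect` and `Viewport` types and the `inExpandedViewport` helper are hypothetical names chosen for illustration; the library's own check may differ in detail.

```typescript
// Hypothetical types and helper for illustration; not taken from page-agent.
interface Rect { top: number; bottom: number; left: number; right: number }
interface Viewport { width: number; height: number }

/**
 * Returns true when `rect` (in viewport coordinates) overlaps the viewport
 * after the viewport has been grown by `expansion` pixels on every side.
 */
function inExpandedViewport(rect: Rect, viewport: Viewport, expansion: number): boolean {
  return (
    rect.bottom >= -expansion &&
    rect.top <= viewport.height + expansion &&
    rect.right >= -expansion &&
    rect.left <= viewport.width + expansion
  )
}

const viewport: Viewport = { width: 1280, height: 720 }
const justBelowFold: Rect = { top: 800, bottom: 840, left: 100, right: 300 }

console.log(inExpandedViewport(justBelowFold, viewport, 0))   // false
console.log(inExpandedViewport(justBelowFold, viewport, 200)) // true
```

Under this reading, a larger expansion lets elements just outside the visible area enter the snapshot, at the cost of a longer prompt.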

### Deep Traversal and Caching

The heavy lifting occurs inside `domTree()` in [`packages/page-controller/src/dom/dom_tree/index.js`](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/dom/dom_tree/index.js). This implementation performs a depth-first walk of the page while maintaining a `DOM_CACHE` (lines 58-67) to store bounding rectangles, client rectangles, and computed styles. This caching strategy prevents layout thrashing by avoiding repeated reflow calculations during traversal.
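The caching idea can be sketched as a memoizing wrapper keyed by element reference. The `createRectCache` helper and `Box` type below are assumptions made for this example and do not mirror the actual `DOM_CACHE` structure; in a browser, the `measure` callback would typically call `getBoundingClientRect()`.

```typescript
// Hypothetical cache shape; the real DOM_CACHE may be organized differently.
interface Box { x: number; y: number; width: number; height: number }

function createRectCache<K extends object>(measure: (key: K) => Box) {
  const rects = new WeakMap<K, Box>()
  let measurements = 0
  return {
    // Measure each key at most once, then serve the stored result.
    get(key: K): Box {
      let rect = rects.get(key)
      if (rect === undefined) {
        rect = measure(key) // the expensive step: in a browser this can force layout
        measurements++
        rects.set(key, rect)
      }
      return rect
    },
    measurementCount(): number {
      return measurements
    },
  }
}

// A plain object stands in for a DOM element so the sketch runs outside a browser.
const fakeElement = { id: 'submit' }
const cache = createRectCache<{ id: string }>(() => ({ x: 0, y: 0, width: 80, height: 24 }))

cache.get(fakeElement)
cache.get(fakeElement)
console.log(cache.measurementCount()) // 1
```

A `WeakMap` is a natural fit here because entries disappear once the element itself is garbage-collected.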

### Handling Shadow DOM and Iframes

The traversal algorithm recursively enters shadow roots via `node.shadowRoot` and descends into iframes by accessing `node.contentDocument`. This ensures that interactive elements nested within web components or framed content receive consistent indexing alongside standard DOM nodes.
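A depth-first walk that crosses these boundaries might look roughly like the following. `WalkableNode` is a hypothetical structural type that mirrors only the DOM properties used here, so the sketch runs on plain objects; the order in which light DOM, shadow content, and frame content are visited is also an assumption.

```typescript
// Hypothetical structural type; real code would operate on Element instances.
interface WalkableNode {
  tagName: string
  children: WalkableNode[]
  shadowRoot?: { children: WalkableNode[] } | null
  contentDocument?: { body: WalkableNode } | null
}

function collectTags(node: WalkableNode, out: string[] = []): string[] {
  out.push(node.tagName)
  // Regular (light DOM) children
  for (const child of node.children) collectTags(child, out)
  // Open shadow roots expose their content through `shadowRoot`
  if (node.shadowRoot) {
    for (const child of node.shadowRoot.children) collectTags(child, out)
  }
  // `contentDocument` is null for cross-origin frames, so those are skipped
  if (node.contentDocument) collectTags(node.contentDocument.body, out)
  return out
}

const page: WalkableNode = {
  tagName: 'BODY',
  children: [
    { tagName: 'MY-WIDGET', children: [], shadowRoot: { children: [{ tagName: 'BUTTON', children: [] }] } },
    {
      tagName: 'IFRAME',
      children: [],
      contentDocument: { body: { tagName: 'BODY', children: [{ tagName: 'A', children: [] }] } },
    },
    { tagName: 'IFRAME', children: [], contentDocument: null },
  ],
}

console.log(collectTags(page).join(' > '))
```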

## Detecting Interactive Elements

### Visibility and Interactivity Heuristics

During traversal, each element undergoes rigorous filtering through functions like `isElementVisible`, `isTopElement`, and `isInExpandedViewport`. Interactivity determination relies on `isInteractiveElement` (lines 94-450), which combines multiple signals: cursor styles (`interactiveCursors`), tag name whitelists (`a`, `button`, `input`), ARIA roles, attached event listeners, and user-supplied whitelist/blacklist configurations.
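The way these signals combine can be illustrated with a simplified predicate over a plain descriptor. Everything below (`ElementFacts`, `Overrides`, `looksInteractive`, and the specific role and cursor sets) is a hypothetical stand-in; the real `isInteractiveElement` inspects live DOM elements and applies more rules.

```typescript
// Hypothetical descriptor and rule sets, for illustration only.
interface ElementFacts {
  tag: string
  role?: string
  cursor?: string
  hasClickListener?: boolean
}

interface Overrides { include?: string[]; exclude?: string[] }

const INTERACTIVE_TAGS = new Set(['a', 'button', 'input'])
const INTERACTIVE_ROLES = new Set(['button', 'link', 'checkbox', 'tab'])
const INTERACTIVE_CURSORS = new Set(['pointer', 'text', 'grab'])

function looksInteractive(el: ElementFacts, overrides: Overrides = {}): boolean {
  const tag = el.tag.toLowerCase()
  // User-supplied lists take precedence over the built-in heuristics.
  if (overrides.exclude?.includes(tag)) return false
  if (overrides.include?.includes(tag)) return true
  return (
    INTERACTIVE_TAGS.has(tag) ||
    (el.role !== undefined && INTERACTIVE_ROLES.has(el.role)) ||
    (el.cursor !== undefined && INTERACTIVE_CURSORS.has(el.cursor)) ||
    el.hasClickListener === true
  )
}

console.log(looksInteractive({ tag: 'DIV' }))                                // false
console.log(looksInteractive({ tag: 'div', cursor: 'pointer' }))             // true
console.log(looksInteractive({ tag: 'button' }, { exclude: ['button'] }))    // false
```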

### Metadata Attachment

For elements passing the interactivity checks, additional metadata is attached via a `WeakMap` called `extraData`. This includes scrollability indicators and cached bounding rectangles accessed through `getCachedBoundingRect` and `getCachedComputedStyle`.

## The Indexing System

### Allocating Highlight Indices

The indexing mechanism centers on a module-level `highlightIndex` variable initialized at 0 (line 47 of [`dom_tree/index.js`](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/dom/dom_tree/index.js)). When `highlightElement` identifies an interactive node that satisfies visibility and top-element requirements, it assigns the current index value to `node.highlightIndex` and increments the counter. This creates a stable, zero-based numeric identifier for each actionable element.
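The allocation rule described above reduces to a counter that advances only for qualifying nodes. The `TreeNode` shape and `assignHighlightIndices` function in this sketch are illustrative assumptions, not the library's code.

```typescript
// Hypothetical node shape for illustration.
interface TreeNode {
  tag: string
  isInteractive: boolean
  isVisible: boolean
  isTopElement: boolean
  highlightIndex?: number
}

/** Assigns zero-based indices in traversal order and returns how many were allocated. */
function assignHighlightIndices(nodes: TreeNode[]): number {
  let highlightIndex = 0
  for (const node of nodes) {
    if (node.isInteractive && node.isVisible && node.isTopElement) {
      node.highlightIndex = highlightIndex++
    }
  }
  return highlightIndex
}

const nodes: TreeNode[] = [
  { tag: 'a', isInteractive: true, isVisible: true, isTopElement: true },
  { tag: 'p', isInteractive: false, isVisible: true, isTopElement: true },
  { tag: 'button', isInteractive: true, isVisible: false, isTopElement: true },
  { tag: 'input', isInteractive: true, isVisible: true, isTopElement: true },
]

const allocated = assignHighlightIndices(nodes)
console.log(allocated, nodes.map((n) => n.highlightIndex))
```

Because skipped nodes never consume a number, the indices the LLM sees form a gapless sequence.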

### Creating the Selector Map

After the flat tree is complete, `dom.getSelectorMap(flatTree)` in [`packages/page-controller/src/dom/index.ts`](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/dom/index.ts) constructs a `Map<number, InteractiveElementDomNode>`. This map filters the tree for nodes where `node.isInteractive && typeof node.highlightIndex === 'number'`, creating the bridge between numeric indices and actual DOM references.
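A minimal sketch of that filter follows. It uses the condition quoted above, but the `FlatNode` and `FlatTree` shapes (in particular a `map` object keyed by node id) and the `buildSelectorMap` name are assumptions for this example.

```typescript
// Assumed shapes; the library's flat tree carries more fields than this.
interface FlatNode { tagName: string; isInteractive?: boolean; highlightIndex?: number }
interface FlatTree { map: Record<string, FlatNode> }

function buildSelectorMap(tree: FlatTree): Map<number, FlatNode> {
  const selectorMap = new Map<number, FlatNode>()
  for (const node of Object.values(tree.map)) {
    // Same condition as described in the text above.
    if (node.isInteractive && typeof node.highlightIndex === 'number') {
      selectorMap.set(node.highlightIndex, node)
    }
  }
  return selectorMap
}

const tree: FlatTree = {
  map: {
    n1: { tagName: 'div' },
    n2: { tagName: 'a', isInteractive: true, highlightIndex: 0 },
    n3: { tagName: 'span', isInteractive: true }, // interactive but never highlighted
    n4: { tagName: 'button', isInteractive: true, highlightIndex: 1 },
  },
}

const selectorMap = buildSelectorMap(tree)
console.log([...selectorMap.keys()])
```

The `typeof` check matters because index `0` is falsy; a plain truthiness test would drop the first element.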

### Generating Simplified HTML

The `dom.flatTreeToString(flatTree, includeAttributes)` function converts the structured tree into an indented text format in which interactive nodes appear as `[<index>]<tag attributes>content`. The `dom.getElementTextMap(simplifiedHTML)` function then parses these lines using the regex `/^\[(\d+)\]<[^>]+>([^<]*)/` to build a `Map<number, string>` mapping indices to human-readable descriptions.
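The parsing step can be reproduced with the regex quoted above. The `parseElementText` function, the sample lines, and the decision to trim indentation before matching are assumptions made for this sketch.

```typescript
function parseElementText(simplifiedHTML: string): Map<number, string> {
  const pattern = /^\[(\d+)\]<[^>]+>([^<]*)/
  const result = new Map<number, string>()
  for (const rawLine of simplifiedHTML.split('\n')) {
    // Assumption: leading indentation is stripped so the ^ anchor can match.
    const match = pattern.exec(rawLine.trim())
    if (match) {
      result.set(Number(match[1]), (match[2] ?? '').trim())
    }
  }
  return result
}

// Made-up sample in the `[<index>]<tag attributes>content` shape.
const sample = [
  '[0]<a href=/home>Home',
  'Welcome back',
  '\t[1]<button type=submit>Submit',
  '[2]<input placeholder=Search>',
].join('\n')

const textMap = parseElementText(sample)
console.log(textMap)
```

Lines without a leading `[index]` marker, such as plain text nodes, do not match and are skipped.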

### Marking New Elements

The system tracks element persistence through the `node.isNew` flag. After building the tree, `getFlatTree` iterates over `elements.map` to flag nodes whose underlying DOM references have not appeared in previous snapshots, helping LLMs identify dynamic content changes.
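One way to implement this kind of tracking is to remember element references in a `WeakSet` between snapshots. The sketch below is an assumption about the approach, not the library's code; `SnapshotNode`, `markNew`, and the choice to treat everything in the first snapshot as new are all hypothetical.

```typescript
// Hypothetical node shape: `ref` stands in for the underlying DOM element.
interface SnapshotNode { ref: object; tag: string; isNew?: boolean }

// References seen in earlier snapshots; weak so detached elements can be collected.
const seenRefs = new WeakSet<object>()

function markNew(nodes: SnapshotNode[]): SnapshotNode[] {
  for (const node of nodes) {
    node.isNew = !seenRefs.has(node.ref)
    seenRefs.add(node.ref)
  }
  return nodes
}

const buttonRef = {}
const dialogRef = {}

// First snapshot: only the button exists.
const first = markNew([{ ref: buttonRef, tag: 'button' }])
// Second snapshot: the button persists and a dialog has appeared.
const second = markNew([
  { ref: buttonRef, tag: 'button' },
  { ref: dialogRef, tag: 'dialog' },
])

console.log(first.map((n) => n.isNew), second.map((n) => n.isNew))
```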

## Practical Implementation

### Capturing an Indexed Snapshot

To extract the current page state with element indices:

```typescript
import { PageController } from '@page-agent/page-controller'

async function snapshot() {
  const controller = new PageController({ 
    enableMask: true, 
    viewportExpansion: 200 
  })
  
  // Runs full extraction: cleaning → traversal → indexing → mapping
  const simplifiedHTML = await controller.updateTree()
  const state = await controller.getBrowserState()
  
  console.log(state.header)   // Page title + scroll position
  console.log(state.content)  // Simplified HTML with [0], [1], [2]...
}

snapshot().catch(console.error)
```

### Executing Actions by Index

Once indexed, elements are addressable through the selector map:

```typescript
import { PageController } from '@page-agent/page-controller'

async function act() {
  const pc = new PageController()
  await pc.updateTree()  // Index the page first; sets this.isIndexed = true
  
  // Looks up element in selectorMap via getElementByIndex()
  const result = await pc.clickElement(5)
  console.log(result.message) // "✅ Clicked element (Submit)."
  
  // Scroll to specific indexed element
  await pc.scroll({ down: true, numPages: 1, index: 7 })
}

act().catch(console.error)
```

The `clickElement` method in [`PageController.ts`](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/PageController.ts) (line 44) uses `getElementByIndex(this.selectorMap, index)` to resolve the numeric reference to a DOM node before delegating to `actions.clickElement` in [`src/actions.ts`](https://github.com/alibaba/page-agent/blob/main/src/actions.ts).
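The lookup step is essentially a guarded `Map.get`. The sketch below uses a hypothetical `resolveByIndex` helper and `IndexedNode` type to show why a guard is useful: an LLM may return an index that is not in the current snapshot. How the library's `getElementByIndex` reports that case is not covered here.

```typescript
// Hypothetical node type and helper, deliberately named differently from the library's.
interface IndexedNode { tagName: string; text: string }

function resolveByIndex(selectorMap: Map<number, IndexedNode>, index: number): IndexedNode {
  const node = selectorMap.get(index)
  if (node === undefined) {
    // Fail loudly instead of acting on the wrong element.
    throw new Error(`No interactive element found at index ${index}`)
  }
  return node
}

const currentMap = new Map<number, IndexedNode>([
  [5, { tagName: 'button', text: 'Submit' }],
])

console.log(resolveByIndex(currentMap, 5).text) // Submit

try {
  resolveByIndex(currentMap, 99)
} catch (error) {
  console.log((error as Error).message) // No interactive element found at index 99
}
```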

## Summary

- **Orchestration**: `PageController.updateTree()` in [`packages/page-controller/src/PageController.ts`](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/PageController.ts) coordinates the entire extraction pipeline from cleanup through indexing completion.
- **Normalization**: `getFlatTree()` and `domTree()` flatten complex DOM structures across shadow boundaries and iframe contexts into a uniform traversable format.
- **Optimization**: The `DOM_CACHE` mechanism (lines 58-67) eliminates layout thrashing by caching computed styles and bounding rectangles during the single traversal pass.
- **Addressability**: The selector map creates a stable `Map<number, InteractiveElementDomNode>` bridge, enabling reliable programmatic interaction through numeric indices that persist across extraction cycles.
- **LLM Formatting**: `flatTreeToString` generates the bracketed index format `[0]<button>` that large language models parse to understand available actions.

## Frequently Asked Questions

### How does PageController handle Shadow DOM and iframes during extraction?

The `domTree()` traversal in [`packages/page-controller/src/dom/dom_tree/index.js`](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/dom/dom_tree/index.js) detects shadow roots and recursively processes `node.shadowRoot`, while iframe handling accesses `node.contentDocument` and runs the same tree builder inside the frame context. This ensures interactive elements within web components or embedded documents receive sequential indices alongside standard DOM nodes.

### What criteria determine if an element receives an index?

Elements must satisfy visibility checks (`isElementVisible`, `isTopElement`) and interactivity heuristics in `isInteractiveElement` (lines 94-450). The system evaluates cursor styles, tag name whitelists, ARIA roles, event listeners, and user-supplied filter lists. Only elements passing both visibility and interactivity tests receive a `highlightIndex` via the `highlightElement` function.

### How does the selector map maintain stable references to indexed elements?

The `getSelectorMap()` function in [`packages/page-controller/src/dom/index.ts`](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/dom/index.ts) creates a `Map<number, InteractiveElementDomNode>` where keys are the numeric `highlightIndex` values and values are direct DOM node references. PageController stores this as `this.selectorMap`, allowing action methods like `clickElement()` to perform O(1) lookups of DOM nodes using the LLM-provided numeric indices.

### Can developers customize which elements get indexed?

Yes. The `getFlatTree(config)` function accepts configuration parameters including `viewportExpansion` (margin around visible area) and whitelist/blacklist arrays. These filters combine with the built-in interactivity heuristics in `isInteractiveElement` to control which nodes receive highlight indices and appear in the simplified HTML output.