How the extract_content Tool Handles Dynamic Content and Infinite Scroll Pages

The extract_content workflow rebuilds a flat DOM snapshot via PageController.updateTree() each time the agent requests browser state or issues a scroll or interaction command. Nothing is detected automatically, so dynamic or infinitely scrolled content is captured only after an explicit re-extraction.

The extract_content capability in the alibaba/page-agent repository enables LLM agents to perceive web pages by serializing the DOM into a simplified representation. Unlike automatic monitoring systems, this tool employs a snapshot-based architecture that handles dynamic JavaScript updates and infinite scroll patterns through explicit agent commands. Understanding how extract_content manages these scenarios requires examining the PageController implementation and its DOM tree construction mechanics.

The extract_content Architecture

The extract_content functionality is not a standalone function but rather a DOM extraction workflow orchestrated by PageController.updateTree(). According to the source code in [packages/page-controller/src/PageController.ts](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/PageController.ts) (lines 71-78), this method constructs a flat DOM tree that powers the agent's page perception.

When the LLM requests the browser state, PageController.getBrowserState() invokes updateTree() to perform three critical operations:

  1. Tree Construction: Builds a flat DOM tree representation of the current page state
  2. HTML Serialization: Converts the tree into a simplified HTML string stored in this.simplifiedHTML
  3. Element Mapping: Creates an elementTextMap that indexes elements for interaction targeting
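
The three operations can be sketched on a toy tree. This is a minimal illustration, not the repository's code: the SketchNode and Snapshot shapes, the buildSnapshot helper, and the `[index]<tag>` output format are all assumptions made for the example.

```typescript
// Hypothetical node shape; the real flat-tree types in page-agent may differ.
interface SketchNode {
  tag: string;
  text: string;
  interactive: boolean;
  children: SketchNode[];
}

// Hypothetical result shape mirroring the three operations listed above.
interface Snapshot {
  flatTree: SketchNode[];              // 1. every node, in document order
  simplifiedHTML: string;              // 2. one line per interactive element
  elementTextMap: Map<number, string>; // 3. element index -> element text
}

function buildSnapshot(root: SketchNode): Snapshot {
  // 1. Tree construction: flatten the nested tree with a depth-first walk.
  const flatTree: SketchNode[] = [];
  const walk = (node: SketchNode): void => {
    flatTree.push(node);
    node.children.forEach(walk);
  };
  walk(root);

  // 2 and 3. Serialize interactive nodes and index them in the same pass.
  const elementTextMap = new Map<number, string>();
  const lines: string[] = [];
  for (const node of flatTree) {
    if (!node.interactive) continue;
    const index = elementTextMap.size;
    elementTextMap.set(index, node.text);
    lines.push(`[${index}]<${node.tag}>${node.text}</${node.tag}>`);
  }
  return { flatTree, simplifiedHTML: lines.join("\n"), elementTextMap };
}

const samplePage: SketchNode = {
  tag: "div", text: "", interactive: false,
  children: [
    { tag: "a", text: "Home", interactive: true, children: [] },
    { tag: "p", text: "Intro", interactive: false, children: [] },
    { tag: "button", text: "Load more", interactive: true, children: [] },
  ],
};
const sampleSnapshot = buildSnapshot(samplePage);
console.log(sampleSnapshot.simplifiedHTML);
```

The index assigned during serialization is the same one stored in the map, which is what lets a later interaction command refer to an element by number.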

The extraction scope respects the VIEWPORT_EXPANSION setting defined in [packages/page-controller/src/constants.ts](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/constants.ts) (lines 15-17). By default, this value is -1, indicating full-page extraction rather than viewport-limited capture.

Handling Dynamic Content Updates

The framework does not implement automatic DOM monitoring for dynamic content. After JavaScript operations modify the page—such as AJAX loads, lazy-loaded components, or reactive framework updates—the previously extracted DOM becomes stale.

To capture these changes, the agent must explicitly trigger re-extraction. The updateTree() method runs again when the agent invokes specific tools that internally call it, including:

  • click_element_by_index
  • scroll
  • Any other interaction tool that modifies page state

This design requires agents to recognize when content has changed and proactively request a fresh snapshot, ensuring the simplifiedHTML accurately reflects the current DOM state.
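
The staleness problem can be reproduced with a small stand-in. The FakePage and SnapshotController classes below are hypothetical and only mimic the snapshot behavior described above; they are not the real PageController.

```typescript
// Hypothetical stand-in for a page whose content changes after load.
class FakePage {
  items: string[] = ["item 1", "item 2"];
}

// Hypothetical controller: keeps the last serialized snapshot until refreshed.
class SnapshotController {
  simplifiedHTML = "";
  private page: FakePage;

  constructor(page: FakePage) {
    this.page = page;
  }

  // Rebuilds the snapshot from whatever the page contains right now.
  updateTree(): string {
    this.simplifiedHTML = this.page.items
      .map((text, i) => `[${i}]<li>${text}</li>`)
      .join("\n");
    return this.simplifiedHTML;
  }
}

const fakePage = new FakePage();
const controller = new SnapshotController(fakePage);
const firstSnapshot = controller.updateTree();

fakePage.items.push("item 3");                   // simulates an AJAX append
const staleSnapshot = controller.simplifiedHTML; // unchanged: no auto-refresh
const freshSnapshot = controller.updateTree();   // explicit re-extraction

console.log(staleSnapshot.includes("item 3")); // false
console.log(freshSnapshot.includes("item 3")); // true
```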

Managing Infinite Scroll Pages

For infinite scroll implementations, the extractor processes only the nodes present in the DOM at the moment of extraction. The page-agent framework therefore handles infinite scrolling through a scroll-then-re-extract pattern rather than automatic detection.

The Scroll Tool Mechanism

The scroll functionality is defined in [packages/core/src/tools/index.ts](https://github.com/alibaba/page-agent/blob/main/packages/core/src/tools/index.ts) (lines 31-41) and forwards requests to PageController.scroll(). This method ultimately delegates to scrollVertically or scrollHorizontally depending on the scroll direction parameters.

Scroll-Then-Re-extract Workflow

To handle infinite scroll pages effectively:

  1. Initial Extraction: Capture the first batch of content via updateTree()
  2. Scroll Execution: Use the scroll tool to scroll the page or specific scrollable element, triggering the infinite scroll loader
  3. Re-extraction: Invoke updateTree() again (or request browser state) to capture newly appended DOM nodes in the updated simplifiedHTML

This manual loop ensures the agent only processes actually rendered content, preventing hallucinations about elements that exist in the page logic but not yet in the DOM.

Technical Considerations

Mask Overlay Handling

During the extraction process, updateTree() temporarily disables the visual mask overlay if enabled (see lines 80-84 in PageController.ts). This prevents pointer-event restrictions from blocking the elementFromPoint calls used during tree construction, ensuring accurate element indexing and coordinate mapping.
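
One way to picture this bypass is a wrapper that switches the overlay's pointer events off for the duration of extraction and restores them afterward. The OverlayLike type and withMaskBypassed helper are hypothetical names for this sketch; the repository's own implementation may be structured differently.

```typescript
// Hypothetical stand-in for the overlay element; only the style field matters.
interface OverlayLike {
  style: { pointerEvents: string };
}

function withMaskBypassed<T>(mask: OverlayLike | null, extract: () => T): T {
  if (!mask) return extract(); // no overlay enabled, nothing to bypass
  const previous = mask.style.pointerEvents;
  mask.style.pointerEvents = "none"; // hit-testing now ignores the overlay
  try {
    return extract();
  } finally {
    mask.style.pointerEvents = previous; // restore even if extraction throws
  }
}

const overlay: OverlayLike = { style: { pointerEvents: "auto" } };
// The callback stands in for tree construction and reports what it observed.
const seenDuringExtraction = withMaskBypassed(overlay, () => overlay.style.pointerEvents);
console.log(seenDuringExtraction, overlay.style.pointerEvents);
```

The try/finally is a design choice for the sketch: it guarantees the overlay blocks user input again even when extraction fails partway through.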

Viewport Configuration

While the default VIEWPORT_EXPANSION of -1 captures the full page, agents can configure this setting to limit extraction to specific viewport segments. However, for infinite scroll scenarios, full-page extraction is typically required to capture all loaded content batches.
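
A rough sketch of how such a setting could gate extraction follows. Only the meaning of -1 (full page) comes from the text above; treating non-negative values as a pixel margin around the viewport is an assumption of this example, and the Box type and inScope helper are hypothetical.

```typescript
// Vertical extent of an element, measured from the top of the viewport.
interface Box {
  top: number;
  bottom: number;
}

function inScope(box: Box, viewportHeight: number, expansion: number): boolean {
  if (expansion === -1) return true; // full-page extraction, the stated default
  // Assumed semantics: keep boxes that overlap the viewport grown by
  // `expansion` pixels above and below.
  return box.bottom >= -expansion && box.top <= viewportHeight + expansion;
}

const visibleBox: Box = { top: 100, bottom: 150 };
const belowFoldBox: Box = { top: 1200, bottom: 1250 };
const viewportHeight = 800;

console.log(inScope(belowFoldBox, viewportHeight, -1));  // true
console.log(inScope(belowFoldBox, viewportHeight, 0));   // false
console.log(inScope(belowFoldBox, viewportHeight, 500)); // true
```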

Practical Implementation Example

The following pattern demonstrates the scroll-then-extract loop for infinite scroll pages:

// Initial extraction: getBrowserState() runs updateTree() internally
const initialState = await pageController.getBrowserState();
// Process first batch of items...

// Scroll to trigger infinite scroll loading
// (argument names are illustrative; check PageController.scroll() for the exact shape)
await pageController.scroll({ direction: 'down', amount: 800 });

// Re-extract to capture newly loaded content
const updatedState = await pageController.getBrowserState();
// Process additional items from updatedState
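
A single pass rarely exhausts an infinite feed, so in practice the pattern is repeated until a scroll yields nothing new. The sketch below shows one possible stopping rule. PageAdapter, collectUntilStable, and makeFakeFeed are hypothetical names: the adapter stands in for whichever extraction and scroll calls the agent actually uses, and the fake feed exists only to make the example runnable.

```typescript
// Hypothetical adapter over the real extraction and scroll calls.
interface PageAdapter {
  extract(): Promise<string>;                // returns the current snapshot text
  scrollDown(pixels: number): Promise<void>; // scrolls and lets content load
}

// Scrolls and re-extracts until the snapshot stops changing or maxRounds is hit.
async function collectUntilStable(
  page: PageAdapter,
  maxRounds = 10,
  pixels = 800,
): Promise<string> {
  let snapshot = await page.extract();
  for (let round = 0; round < maxRounds; round++) {
    await page.scrollDown(pixels);
    const next = await page.extract();
    if (next === snapshot) break; // nothing new was appended
    snapshot = next;
  }
  return snapshot;
}

// Fake feed that reveals one more batch per scroll, up to totalBatches.
function makeFakeFeed(totalBatches: number): PageAdapter & { scrolls: number } {
  let loaded = 1;
  const feed = {
    scrolls: 0,
    async extract(): Promise<string> {
      return Array.from({ length: loaded }, (_, i) => `<li>batch ${i + 1}</li>`).join("\n");
    },
    async scrollDown(_pixels: number): Promise<void> {
      feed.scrolls++;
      if (loaded < totalBatches) loaded++;
    },
  };
  return feed;
}

collectUntilStable(makeFakeFeed(3)).then((html) => console.log(html));
```

Comparing whole snapshots is a crude change check, and a real page may also need a short wait after each scroll before new nodes appear. The maxRounds cap keeps the loop from running forever on feeds that never end.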

Summary

  • extract_content is a snapshot workflow, not a continuous stream, implemented via PageController.updateTree() in PageController.ts
  • Dynamic content requires explicit refresh; the framework does not auto-update the DOM representation after JavaScript changes
  • Infinite scroll uses scroll-then-re-extract loops, utilizing the scroll tool defined in tools/index.ts followed by updateTree() calls
  • Full-page extraction is default via VIEWPORT_EXPANSION: -1 in constants.ts
  • Mask overlays are temporarily disabled during extraction (lines 80-84) to ensure accurate elementFromPoint execution

Frequently Asked Questions

Does extract_content automatically detect when new content loads via AJAX?

No, extract_content does not implement automatic detection for AJAX or dynamic content updates. The PageController maintains a static snapshot created by updateTree(). After JavaScript modifies the DOM, the agent must explicitly invoke updateTree() again or call getBrowserState() to refresh the simplified HTML representation.

How do I handle infinite scroll pages with the extract_content tool?

Handle infinite scroll by executing a scroll-then-re-extract sequence. First, use the scroll tool (which calls PageController.scroll() and delegates to scrollVertically) to load more content. Then, call updateTree() or request the browser state again to capture the newly appended DOM nodes in the updated extraction.

What is the VIEWPORT_EXPANSION setting and how does it affect extraction?

VIEWPORT_EXPANSION is a configuration constant defined in packages/page-controller/src/constants.ts (lines 15-17) that controls the extraction scope. The default value of -1 enables full-page extraction, while other values can limit the snapshot to specific viewport areas. For infinite scroll pages, maintaining the default full-page setting ensures all loaded content is captured.

Why does the extraction process disable mask overlays temporarily?

During updateTree() execution (lines 80-84 in PageController.ts), the controller temporarily disables visual mask overlays to prevent interference with elementFromPoint calls. This ensures accurate element identification and coordinate mapping during DOM tree construction, particularly when interactive elements are layered under visual overlays.
