# How to Implement a `drag_element` Tool for Complex UI Interactions in Page Agent

> Learn to implement the drag_element tool in alibaba/page-agent for complex UI interactions. Synthesize mouse events and expose as a tool following Zod schema patterns. Enhance your page automation capabilities.

- Repository: [Alibaba/page-agent](https://github.com/alibaba/page-agent)
- Tags: how-to-guide
- Published: 2026-03-09

---

**To implement a `drag_element` tool in the alibaba/page-agent repository, create a `dragElement` action in [`packages/page-controller/src/actions.ts`](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/actions.ts) that synthesizes the full mouse event sequence, then expose it as a tool in [`packages/core/src/tools/index.ts`](https://github.com/alibaba/page-agent/blob/main/packages/core/src/tools/index.ts) following the existing Zod schema pattern.**

The alibaba/page-agent framework separates UI automation into a **Core** package for tool definitions and a **Page-Controller** package for DOM execution. When you need to implement a `drag_element` tool for complex interactions like drag-and-drop lists or sliders, you extend both layers while maintaining the architecture's clean separation between agent logic and browser manipulation.

## Architecture Overview: Core vs. Page-Controller

Page Agent uses a two-tier architecture where **tools** define the contract and **actions** perform the work. The Core package ([`packages/core/src/tools/index.ts`](https://github.com/alibaba/page-agent/blob/main/packages/core/src/tools/index.ts)) maintains a `Map<string, PageAgentTool>` called `tools` (defined at line 28) that registers all available capabilities. The Page-Controller package ([`packages/page-controller/src/actions.ts`](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/actions.ts)) contains the actual DOM manipulation logic. This separation ensures that UI-specific code remains isolated from the headless-ready Core, making the `drag_element` implementation reusable across environments.

## Step 1: Implement the Drag Action in [`actions.ts`](https://github.com/alibaba/page-agent/blob/main/actions.ts)

Add the low-level `dragElement` function to [`packages/page-controller/src/actions.ts`](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/actions.ts). This function reuses the existing `movePointerToElement` helper (lines 15-22) and `waitFor` utility (lines 9-11) to simulate realistic pointer movement.

```typescript
export async function dragElement(
  source: HTMLElement,
  target: HTMLElement,
  steps = 10
): Promise<string> {
  // Position cursor over source and initiate drag
  await movePointerToElement(source);
  window.dispatchEvent(new CustomEvent('PageAgent::ClickPointer'));
  source.dispatchEvent(new MouseEvent('mousedown', { bubbles: true, cancelable: true }));

  // Calculate path between element centers
  const src = source.getBoundingClientRect();
  const tgt = target.getBoundingClientRect();
  const startX = src.left + src.width / 2;
  const startY = src.top + src.height / 2;
  const endX = tgt.left + tgt.width / 2;
  const endY = tgt.top + tgt.height / 2;

  // Synthesize intermediate mousemove events
  for (let i = 1; i <= steps; i++) {
    const x = startX + ((endX - startX) * i) / steps;
    const y = startY + ((endY - startY) * i) / steps;
    window.dispatchEvent(new CustomEvent('PageAgent::MovePointerTo', { detail: { x, y } }));
    await waitFor(0.02);
  }

  // Release over target
  target.dispatchEvent(new MouseEvent('mouseup', { bubbles: true, cancelable: true }));
  target.dispatchEvent(new MouseEvent('click', { bubbles: true, cancelable: true }));
  await waitFor(0.1);
  
  return `✅ Dragged element ${source.tagName} → ${target.tagName}`;
}

```

This implementation mirrors the event sequence found in `clickElement` (lines 61-71), extending it with intermediate `mousemove` events to satisfy browser drag-and-drop APIs.

## Step 2: Register the Tool in Core

Expose the action as a tool in [`packages/core/src/tools/index.ts`](https://github.com/alibaba/page-agent/blob/main/packages/core/src/tools/index.ts) using the pattern established by `click_element_by_index` (lines 84-94). The registration uses a Zod schema for parameter validation and the `getElementByIndex` helper (lines 28-46) to resolve element references.

```typescript
tools.set(
  'drag_element',
  tool({
    description:
      'Drag an interactive element (source) onto another element (target). Useful for sliders, drag-and-drop lists, and canvas manipulations.',
    inputSchema: z.object({
      sourceIndex: z.number().int().min(0),
      targetIndex: z.number().int().min(0),
      steps: z.number().int().min(1).max(20).optional().default(10),
    }),
    execute: async function (this: PageAgentCore, input) {
      const source = getElementByIndex(this.pageController.selectorMap, input.sourceIndex);
      const target = getElementByIndex(this.pageController.selectorMap, input.targetIndex);
      const result = await this.pageController.dragElement(source, target, input.steps);
      return result;
    },
  })
);

```

The `execute` method receives the `PageAgentCore` instance as `this`, providing direct access to `this.pageController` without additional wiring.

## Step 3: Expose the Method in PageController

Ensure the `PageController` class exposes a public method that forwards to the action. Add a thin wrapper in [`packages/page-controller/src/PageController.ts`](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/PageController.ts):

```typescript
public async dragElement(source: HTMLElement, target: HTMLElement, steps?: number): Promise<string> {
  return actions.dragElement(source, target, steps);
}

```

This maintains consistency with existing methods like `clickElement` and `inputTextElement`.

## Event Simulation and Browser Compatibility

The drag implementation relies on **synthesized events** rather than native Drag-and-Drop APIs. This approach works across complex web applications because:

- **Event bubbling**: All mouse events use `{ bubbles: true, cancelable: true }` to ensure framework event listeners (React, Vue, Angular) detect the interaction.
- **Pointer visualization**: CustomEvents (`PageAgent::MovePointerTo`, `PageAgent::ClickPointer`) decouple the virtual cursor display from DOM state changes.
- **Timing realism**: The 20ms delay between `mousemove` events mimics human pointer velocity, preventing anti-automation detection.

For absolute coordinate-based dragging (independent of specific elements), modify the tool schema to accept `startX`, `startY`, `endX`, `endY` parameters and bypass `getElementByIndex`.

## Summary

- **Create the action**: Add `dragElement` to [`packages/page-controller/src/actions.ts`](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/actions.ts) using `movePointerToElement`, `mousedown`, interpolated `mousemove` events, and `mouseup`.
- **Register the tool**: Insert a new entry into the `tools` Map in [`packages/core/src/tools/index.ts`](https://github.com/alibaba/page-agent/blob/main/packages/core/src/tools/index.ts) with Zod validation for `sourceIndex`, `targetIndex`, and optional `steps`.
- **Maintain separation**: Keep DOM logic in Page-Controller and tool definitions in Core to preserve the architectural boundary between UI-heavy and headless execution contexts.
- **Verify with tests**: Run `npm run lint && npm test` to ensure type safety and prevent regression in existing tools like `scroll` and `click_element_by_index`.

## Frequently Asked Questions

### Where does the actual DOM manipulation for drag operations live?

The implementation belongs in [`packages/page-controller/src/actions.ts`](https://github.com/alibaba/page-agent/blob/main/packages/page-controller/src/actions.ts). This file contains all low-level browser automation including `clickElement`, `inputTextElement`, and the new `dragElement` function. Keeping DOM logic centralized here ensures the Core package remains UI-agnostic.

### How does the tool calculate movement coordinates between elements?

The `dragElement` action uses `source.getBoundingClientRect()` and `target.getBoundingClientRect()` to determine the center points of both elements. It then linearly interpolates between these coordinates based on the `steps` parameter, generating intermediate positions for each `mousemove` event.

### Can I modify the tool to use absolute screen coordinates instead of element indexes?

Yes. Update the Zod schema in [`tools/index.ts`](https://github.com/alibaba/page-agent/blob/main/tools/index.ts) to accept `startX`, `startY`, `endX`, and `endY` as optional alternatives to `sourceIndex` and `targetIndex`. In the `execute` method, conditionally bypass `getElementByIndex` and pass the coordinates directly to the Page-Controller action.

### Why does the implementation synthesize mouse events instead of using the HTML5 Drag and Drop API?

Synthetic mouse events (`mousedown`, `mousemove`, `mouseup`) provide broader compatibility with custom drag implementations (sliders, sortable lists, canvas drawing) that may not use the native Drag and Drop API. This approach also matches the existing `clickElement` pattern (lines 61-71) for consistent event simulation across the codebase.