# How to Use LiteParse Bounding Box Coordinates for Downstream NLP Tasks

> Learn to use LiteParse bounding box coordinates for advanced NLP tasks like table reconstruction and visual question answering. Extract text with spatial awareness from PDFs.

- Repository: [run-llama/liteparse](https://github.com/run-llama/liteparse)
- Tags: how-to-guide
- Published: 2026-05-30

---

**LiteParse extracts every text fragment with viewport-space coordinates (x, y, width, height) from PDFs, enabling layout-aware NLP tasks like table reconstruction, spatial entity extraction, and visual question answering.**

The `run-llama/liteparse` library parses PDF documents into structured text while preserving precise geometric metadata. With LiteParse bounding box coordinates, you can link raw text to its exact page location, which supports document understanding workflows that plain text extraction cannot.

## Understanding LiteParse's Bounding Box Data Structure

### The TextItem Struct

The core data container for geometric information is the **`TextItem`** struct defined in [`crates/liteparse/src/types.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/types.rs) (lines 13-24). This struct captures every text fragment with the following spatial fields:

- **`x`**, **`y`**: Top-left corner coordinates in PDF viewport space
- **`width`**, **`height`**: Dimensions of the text bounding box
- **`rotation`**: Counter-clockwise rotation in degrees for handling rotated pages
- **`confidence`**: OCR confidence score (when applicable)

According to the source code in [`crates/liteparse/src/types.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/types.rs), these values are expressed in **PDF viewport space**, where the origin sits at the top-left corner and 1 unit equals 1/72 inch (standard PDF points).
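Because viewport space puts the origin at the top-left, while native PDF user space conventionally starts at the bottom-left, a conversion is often needed before handing boxes to a PDF drawing library. The following is a minimal sketch, assuming a hypothetical `Box` class as a stand-in for a `TextItem`'s geometry (it is not a LiteParse type):

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for TextItem geometry (top-left origin, points)."""
    x: float
    y: float
    width: float
    height: float


def to_bottom_left_origin(box: Box, page_height: float) -> Box:
    """Flip the y axis; the returned (x, y) is the box's lower-left corner."""
    return Box(box.x, page_height - (box.y + box.height), box.width, box.height)


def points_to_inches(value: float) -> float:
    """1 unit equals 1/72 inch."""
    return value / 72.0


page_height = 792.0  # US Letter: 11 in * 72 pt
box = Box(x=72.0, y=36.0, width=144.0, height=12.0)
flipped = to_bottom_left_origin(box, page_height)

print(flipped)
print(points_to_inches(box.x), "inch from the left edge")
```

The flip subtracts the box's bottom edge (`y + height`) from the page height, so the result describes the same rectangle measured from the bottom of the page.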

### Page-Level Aggregation with ParsedPage

Individual text items are aggregated within the **`ParsedPage`** struct (lines 67-75 of the same file). This container holds the page dimensions (`width`, `height`) and an ordered list of `TextItem` instances, preserving the spatial relationship between fragments across the entire page.

### Coordinate System Specifications

All coordinates use a **72 DPI viewport-space** system. The library handles rotation normalization during extraction and stores the rotation angle, so you can reconstruct true reading order even on rotated pages or documents scanned at an angle. When you select the `json` output format, the serializer in [`crates/liteparse/src/output/json.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/output/json.rs) (lines 5-11, 44-52) maps these internal structs to `JsonTextItem` objects containing identical geometric fields.

## NLP Applications for Spatial Coordinates

**Layout-aware entity extraction** uses geometric filters to isolate text within specific regions. For example, you can exclude footnotes by filtering on `y` coordinates or capture only header text by checking vertical position.
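A region filter of this kind can be sketched in a few lines. The `Item` class below is a hypothetical stand-in carrying the fields described above, and the 10% header and footer margins are illustrative defaults, not values taken from LiteParse:

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a parsed text item."""
    text: str
    x: float
    y: float
    width: float
    height: float


def in_region(item: Item, left: float, top: float, right: float, bottom: float) -> bool:
    """True when the item's box lies fully inside the rectangle."""
    return (
        item.x >= left
        and item.y >= top
        and item.x + item.width <= right
        and item.y + item.height <= bottom
    )


def body_items(items, page_width, page_height, header_frac=0.10, footer_frac=0.10):
    """Keep items between the header band and the footer band."""
    top = page_height * header_frac
    bottom = page_height * (1.0 - footer_frac)
    return [i for i in items if in_region(i, 0.0, top, page_width, bottom)]


items = [
    Item("Quarterly Report", 72.0, 20.0, 200.0, 14.0),   # header band
    Item("Body paragraph", 72.0, 300.0, 400.0, 12.0),    # main content
    Item("1. Footnote text", 72.0, 760.0, 300.0, 10.0),  # footer band
]
kept = body_items(items, page_width=612.0, page_height=792.0)
print([i.text for i in kept])
```

Requiring full containment drops items that straddle a band boundary; an overlap test would be the looser alternative.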

**Table reconstruction** groups `TextItem` instances by aligning `y` values (rows) and `x` values (columns), rebuilding grid structures that linear text extraction destroys.
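The row and column grouping can be sketched as follows. This is a simple tolerance-based heuristic under stated assumptions (a hypothetical `Item` class, left-aligned cells, and illustrative tolerances in points), not LiteParse's own table logic:

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a parsed text item."""
    text: str
    x: float
    y: float


def group_rows(items, y_tol=3.0):
    """Cluster items whose y is within y_tol of the row's first item."""
    rows = []
    for item in sorted(items, key=lambda i: (i.y, i.x)):
        if rows and abs(item.y - rows[-1][0].y) <= y_tol:
            rows[-1].append(item)
        else:
            rows.append([item])
    return [sorted(row, key=lambda i: i.x) for row in rows]


def column_anchors(items, x_tol=5.0):
    """Collect distinct left edges, merging those closer than x_tol."""
    anchors = []
    for x in sorted(i.x for i in items):
        if not anchors or x - anchors[-1] > x_tol:
            anchors.append(x)
    return anchors


def to_grid(items, y_tol=3.0, x_tol=5.0):
    """Place each item in the column whose anchor is nearest to its x."""
    anchors = column_anchors(items, x_tol)
    grid = []
    for row in group_rows(items, y_tol):
        cells = [""] * len(anchors)
        for item in row:
            col = min(range(len(anchors)), key=lambda c: abs(anchors[c] - item.x))
            cells[col] = (cells[col] + " " + item.text).strip()
        grid.append(cells)
    return grid


items = [
    Item("Item", 72.0, 100.0), Item("Qty", 200.0, 100.5), Item("Price", 300.0, 100.0),
    Item("Widget", 72.0, 120.0), Item("2", 201.0, 120.0), Item("9.50", 300.0, 119.5),
    Item("Gadget", 72.0, 140.0), Item("14.00", 300.0, 140.0),
]
grid = to_grid(items)
for row in grid:
    print(row)
```

The last row shows why column anchors matter: the missing quantity becomes an empty cell instead of shifting the price one column to the left.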

**Document visual question answering** maps model output spans back to `(x, y, width, height)` rectangles, enabling highlight overlays on the original PDF or attention heatmaps during inference.
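Mapping a span back to rectangles requires remembering which character range each text item occupies in the string passed to the model. A minimal sketch, assuming a hypothetical `Item` class and a single-space separator between items:

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a parsed text item."""
    text: str
    x: float
    y: float
    width: float
    height: float


def build_index(items, sep=" "):
    """Join item texts and record each item's (start, end) character span."""
    parts, spans, pos = [], [], 0
    for item in items:
        spans.append((pos, pos + len(item.text)))
        parts.append(item.text)
        pos += len(item.text) + len(sep)
    return sep.join(parts), spans


def boxes_for_span(items, spans, start, end):
    """Return the boxes of every item overlapping characters [start, end)."""
    return [
        (item.x, item.y, item.width, item.height)
        for item, (s, e) in zip(items, spans)
        if s < end and start < e
    ]


items = [
    Item("Total", 72.0, 500.0, 40.0, 12.0),
    Item("due:", 116.0, 500.0, 30.0, 12.0),
    Item("$1,250.00", 170.0, 500.0, 60.0, 12.0),
]
text, spans = build_index(items)

answer = "$1,250.00"  # pretend this span came back from a QA model
start = text.find(answer)
boxes = boxes_for_span(items, spans, start, start + len(answer))
print(text)
print(boxes)
```

Each returned tuple can be drawn as a highlight rectangle; an answer spanning several items yields several boxes.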

**Spatial relation classification** creates features like "above," "below," or "left-of" by calculating relative positions between entity bounding boxes, feeding geometric context into relation extraction models.
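One way to derive such features is a rule-based comparison of box edges. The sketch below assumes a hypothetical `Box` class, a top-left origin (smaller `y` is higher on the page), and an illustrative 2-point tolerance; vertical relations are checked before horizontal ones, which is a design choice rather than a requirement:

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for an entity's bounding box."""
    x: float
    y: float
    width: float
    height: float


def relation(a: Box, b: Box, tol: float = 2.0) -> str:
    """Describe where box a sits relative to box b."""
    if a.y + a.height <= b.y + tol:
        return "above"
    if b.y + b.height <= a.y + tol:
        return "below"
    if a.x + a.width <= b.x + tol:
        return "left-of"
    if b.x + b.width <= a.x + tol:
        return "right-of"
    return "overlaps"


heading = Box(72.0, 60.0, 200.0, 14.0)
label = Box(72.0, 100.0, 50.0, 12.0)
value = Box(130.0, 100.0, 60.0, 12.0)

print(relation(label, value))
print(relation(value, label))
print(relation(heading, label))
```

The resulting string can be one-hot encoded or embedded as an extra feature for a relation extraction model.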

**Multimodal pipelines** combine extracted coordinates with rendered page images via LiteParse's screenshot API, using the bounding boxes as anchors for cross-modal attention mechanisms.

## Accessing Coordinates Across Language Bindings

### JSON Output Format

When you set `outputFormat: "json"`, LiteParse serializes the geometric data through [`crates/liteparse/src/output/json.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/output/json.rs). Each entry in the `text_items` array contains `x`, `y`, `width`, and `height` as floating-point numbers directly consumable by any JSON-capable pipeline.
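Consuming that JSON needs nothing beyond a standard parser. The sketch below assumes a top-level `pages` array whose entries carry `width`, `height`, and `text_items`, and a `text` key on each item; only the `text_items` array and its four geometric fields are described above, so treat the rest of the shape as hypothetical and check it against real output:

```python
import json

# Hand-written sample; the overall document shape is an assumption.
SAMPLE = """
{"pages": [{"width": 612.0, "height": 792.0,
            "text_items": [
              {"text": "Invoice", "x": 72.0, "y": 40.0, "width": 90.0, "height": 18.0},
              {"text": "Total", "x": 72.0, "y": 500.0, "width": 40.0, "height": 12.0}
            ]}]}
"""


def iter_boxes(doc):
    """Yield (page_index, text, (x, y, width, height)) for every text item."""
    for page_index, page in enumerate(doc.get("pages", [])):
        for item in page.get("text_items", []):
            box = (item["x"], item["y"], item["width"], item["height"])
            yield page_index, item["text"], box


rows = list(iter_boxes(json.loads(SAMPLE)))
for row in rows:
    print(row)
```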

### Node.js and TypeScript

The Node.js bindings expose the `TextItem` type in [`packages/node/src/lib.ts`](https://github.com/run-llama/liteparse/blob/main/packages/node/src/lib.ts) (lines 32-40), making coordinates available as standard JavaScript object properties. The interface includes `x: number`, `y: number`, `width: number`, and `height: number` alongside text content.

### Python Bindings

In Python, the `TextItem` dataclass in [`liteparse/types.py`](https://github.com/run-llama/liteparse/blob/main/liteparse/types.py) receives data converted by [`packages/python/liteparse/parser.py`](https://github.com/run-llama/liteparse/blob/main/packages/python/liteparse/parser.py) (lines 24-34). Access spatial fields directly as `item.x`, `item.y`, etc., after parsing.

## Practical Spatial Filtering Examples

### Filtering Header Regions in Node.js

To isolate text appearing only in the top 15% of the page (typically headers or titles):

```typescript
import LiteParse from "liteparse";

(async () => {
  const parser = new LiteParse({ outputFormat: "json" });
  const result = await parser.parse("sample.pdf");

  // Filter items in the top 15% of page height
  const headerItems = result.pages.flatMap(p =>
    p.textItems.filter(i => i.y < p.height * 0.15)
  );

  const headerText = headerItems.map(i => i.text).join(" ");
  console.log("Header text:", headerText);
})();
```

*The `TextItem` interface originates from the Node bindings in [`packages/node/src/lib.ts`](https://github.com/run-llama/liteparse/blob/main/packages/node/src/lib.ts).*

### Column-Based Text Extraction in Python

For documents with multi-column layouts, split text items by horizontal position before processing:

```python
from liteparse import LiteParse

parser = LiteParse(output_format="json")
result = parser.parse("report.pdf")

def column_items(page, x_mid):
    left = [i for i in page.text_items if i.x + i.width <= x_mid]
    right = [i for i in page.text_items if i.x >= x_mid]
    return left, right

for page in result.pages:
    # Split at the horizontal midpoint; items straddling it land in neither column
    left, right = column_items(page, page.width / 2)

    # Preserve reading order by sorting on y, then x
    left_text = " ".join(i.text for i in sorted(left, key=lambda i: (i.y, i.x)))
    right_text = " ".join(i.text for i in sorted(right, key=lambda i: (i.y, i.x)))

    print(f"Page {page.page_num} - Left: {left_text}")
    print(f"Page {page.page_num} - Right: {right_text}")
```

*Data conversion occurs in [`packages/python/liteparse/parser.py`](https://github.com/run-llama/liteparse/blob/main/packages/python/liteparse/parser.py).*

### Low-Confidence OCR Filtering in Rust

When processing documents with potential OCR artifacts, filter by confidence scores using native Rust:

```rust
use liteparse::config::LiteParseConfig;
use liteparse::parser::LiteParse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = LiteParseConfig {
        output_format: Some("json".into()),
        ..Default::default()
    };

    let parser = LiteParse::new(config);
    let result = parser.parse("invoice.pdf")?;

    // Items with no confidence score default to 1.0, so they are never flagged.
    let low_confidence: Vec<_> = result
        .pages
        .iter()
        .flat_map(|page| &page.text_items)
        .filter(|item| item.confidence.unwrap_or(1.0) < 0.6)
        .collect();

    println!("Found {} low-confidence items", low_confidence.len());
    Ok(())
}
```

*ParsedPage and TextItem definitions reside in [`crates/liteparse/src/types.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/types.rs).*

## Visualizing Bounding Boxes with Screenshots

LiteParse's screenshot API generates rendered page images sharing the same coordinate system as the extracted text, enabling pixel-perfect overlay drawing:

```typescript
import LiteParse from "liteparse";
import fs from "fs";

(async () => {
  const parser = new LiteParse({ outputFormat: "json", dpi: 150 });
  const { pages } = await parser.parse("contract.pdf");
  const screenshots = await parser.screenshot("contract.pdf", [1]);
  
  // Save page image
  fs.writeFileSync("page1.png", screenshots[0].imageBuffer);
  
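  // Use pages[0].textItems coordinates to draw overlay rectangles.
  // Coordinates are in points (1/72 inch); assuming the image is rendered at
  // the configured dpi, multiply x, y, width, and height by dpi / 72
  // (here 150 / 72) to get image pixels.
})();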
```

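The screenshot functionality returns `imageBuffer` data whose `(x, y)` origin aligns with the bounding box coordinates from the parser output. Because those coordinates are expressed in points (1/72 inch), scale them by `dpi / 72` to obtain pixel positions, assuming the image is rendered at the configured `dpi`.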
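The point-to-pixel scaling can be sketched as a small helper. It assumes the screenshot covers the full page and is rendered at exactly the given DPI; `to_pixel_rect` is a hypothetical name, not part of LiteParse:

```python
def to_pixel_rect(x, y, width, height, dpi):
    """Convert a box in points (1/72 inch) to integer pixel edges.

    Returns (left, top, right, bottom), the corner order most image
    drawing libraries expect for rectangles.
    """
    left = round(x * dpi / 72.0)
    top = round(y * dpi / 72.0)
    right = round((x + width) * dpi / 72.0)
    bottom = round((y + height) * dpi / 72.0)
    return left, top, right, bottom


# A 2-inch-wide box, 1 inch from the left and 0.5 inch from the top
rect_150 = to_pixel_rect(72.0, 36.0, 144.0, 12.0, dpi=150)
rect_72 = to_pixel_rect(72.0, 36.0, 144.0, 12.0, dpi=72)
print(rect_150)
print(rect_72)
```

Scaling the right and bottom edges directly, rather than scaling width and height separately, keeps adjacent boxes from drifting apart through rounding.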

## Summary

- **LiteParse** stores bounding box coordinates in the `TextItem` struct using PDF viewport space (72 DPI, top-left origin) as implemented in [`crates/liteparse/src/types.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/types.rs).
- Spatial metadata enables **layout-aware NLP tasks** including table reconstruction, column-based extraction, and visual question answering.
- Access coordinates consistently across **JSON output**, **Node.js/TypeScript**, **Python**, and **Rust** interfaces, with bindings defined in [`packages/node/src/lib.ts`](https://github.com/run-llama/liteparse/blob/main/packages/node/src/lib.ts) and [`packages/python/liteparse/parser.py`](https://github.com/run-llama/liteparse/blob/main/packages/python/liteparse/parser.py).
- Apply **spatial filters** using geometric comparisons on `x`, `y`, `width`, and `height` to isolate regions like headers or columns before model inference.
- Combine coordinate data with the **screenshot API** for multimodal pipelines requiring visual grounding or bounding box overlays.

## Frequently Asked Questions

### What coordinate system does LiteParse use for bounding boxes?

LiteParse uses PDF viewport space with the origin at the top-left corner, where 1 unit equals 1/72 inch (standard PDF point). The `TextItem` struct in [`crates/liteparse/src/types.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/types.rs) stores these values as floating-point numbers along with rotation angles to handle text at arbitrary orientations.

### How do I handle rotated text when using bounding box coordinates?

The `TextItem` struct includes a `rotation` field representing counter-clockwise degrees. When processing coordinates for visualization or spatial analysis, apply this rotation to the bounding box so that it aligns with the actual text orientation on the page.
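Applying the rotation amounts to rotating the four corners of the box. The sketch below assumes the pivot is the box's stored `(x, y)` corner and that the stored box is the unrotated one; neither detail is stated here, so verify both against the LiteParse source before relying on it:

```python
import math


def rotated_corners(x, y, width, height, rotation_deg):
    """Rotate a box counter-clockwise (as seen on the page) about (x, y).

    Coordinates use a top-left origin with y growing downward, so a
    counter-clockwise visual rotation moves the top edge toward smaller y.
    """
    theta = math.radians(rotation_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((0.0, 0.0), (width, 0.0), (width, height), (0.0, height)):
        corners.append((x + dx * cos_t + dy * sin_t, y - dx * sin_t + dy * cos_t))
    return corners


def axis_aligned_bounds(corners):
    """Smallest upright (x, y, width, height) box enclosing the corners."""
    xs = [c[0] for c in corners]
    ys = [c[1] for c in corners]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


corners = rotated_corners(100.0, 200.0, 60.0, 10.0, rotation_deg=90.0)
bounds = tuple(round(v, 6) for v in axis_aligned_bounds(corners))
print(bounds)
```

The enclosing upright box is convenient for region filters, while the four corners are what you would draw as a polygon overlay.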

### Can I export bounding box data to JSON for use with other NLP libraries?

Yes. When you set `outputFormat: "json"`, LiteParse serializes each `TextItem` as a `JsonTextItem` through [`crates/liteparse/src/output/json.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/output/json.rs), preserving all geometric fields. This JSON output is compatible with Python, JavaScript, or any external NLP pipeline.

### How do I filter text items by location in Python?

Iterate over `page.text_items` and compare the `x`, `y`, `width`, and `height` attributes against your target region. For example, isolate left-column text by filtering where `item.x + item.width <= page.width / 2`, then sort by `(y, x)` to maintain proper reading order before concatenating text.