How to Use LiteParse Bounding Box Coordinates for Downstream NLP Tasks

LiteParse extracts every text fragment with viewport-space coordinates (x, y, width, height) from PDFs, enabling layout-aware NLP tasks like table reconstruction, spatial entity extraction, and visual question answering.

The run-llama/liteparse library parses PDF documents into structured text while preserving geometric metadata for every fragment. With these bounding box coordinates you can map each piece of text back to its exact location on the page, which supports document-understanding workflows that plain text extraction cannot.

Understanding LiteParse's Bounding Box Data Structure

The TextItem Struct

The core data container for geometric information is the TextItem struct defined in crates/liteparse/src/types.rs (lines 13-24). This struct captures every text fragment with the following spatial fields:

  • x, y: Top-left corner coordinates in PDF viewport space
  • width, height: Dimensions of the text bounding box
  • rotation: Counter-clockwise rotation in degrees for handling rotated pages
  • confidence: OCR confidence score (when applicable)

According to the source code in crates/liteparse/src/types.rs, these values are expressed in PDF viewport space, where the origin sits at the top-left corner, y increases toward the bottom of the page, and 1 unit equals 1/72 inch (one standard PDF point).
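To make these fields concrete, the sketch below models them as a plain Python dataclass. It is not the library's own type: the `text` field, the defaults, and the helpers `to_inches` and `to_bottom_left` are illustrative assumptions. The second helper converts a box to default PDF user space, whose origin is at the bottom-left, which is the convention some other PDF tools expect.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

POINTS_PER_INCH = 72.0  # 1 viewport unit = 1 PDF point = 1/72 inch

@dataclass
class TextItem:
    # Hypothetical mirror of the fields described above, not the real binding type.
    text: str
    x: float            # left edge, measured from the page's left side
    y: float            # top edge, measured downward from the page's top
    width: float
    height: float
    rotation: float = 0.0               # counter-clockwise degrees
    confidence: Optional[float] = None  # only meaningful for OCR-derived text

    @property
    def right(self) -> float:
        return self.x + self.width

    @property
    def bottom(self) -> float:
        # y grows downward in viewport space, so the bottom edge is y + height
        return self.y + self.height

def to_inches(points: float) -> float:
    return points / POINTS_PER_INCH

def to_bottom_left(item: TextItem, page_height: float) -> Tuple[float, float, float, float]:
    """Return (x0, y0, x1, y1) with the origin at the page's bottom-left corner."""
    return (item.x, page_height - item.bottom, item.right, page_height - item.y)

item = TextItem("Invoice", x=72.0, y=36.0, width=144.0, height=12.0)
print(to_inches(item.x), to_bottom_left(item, page_height=792.0))
# prints: 1.0 (72.0, 744.0, 216.0, 756.0)
```

Keeping the derived edges as properties avoids repeating `x + width` and `y + height` arithmetic throughout filtering code.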

Page-Level Aggregation with ParsedPage

Individual text items are aggregated within the ParsedPage struct (lines 67-75 of the same file). This container holds the page dimensions (width, height) and an ordered list of TextItem instances, preserving the spatial relationship between fragments across the entire page.

Coordinate System Specifications

All coordinates use this 72 DPI viewport space. Rotation is normalized during extraction, and the angle is stored on each item so you can reconstruct the true reading order even on documents scanned at an angle. When you select the json output format, the serializer in crates/liteparse/src/output/json.rs (lines 5-11, 44-52) maps these internal structs to JsonTextItem objects with identical geometric fields.

NLP Applications for Spatial Coordinates

Layout-aware entity extraction uses geometric filters to isolate text within specific regions, for example dropping footnotes by excluding items whose y values fall near the bottom of the page, or keeping only header text by selecting items with small y values near the top.
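A minimal sketch of such a region filter, assuming the items have already been loaded into simple records carrying only the fields this filter needs. The `keep_body_text` helper and the 10% margin bands are illustrative assumptions rather than LiteParse settings; real documents usually need bands tuned per template.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TextItem:  # hypothetical stand-in with just the fields used here
    text: str
    y: float
    height: float

def keep_body_text(items: List[TextItem], page_height: float,
                   top_frac: float = 0.10, bottom_frac: float = 0.10) -> List[TextItem]:
    """Drop items that lie entirely inside the top or bottom margin band."""
    top_limit = page_height * top_frac
    bottom_limit = page_height * (1.0 - bottom_frac)
    # Origin is top-left: an item is inside the top band if its bottom edge
    # is above top_limit, and inside the bottom band if its top edge is below bottom_limit.
    return [i for i in items if i.y + i.height > top_limit and i.y < bottom_limit]

items = [
    TextItem("Running header", y=20.0, height=10.0),
    TextItem("Body sentence", y=300.0, height=12.0),
    TextItem("1 Footnote text", y=740.0, height=9.0),
]
print([i.text for i in keep_body_text(items, page_height=792.0)])  # ['Body sentence']
```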

Table reconstruction groups TextItem instances into rows by clustering similar y values and into columns by clustering similar x values, rebuilding the grid structure that linear text extraction destroys.
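The grouping step can be sketched with a one-dimensional clustering pass over each axis. This is one simple heuristic, not LiteParse's own table logic; `cluster_ids`, `build_grid`, and the 3-point tolerance are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TextItem:  # hypothetical stand-in with the article's geometric fields
    text: str
    x: float
    y: float
    width: float
    height: float

def cluster_ids(values: List[float], tol: float) -> Dict[float, int]:
    """1-D single-linkage clustering: start a new cluster when the gap
    to the previous sorted value exceeds `tol` points."""
    ids: Dict[float, int] = {}
    prev = None
    cid = -1
    for v in sorted(set(values)):
        if prev is None or v - prev > tol:
            cid += 1
        ids[v] = cid
        prev = v
    return ids

def build_grid(items: List[TextItem], tol: float = 3.0) -> List[List[str]]:
    if not items:
        return []
    row_of = cluster_ids([i.y for i in items], tol)  # similar top edges -> same row
    col_of = cluster_ids([i.x for i in items], tol)  # similar left edges -> same column
    grid = [[""] * (max(col_of.values()) + 1) for _ in range(max(row_of.values()) + 1)]
    # Visit items top-to-bottom, left-to-right so multi-fragment cells join in order.
    for i in sorted(items, key=lambda i: (i.y, i.x)):
        r, c = row_of[i.y], col_of[i.x]
        grid[r][c] = (grid[r][c] + " " + i.text).strip()
    return grid

cells = [
    TextItem("Item", 72, 100, 40, 10), TextItem("Qty", 300, 100.5, 30, 10),
    TextItem("Widget", 72, 120, 50, 10), TextItem("4", 301, 119.8, 8, 10),
]
print(build_grid(cells))  # [['Item', 'Qty'], ['Widget', '4']]
```

Clustering on left edges suits left-aligned columns; right-aligned numeric columns may cluster better on `x + width`, and merged or wrapped cells need extra handling.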

Document visual question answering maps model output spans back to (x, y, width, height) rectangles, enabling highlight overlays on the original PDF or attention heatmaps during inference.
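One way to do this mapping, assuming the model reads a string built by joining item texts with single spaces: record each item's character offsets while building that string, then return the boxes of every item the answer span overlaps. The joining scheme and the `build_index` and `spans_to_boxes` helpers are assumptions for illustration; whatever serialization you feed the model must be the one you index here.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TextItem:  # hypothetical stand-in
    text: str
    x: float
    y: float
    width: float
    height: float

Box = Tuple[float, float, float, float]  # (x, y, width, height)
Offset = Tuple[int, int, TextItem]       # [start, end) character range and its item

def build_index(items: List[TextItem]) -> Tuple[str, List[Offset]]:
    """Join item texts with spaces and record each item's [start, end) range."""
    parts: List[str] = []
    offsets: List[Offset] = []
    pos = 0
    for item in items:
        end = pos + len(item.text)
        offsets.append((pos, end, item))
        parts.append(item.text)
        pos = end + 1  # skip the joining space
    return " ".join(parts), offsets

def spans_to_boxes(offsets: List[Offset], start: int, end: int) -> List[Box]:
    """Return boxes of all items whose character range overlaps [start, end)."""
    return [(i.x, i.y, i.width, i.height) for s, e, i in offsets if s < end and start < e]

items = [
    TextItem("Total", 72.0, 500.0, 30.0, 10.0),
    TextItem("due:", 105.0, 500.0, 25.0, 10.0),
    TextItem("$120.00", 135.0, 500.0, 40.0, 10.0),
]
text, offsets = build_index(items)
start = text.find("$120.00")  # stand-in for a model's predicted answer span
print(spans_to_boxes(offsets, start, start + len("$120.00")))  # [(135.0, 500.0, 40.0, 10.0)]
```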

Spatial relation classification creates features like "above," "below," or "left-of" by calculating relative positions between entity bounding boxes, feeding geometric context into relation extraction models.
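A sketch of how such features might be computed from two boxes, assuming top-left coordinates with y increasing downward. The label set and the choice to test vertical separation before horizontal are illustrative, so diagonal pairs receive a vertical label. `center_offset` adds a continuous feature a model can use alongside the discrete label.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Box:  # hypothetical: (x, y) is the top-left corner and y grows downward
    x: float
    y: float
    width: float
    height: float

def relation(a: Box, b: Box) -> str:
    """Coarse relation of `a` with respect to `b`; vertical separation is checked first."""
    if a.y + a.height <= b.y:
        return "above"     # a's bottom edge is at or above b's top edge
    if b.y + b.height <= a.y:
        return "below"
    if a.x + a.width <= b.x:
        return "left-of"
    if b.x + b.width <= a.x:
        return "right-of"
    return "overlaps"

def center_offset(a: Box, b: Box) -> Tuple[float, float]:
    """(dx, dy) from a's center to b's center."""
    return ((b.x + b.width / 2) - (a.x + a.width / 2),
            (b.y + b.height / 2) - (a.y + a.height / 2))

heading = Box(72, 60, 200, 14)
label = Box(72, 100, 60, 10)
value = Box(140, 100, 80, 10)
print(relation(heading, label), relation(label, value), center_offset(label, value))
# prints: above left-of (78.0, 0.0)
```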

Multimodal pipelines combine extracted coordinates with rendered page images via LiteParse's screenshot API, using the bounding boxes as anchors for cross-modal attention mechanisms.
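Layout-aware models in the LayoutLM family expect each token's box as integer (x0, y0, x1, y1) corners scaled to a 0 to 1000 range relative to the page size. A minimal conversion from (x, y, width, height) points, assuming you take the page width and height from the parsed page, could look like this; `normalize_box` is a hypothetical helper, and you should confirm the exact convention your model expects.

```python
from typing import Tuple

def normalize_box(x: float, y: float, width: float, height: float,
                  page_width: float, page_height: float) -> Tuple[int, int, int, int]:
    """Convert a top-left (x, y, w, h) box in points to (x0, y0, x1, y1) on a 0-1000 grid."""
    def scale(v: float, extent: float) -> int:
        # Clamp so boxes that spill slightly past the page edge stay in range.
        return max(0, min(1000, int(1000 * v / extent)))
    return (scale(x, page_width), scale(y, page_height),
            scale(x + width, page_width), scale(y + height, page_height))

# A US Letter page is 612 x 792 points.
print(normalize_box(72, 36, 144, 12, page_width=612, page_height=792))  # (117, 45, 352, 60)
```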

Accessing Coordinates Across Language Bindings

JSON Output Format

When configuring outputFormat: "json", LiteParse serializes the geometric data through crates/liteparse/src/output/json.rs. Each entry in the text_items array contains x, y, width, and height as floating-point numbers directly consumable by any JSON-capable pipeline.
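The exact JSON layout is not spelled out here, so the sketch below assumes a top-level `pages` array whose entries each hold a `text_items` array of objects with `text`, `x`, `y`, `width`, and `height` keys; adjust the keys to match real output. It uses only the standard library to flatten everything into records tagged with a 1-based page number.

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass
class Item:
    page: int
    text: str
    x: float
    y: float
    width: float
    height: float

def load_items(raw: str) -> List[Item]:
    """Flatten an assumed {"pages": [{"text_items": [...]}]} layout into Item records."""
    doc = json.loads(raw)
    items: List[Item] = []
    for page_no, page in enumerate(doc["pages"], start=1):
        for t in page["text_items"]:
            items.append(Item(page_no, t["text"], float(t["x"]), float(t["y"]),
                              float(t["width"]), float(t["height"])))
    return items

sample = ('{"pages": [{"width": 612, "height": 792, "text_items": '
          '[{"text": "Hello", "x": 72, "y": 90, "width": 30, "height": 12}]}]}')
print(load_items(sample))
# prints: [Item(page=1, text='Hello', x=72.0, y=90.0, width=30.0, height=12.0)]
```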

Node.js and TypeScript

The Node.js bindings expose the TextItem type in packages/node/src/lib.ts (lines 32-40), making coordinates available as standard JavaScript object properties. The interface includes x: number, y: number, width: number, and height: number alongside text content.

Python Bindings

In Python, the TextItem dataclass in liteparse/types.py receives data converted by packages/python/liteparse/parser.py (lines 24-34). After parsing, read the spatial fields directly as item.x, item.y, item.width, and item.height.

Practical Spatial Filtering Examples

Filtering Header Regions in Node.js

To isolate text appearing only in the top 15% of the page (typically headers or titles):

import LiteParse from "liteparse";

(async () => {
  const parser = new LiteParse({ outputFormat: "json" });
  const result = await parser.parse("sample.pdf");

  // Origin is top-left, so the top 15% of the page is where y < 0.15 * height
  const headerItems = result.pages.flatMap(p =>
    p.textItems.filter(i => i.y < p.height * 0.15)
  );

  const headerText = headerItems.map(i => i.text).join(" ");
  console.log("Header text:", headerText);
})();

The TextItem interface originates from the Node bindings in packages/node/src/lib.ts.

Column-Based Text Extraction in Python

For documents with multi-column layouts, split text items by horizontal position before processing:

from liteparse import LiteParse

parser = LiteParse(output_format="json")
result = parser.parse("report.pdf")

def column_items(page, x_mid):
    # Items that cross x_mid (e.g. full-width headings) land in neither list.
    left = [i for i in page.text_items if i.x + i.width <= x_mid]
    right = [i for i in page.text_items if i.x >= x_mid]
    return left, right

for page in result.pages:
    # Split at the horizontal midpoint of the page
    left, right = column_items(page, page.width / 2)

    # Approximate reading order by sorting on y (top to bottom), then x
    left_text = " ".join(i.text for i in sorted(left, key=lambda i: (i.y, i.x)))
    right_text = " ".join(i.text for i in sorted(right, key=lambda i: (i.y, i.x)))

    print(f"Page {page.page_num} - Left: {left_text}")
    print(f"Page {page.page_num} - Right: {right_text}")

Data conversion occurs in packages/python/liteparse/parser.py.
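Sorting on the exact (y, x) tuple works when every item on a line shares the same y, but mixed fonts or OCR jitter can shift y by a fraction of a point and scramble a line. A sketch of a tolerance-based alternative, assuming the same x and y fields: bucket items into lines first, then sort each line by x. `reading_order` and the 2-point tolerance are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TextItem:  # hypothetical stand-in with the fields used here
    text: str
    x: float
    y: float

def reading_order(items: List[TextItem], line_tol: float = 2.0) -> List[TextItem]:
    """Group items whose top edges are within line_tol points of a line's first item,
    then order lines top-down and items within each line left-to-right."""
    lines: List[List[TextItem]] = []
    for item in sorted(items, key=lambda i: i.y):
        # Compare with the line's first item so a long line cannot drift downward.
        if lines and item.y - lines[-1][0].y <= line_tol:
            lines[-1].append(item)
        else:
            lines.append([item])
    return [i for line in lines for i in sorted(line, key=lambda i: i.x)]

items = [TextItem("world", 120, 100.4), TextItem("Hello", 72, 100.6), TextItem("Next", 72, 116)]
print(" ".join(i.text for i in sorted(items, key=lambda i: (i.y, i.x))))  # world Hello Next
print(" ".join(i.text for i in reading_order(items)))                     # Hello world Next
```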

Low-Confidence OCR Filtering in Rust

When processing documents with potential OCR artifacts, filter by confidence scores using native Rust:

use liteparse::config::LiteParseConfig;
use liteparse::parser::LiteParse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = LiteParseConfig {
        output_format: Some("json".into()),
        ..Default::default()
    };

    let parser = LiteParse::new(config);
    let result = parser.parse("invoice.pdf")?;

    // Items without a confidence score (non-OCR text) are treated as fully confident.
    let low_confidence: Vec<_> = result
        .pages
        .iter()
        .flat_map(|page| &page.text_items)
        .filter(|item| item.confidence.unwrap_or(1.0) < 0.6)
        .collect();

    println!("Found {} low-confidence items", low_confidence.len());
    Ok(())
}

ParsedPage and TextItem definitions reside in crates/liteparse/src/types.rs.

Visualizing Bounding Boxes with Screenshots

LiteParse's screenshot API renders page images that share the parser's top-left origin and axis directions, so bounding boxes can be drawn as overlays once they are scaled from 72 DPI points to the image's DPI:

import LiteParse from "liteparse";
import fs from "fs";

(async () => {
  const parser = new LiteParse({ outputFormat: "json", dpi: 150 });
  const { pages } = await parser.parse("contract.pdf");
  const screenshots = await parser.screenshot("contract.pdf", [1]);

  // Save page image
  fs.writeFileSync("page1.png", screenshots[0].imageBuffer);

  // Use pages[0].textItems coordinates to draw overlay rectangles.
  // They are in 72 DPI points, so multiply x, y, width, and height
  // by dpi / 72 (150 / 72 here) to convert them to image pixels.
})();

The screenshot functionality returns imageBuffer data whose (x, y) origin sits at the same top-left corner as the parser's bounding boxes; only the scale differs, by a factor of dpi / 72.
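The scaling step is plain arithmetic: a PDF point is 1/72 inch, so at D dots per inch each point covers D / 72 pixels. A minimal sketch of that conversion, assuming the image was rendered at a known DPI with no extra margin or cropping; `points_to_pixels` is a hypothetical helper.

```python
import math
from typing import Tuple

def points_to_pixels(x: float, y: float, width: float, height: float,
                     dpi: float) -> Tuple[int, int, int, int]:
    """Scale a top-left (x, y, w, h) box from points to (left, top, right, bottom) pixels at `dpi`."""
    def px(v: float) -> float:
        return v * dpi / 72.0  # multiply before dividing to limit floating-point drift
    # Round outward so the rectangle fully covers the text it outlines.
    return (math.floor(px(x)), math.floor(px(y)),
            math.ceil(px(x + width)), math.ceil(px(y + height)))

print(points_to_pixels(72, 36, 144, 12, dpi=150))  # (150, 75, 450, 100)
```

The returned corner tuple is the form that image libraries such as Pillow's `ImageDraw.rectangle` accept.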

Summary

  • LiteParse stores bounding box coordinates in the TextItem struct using PDF viewport space (72 DPI, top-left origin) as implemented in crates/liteparse/src/types.rs.
  • Spatial metadata enables layout-aware NLP tasks including table reconstruction, column-based extraction, and visual question answering.
  • Access coordinates consistently across JSON output, Node.js/TypeScript, Python, and Rust interfaces, with bindings defined in packages/node/src/lib.ts and packages/python/liteparse/parser.py.
  • Apply spatial filters using geometric comparisons on x, y, width, and height to isolate regions like headers or columns before model inference.
  • Combine coordinate data with the screenshot API for multimodal pipelines requiring visual grounding or bounding box overlays.

Frequently Asked Questions

What coordinate system does LiteParse use for bounding boxes?

LiteParse uses PDF viewport space with the origin at the top-left corner, where 1 unit equals 1/72 inch (standard PDF point). The TextItem struct in crates/liteparse/src/types.rs stores these values as floating-point numbers along with rotation angles to handle text at arbitrary orientations.

How do I handle rotated text when using bounding box coordinates?

The TextItem struct includes a rotation field representing counter-clockwise degrees. When processing coordinates for visualization or spatial analysis, rotate the box's corners by this angle so the rectangle follows the actual text orientation; for coarse spatial comparisons, the axis-aligned box that encloses the rotated corners is often sufficient.
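A sketch of that transformation, under two assumptions the article does not pin down: the rotation pivots around the item's (x, y) corner, and "counter-clockwise" means as seen on screen in the y-down viewport. Check both against real output before relying on it. Because y grows downward, a visually counter-clockwise turn flips the sign of the sine terms compared with the textbook y-up rotation. `rotated_corners` and `axis_aligned_hull` are hypothetical helpers.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def rotated_corners(x: float, y: float, width: float, height: float,
                    rotation_deg: float) -> List[Point]:
    """Corners of a box rotated visually counter-clockwise about (x, y) in y-down coordinates."""
    t = math.radians(rotation_deg)
    c, s = math.cos(t), math.sin(t)
    corners = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    # In y-down space, a visual CCW rotation maps (dx, dy) to (dx*c + dy*s, -dx*s + dy*c).
    return [(x + dx * c + dy * s, y - dx * s + dy * c) for dx, dy in corners]

def axis_aligned_hull(points: List[Point]) -> Tuple[float, float, float, float]:
    """Smallest unrotated (x, y, width, height) box containing the given points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)

# Text rotated 90 degrees reads bottom-to-top, so the box extends upward from (x, y).
print(axis_aligned_hull(rotated_corners(100, 100, 50, 10, 90)))
```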

Can I export bounding box data to JSON for use with other NLP libraries?

Yes. When configuring outputFormat: "json", LiteParse serializes each TextItem as a JsonTextItem through crates/liteparse/src/output/json.rs, preserving all geometric fields. This JSON output is compatible with Python, JavaScript, or any external NLP pipeline.

How do I filter text items by location in Python?

Iterate over page.text_items and compare the x, y, width, and height attributes against your target region. For example, isolate left-column text by filtering where item.x + item.width <= page.width / 2, then sort by (y, x) to maintain proper reading order before concatenating text.
