# How to Use LiteParse in a WASM/Browser Environment: Complete Integration Guide

> Integrate LiteParse WASM into your browser app for serverless PDF parsing. This guide breaks down setup and usage for powerful, client-side document processing with custom OCR.

- Repository: [run-llama/liteparse](https://github.com/run-llama/liteparse)
- Tags: how-to-guide
- Published: 2026-05-30

---

**LiteParse ships a dedicated WebAssembly build, published as the npm package `@llamaindex/liteparse-wasm`, that parses PDFs entirely in the browser with no server dependency. Its JavaScript-friendly API, defined in [`crates/liteparse-wasm/src/lib.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse-wasm/src/lib.rs), supports custom OCR engines and returns JSON-serializable page structures.**

LiteParse, the Rust-based PDF parser from the run-llama organization, provides a first-class WebAssembly target that runs entirely within the browser. This guide demonstrates how to use LiteParse in a WASM/browser environment to process PDF files client-side, configure parsing options, and integrate with JavaScript-based OCR engines.

## Installing the WASM Package

The WebAssembly distribution is published as `@llamaindex/liteparse-wasm` and generated using `wasm-pack` from the `crates/liteparse-wasm` directory. Install it via your preferred package manager:

```bash
npm install @llamaindex/liteparse-wasm
```

The package contains the compiled `.wasm` binary and JavaScript glue code that wraps the underlying Rust implementation. You can also load it directly from a CDN for static sites, as demonstrated in the reference implementation at [`wasm-demo-site/index.html`](https://github.com/run-llama/liteparse/blob/main/wasm-demo-site/index.html).

## Initializing LiteParse in the Browser

### Loading the WebAssembly Module

Before parsing documents, you must initialize the WASM runtime. The package exports an `init` helper that loads the binary and prepares the JavaScript bindings:

```typescript
import init, { LiteParse } from "@llamaindex/liteparse-wasm";

// Load the WASM file (bundled with the package or from CDN)
await init();
```

When loading from a CDN or custom location, you can pass the path to the `.wasm` file explicitly, as shown in the demo site's usage of `await mod.default(".../liteparse_wasm_bg.wasm")`.
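
If several parts of your app can trigger parsing, it may help to guarantee that the module is initialized only once. The following is a minimal sketch, not part of the LiteParse API: `makeInitOnce` is a hypothetical helper that wraps any init-style function (such as the package's default export) and caches the in-flight promise. The `wasmUrl` parameter is an assumption standing in for the optional path described above.

```typescript
// Hypothetical helper (not part of @llamaindex/liteparse-wasm): wraps an
// init-style function so the WASM module is loaded only once, even when
// several callers request it concurrently.
type InitFn = (wasmUrl?: string) => Promise<unknown>;

function makeInitOnce(init: InitFn): (wasmUrl?: string) => Promise<unknown> {
  let pending: Promise<unknown> | undefined;
  return (wasmUrl?: string) => {
    if (pending === undefined) {
      // First call: start loading and remember the in-flight promise.
      pending = init(wasmUrl).catch((err) => {
        // Forget the failed attempt so a later call can retry.
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}
```

In an app this could be used as `const ensureReady = makeInitOnce(init);`, followed by `await ensureReady();` before each parse.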

### Creating the Parser Instance

Instantiate the **LiteParse** class with a plain JavaScript object containing your configuration. The constructor accepts camel-cased fields that map to the internal `LiteParseConfig` Rust struct:

```typescript
const parser = new LiteParse({
  ocrEnabled: false,
  outputFormat: "json",   // or "text"
  maxPages: 100,
});
```

Behind the scenes in [`crates/liteparse-wasm/src/lib.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse-wasm/src/lib.rs), the `JsLiteParseConfig::into_core` method deserializes these JavaScript options into the core configuration structure used by the Rust engine. The `from_core` method handles the reverse conversion when returning results.
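
To keep configuration objects consistent across an app, you can capture the options used in this guide in a small type. This is a sketch only: the interface lists just the fields that appear in this guide (`ocrEnabled`, `ocrLanguage`, `outputFormat`, `maxPages`), and the `withDefaults` helper and its default values are hypothetical application-level choices, not the library's own defaults.

```typescript
// Fields mirror the options shown in this guide; the real configuration
// may accept more. Everything here is illustrative.
interface GuideParseOptions {
  ocrEnabled?: boolean;
  ocrLanguage?: string;
  outputFormat?: "json" | "text";
  maxPages?: number;
}

// Hypothetical application-level defaults (not LiteParse's defaults).
const APP_DEFAULTS: Required<GuideParseOptions> = {
  ocrEnabled: false,
  ocrLanguage: "eng",
  outputFormat: "json",
  maxPages: 100,
};

// Merges caller options over the defaults and rejects an invalid page limit.
function withDefaults(options: GuideParseOptions = {}): Required<GuideParseOptions> {
  const merged = { ...APP_DEFAULTS, ...options };
  if (!Number.isInteger(merged.maxPages) || merged.maxPages < 1) {
    throw new RangeError("maxPages must be a positive integer");
  }
  return merged;
}
```

The merged object can then be passed to the constructor, for example `new LiteParse(withDefaults({ maxPages: 10 }))`.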

## Parsing PDF Documents

### Converting Files to Uint8Array

The `parse` method accepts a `Uint8Array` containing raw PDF bytes. Convert browser File objects or fetched buffers before processing:

```typescript
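// First file from an <input type="file"> element (assumes one has been selected);
// a File obtained from a drag-and-drop event works the same way
const input = document.querySelector<HTMLInputElement>('input[type="file"]')!;
const file = input.files![0];
const bytes = new Uint8Array(await file.arrayBuffer());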
```
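
Before handing bytes to the parser, it can be useful to reject files that are obviously not PDFs. The sketch below is an application-side check, not a LiteParse feature: `looksLikePdf` is a hypothetical helper that tests for the `%PDF-` signature at the start of the buffer.

```typescript
// "%PDF-" as ASCII byte values; PDF files begin with this signature.
const PDF_SIGNATURE = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Hypothetical pre-check: true when the buffer starts with the PDF signature.
function looksLikePdf(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_SIGNATURE.length) return false;
  return PDF_SIGNATURE.every((value, index) => bytes[index] === value);
}
```

Some tolerant readers accept a few leading bytes before the signature, so treat a negative result as a hint for a friendlier error message rather than as proof that the file is invalid.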

### Calling the parse Method

Invoke the async `parse` method to execute the full extraction pipeline. The method returns a JavaScript object built from `JsParsedPage` and `JsTextItem` structures defined in the WASM glue code:

```typescript
const result = await parser.parse(bytes);

console.log("Extracted text:", result.text);
console.log("Page count:", result.pages.length);
console.log("First page items:", result.pages[0].textItems);
```

The `textItems` array contains objects with bounding boxes, font information, and confidence scores extracted by the Rust core in [`crates/liteparse/src/parser.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/parser.rs).
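
Once you have a result, small helpers can make it easier to inspect. The following sketch relies only on the `text`, `pages`, and `textItems` fields used above; `confidence` is an assumed field name inferred from the description of the items, and both helper functions are hypothetical.

```typescript
// Minimal view of the result, limited to the fields used in this guide.
// `confidence` is an assumed field name and is therefore optional here.
interface TextItemView { confidence?: number; }
interface PageView { textItems: TextItemView[]; }
interface ParseResultView { text: string; pages: PageView[]; }

// Counts the text items on each page, in page order.
function itemCountsPerPage(result: ParseResultView): number[] {
  return result.pages.map((page) => page.textItems.length);
}

// Collects items whose (assumed) confidence falls below a threshold.
// Items without a confidence value are skipped.
function lowConfidenceItems(result: ParseResultView, threshold: number): TextItemView[] {
  const flagged: TextItemView[] = [];
  for (const page of result.pages) {
    for (const item of page.textItems) {
      if (item.confidence !== undefined && item.confidence < threshold) {
        flagged.push(item);
      }
    }
  }
  return flagged;
}
```

Check the actual item fields in your build (for example by logging one item) before relying on `confidence` in production code.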

## Adding OCR Support in the Browser

Because native Tesseract or HTTP OCR backends cannot execute in the browser sandbox, the WASM build provides the **JsOcrEngine** wrapper in [`crates/liteparse-wasm/src/lib.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse-wasm/src/lib.rs). This bridge forwards OCR calls to any JavaScript object implementing the `recognize(imageData, width, height, language)` method.

Supply a custom OCR engine through the `ocrEngine` configuration field:

```typescript
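import Tesseract from "tesseract.js";

const parser = new LiteParse({
  ocrEnabled: true,
  ocrLanguage: "eng",
  ocrEngine: {
    async recognize(imageData, width, height, language) {
      // imageData holds PNG bytes produced by LiteParse; wrap them in a Blob,
      // an input type tesseract.js accepts in the browser. width and height
      // describe the full image, so no crop rectangle is needed.
      const image = new Blob([new Uint8Array(imageData)], { type: "image/png" });
      const { data } = await Tesseract.recognize(image, language);

      // Return format expected by LiteParse
      // (assumes a tesseract.js version whose result exposes `data.words`)
      return data.words.map(w => ({
        text: w.text,
        bbox: [w.bbox.x0, w.bbox.y0, w.bbox.x1, w.bbox.y1],
        confidence: w.confidence / 100,
      }));
    },
  },
});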
```

The `recognize` implementation in [`crates/liteparse-wasm/src/lib.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse-wasm/src/lib.rs) calls this JavaScript method for each image region requiring OCR, enabling integration with **tesseract.js** or remote OCR services while keeping the parsing logic in the Rust core.
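
OCR libraries differ in how they report boxes and confidence, so a small adapter can keep the `recognize` return value consistent. The sketch below assumes the return shape used in the example above (`text`, `bbox` as `[x0, y0, x1, y1]`, and `confidence` between 0 and 1). `toOcrItems`, `RawWord`, and `OcrItem` are hypothetical names, and the 0 to 100 input scale matches the tesseract.js example rather than any stated LiteParse requirement.

```typescript
// Hypothetical input shape, modeled on the word objects in the example above.
interface RawWord {
  text: string;
  bbox: { x0: number; y0: number; x1: number; y1: number };
  confidence: number; // assumed to be on a 0-100 scale
}

// Shape returned from `recognize` in this guide's example.
interface OcrItem {
  text: string;
  bbox: [number, number, number, number];
  confidence: number; // 0-1
}

// Drops empty detections, flattens the box, and clamps confidence to [0, 1].
function toOcrItems(words: RawWord[]): OcrItem[] {
  const items: OcrItem[] = [];
  for (const word of words) {
    const text = word.text.trim();
    if (text.length === 0) continue; // skip whitespace-only detections
    const scaled = word.confidence / 100;
    items.push({
      text,
      bbox: [word.bbox.x0, word.bbox.y0, word.bbox.x1, word.bbox.y1],
      confidence: Math.min(1, Math.max(0, scaled)),
    });
  }
  return items;
}
```

With this in place, the body of `recognize` could end with `return toOcrItems(data.words);`, and swapping in a different OCR library would only require mapping its output to `RawWord`.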

## WASM Architecture and Performance

### Core Implementation Details

The WebAssembly build leverages the same Rust core as the native CLI, located in `crates/liteparse/src/*`. All PDF rendering, text extraction, and spatial projection execute within the WASM runtime. The glue code in [`crates/liteparse-wasm/src/lib.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse-wasm/src/lib.rs) handles memory management, type conversion between JavaScript and Rust, and serialization of results into JSON-compatible structures.

### Single-Threaded Execution

The WASM build runs single-threaded in the browser and implicitly sets `cfg.num_workers = 1`. Despite this constraint, performance should remain comparable to the native CLI for most documents, because the heavy computation happens in compiled Rust code rather than JavaScript.

## Summary

- Install `@llamaindex/liteparse-wasm` to embed LiteParse in browser applications
- Call `init()` to load the WASM binary, then instantiate the `LiteParse` class with camel-cased configuration options
- Pass PDF bytes as `Uint8Array` to the async `parse` method, which returns structured data via `JsParsedPage` and `JsTextItem`
- Implement the `recognize` method on a JavaScript object to enable OCR using browser-compatible engines like tesseract.js
- All processing occurs client-side in the WASM runtime compiled from [`crates/liteparse-wasm/src/lib.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse-wasm/src/lib.rs), with no server requirements

## Frequently Asked Questions

### Can LiteParse run completely offline in the browser?

Yes. Once the WASM module and any OCR assets are loaded, LiteParse operates entirely within the browser without network requests. The parser processes PDFs locally using the Rust core compiled to WebAssembly, making it suitable for offline-first applications and privacy-sensitive document processing.

### How do I handle large PDFs in the browser?

Use the `maxPages` configuration option to limit processing scope, and consider implementing pagination logic in your application. Since the WASM build runs single-threaded with `num_workers = 1`, very large documents may block the main thread, so processing should ideally occur in a Web Worker or be limited to smaller page ranges.
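
One way to structure the Web Worker approach is to keep the request handling in a plain function and wire it to the worker's message events separately. The sketch below is an assumption-laden outline, not code from the LiteParse repository: `PdfParser` is a structural stand-in for the `LiteParse` class, and the `ParseRequest` and `ParseResponse` message shapes are hypothetical.

```typescript
// Structural view of the parser: only the `parse` method used in this guide.
interface PdfParser {
  parse(bytes: Uint8Array): Promise<unknown>;
}

// Hypothetical message shapes exchanged between the page and the worker.
interface ParseRequest { id: number; bytes: Uint8Array; }
type ParseResponse =
  | { id: number; ok: true; result: unknown }
  | { id: number; ok: false; error: string };

// Runs one parse request and always resolves with a response message, so the
// page is never left waiting on a request that failed silently.
async function handleParseRequest(
  parser: PdfParser,
  request: ParseRequest,
): Promise<ParseResponse> {
  try {
    const result = await parser.parse(request.bytes);
    return { id: request.id, ok: true, result };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { id: request.id, ok: false, error };
  }
}

// Inside a worker script, the wiring might look like this (untested sketch):
//   await init();
//   const parser = new LiteParse({ ocrEnabled: false, outputFormat: "json" });
//   self.onmessage = async (event) => {
//     self.postMessage(await handleParseRequest(parser, event.data));
//   };
```

Because the result is JSON-serializable, it can be posted back to the page as-is. The `id` field lets the page match responses to requests when several documents are queued.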

### What OCR engines work with LiteParse WASM?

Any JavaScript library that can process PNG image data and return text with bounding boxes works through the `JsOcrEngine` bridge. **tesseract.js** is the most common choice for client-side OCR. The JavaScript object you provide must expose an async `recognize(imageData, width, height, language)` method that returns an array of objects containing `text`, `bbox` coordinates, and `confidence` scores.

### Is the WASM build as fast as the native CLI?

The WASM build delivers comparable performance to the native CLI for most use cases because both execute the same optimized Rust code from [`crates/liteparse/src/parser.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/parser.rs). The primary limitation is single-threaded execution in browsers, whereas the native CLI can utilize multiple workers. For CPU-intensive tasks on large documents, the native version maintains an advantage.