How to Use LiteParse in a WASM/Browser Environment with Custom OCR Callbacks

LiteParse ships a WebAssembly (WASM) package that runs the full PDF parsing pipeline directly in the browser. OCR is delegated through a JavaScript callback bridge, which invokes an async recognize(imageData, width, height, language) method on any object you provide.

The run-llama/liteparse repository provides a WASM build that mirrors the Node.js API, allowing you to parse PDFs client-side without native binaries. When running in the browser, the parser uses a JsOcrEngine bridge to forward OCR requests to JavaScript, enabling you to plug in any OCR library or service while keeping the core parsing logic in Rust.

How the WASM OCR Bridge Works

Inside crates/liteparse-wasm/src/lib.rs, the JsOcrEngine struct implements the Rust OcrEngine trait by forwarding calls to a JavaScript object. When the core parser encounters a page requiring OCR, the bridge:

  1. Packs the raw PNG bytes into a Uint8Array (lines 1999-2003).
  2. Invokes your JavaScript object's recognize method via Reflect::get and Function::apply (lines 2006-2020).
  3. Awaits the returned Promise and decodes the result into a Rust Vec<OcrResult> (lines 2025-2439).

This architecture means CoreLiteParse treats your JavaScript callback exactly like a native OCR engine. The only difference is that execution happens on the main thread or in a Web Worker you control.
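
The three bridge steps can be mimicked from the JavaScript side to see what the bridge expects of your object. The sketch below is a TypeScript analogy, not the Rust implementation: callRecognize is a hypothetical helper that looks up recognize with Reflect.get, invokes it with Function.prototype.apply, and awaits whatever comes back. The array check at the end is an assumption about how strictly the result is decoded.

```typescript
// Hypothetical helper: a JavaScript-side analogy of the bridge, not LiteParse code.
async function callRecognize(
  engine: object,
  png: Uint8Array,
  width: number,
  height: number,
  language: string,
): Promise<unknown[]> {
  // Look the method up dynamically, as the bridge is described as doing.
  const fn: unknown = Reflect.get(engine, "recognize");
  if (typeof fn !== "function") {
    throw new TypeError("ocrEngine.recognize is not a function");
  }
  // apply() keeps `this` bound to the engine object during the call.
  const returned: unknown = fn.apply(engine, [png, width, height, language]);
  // await accepts both Promises and plain return values.
  const results: unknown = await returned;
  if (!Array.isArray(results)) {
    throw new TypeError("recognize must resolve to an array");
  }
  return results;
}
```

Running your engine through a helper like this before handing it to the parser is a cheap way to catch a misspelled method name or a callback that forgets to return its results.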

Installation and Basic Setup

Install the WASM package from npm:

npm install @llamaindex/liteparse-wasm

Initialize the module and create a parser instance. The init() function loads the WASM module and prepares the runtime:

import init, { LiteParse } from '@llamaindex/liteparse-wasm';

// `file` is a File or Blob, for example from an <input type="file"> element
async function run(file) {
  // Load the WASM runtime
  await init();

  // Create parser without OCR
  const parser = new LiteParse({
    ocrEnabled: false,
    outputFormat: 'json',
  });

  const bytes = new Uint8Array(await file.arrayBuffer());
  const result = await parser.parse(bytes);
  
  console.log('Full text:', result.text);
  console.log('Pages:', result.pages);
}

Implementing Custom OCR Callbacks

To enable OCR in the browser, provide a JavaScript object with an async recognize method in your configuration. The method receives PNG-encoded image data and must return an array of recognition results with bounding boxes.

// Custom OCR engine implementation
const ocrEngine = {
  async recognize(
    imageData: Uint8Array, 
    width: number, 
    height: number, 
    language: string
  ) {
    // Example: delegate to tesseract.js or a cloud API
    // Return format matches the OcrResult struct
    return [
      { 
        text: 'Recognized text', 
        bbox: [10, 20, 100, 40], // [x1, y1, x2, y2] in pixels
        confidence: 0.95 
      }
    ];
  }
};

const parser = new LiteParse({
  ocrEnabled: true,
  ocrLanguage: 'eng',
  ocrEngine,  // Inject custom implementation
  outputFormat: 'json',
});

const result = await parser.parse(bytes);

The ocrEngine object is stored in the LiteParse instance and invoked whenever the core parser encounters images requiring text recognition.

Browser Integration with Web Workers

For production use, run OCR inside a Web Worker to avoid blocking the main thread. The following pattern demonstrates how to structure the Worker communication, similar to the example in scripts/browser-compat/wasm-test.html:

<script type="module">
import init, { LiteParse } from '/packages/wasm/pkg/liteparse_wasm.js';

async function ocrEngineFactory() {
  const worker = new Worker('tesseract-worker.js');
  return {
    async recognize(imageData, width, height, language) {
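      return new Promise((resolve, reject) => {
        // Assumes one request in flight at a time: each call replaces the handlers.
        worker.onmessage = (e) => resolve(e.data);
        worker.onerror = (e) => reject(new Error(e.message));
        // Transferring the buffer avoids a copy but detaches imageData on this side.
        worker.postMessage({ imageData, width, height, language }, [imageData.buffer]);
      });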
    },
  };
}

document.getElementById('pdfInput').addEventListener('change', async (e) => {
  const file = e.target.files[0];
  if (!file) return;

  await init();
  
  const parser = new LiteParse({
    ocrEnabled: true,
    ocrLanguage: 'eng',
    ocrEngine: await ocrEngineFactory(),
    outputFormat: 'json',
  });

  const bytes = new Uint8Array(await file.arrayBuffer());
  const result = await parser.parse(bytes);
  
  console.log('Extracted with OCR:', result.text);
});
</script>
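
Assigning a single message handler per call works only while one page is recognized at a time. If recognize calls can overlap, replies need to be matched to requests. The sketch below shows one way to do that with request ids. All names in it (OcrChannel, OcrRequest, OcrReply, createWorkerOcrEngine) are hypothetical, and it assumes your own worker script echoes the id back with either results or an error string.

```typescript
interface OcrResult { text: string; bbox: number[]; confidence: number; }
interface OcrRequest {
  id: number; imageData: Uint8Array; width: number; height: number; language: string;
}
interface OcrReply { id: number; results?: OcrResult[]; error?: string; }

// Thin abstraction over the worker so the engine can be tested with a fake.
interface OcrChannel {
  send(message: OcrRequest): void;
  onReply(handler: (reply: OcrReply) => void): void;
}

function createWorkerOcrEngine(channel: OcrChannel) {
  let nextId = 0;
  // Requests that have been sent but not yet answered, keyed by id.
  const pending = new Map<
    number,
    { resolve: (r: OcrResult[]) => void; reject: (e: Error) => void }
  >();

  channel.onReply((reply) => {
    const entry = pending.get(reply.id);
    if (!entry) return; // unknown id or already settled
    pending.delete(reply.id);
    if (reply.error !== undefined) entry.reject(new Error(reply.error));
    else entry.resolve(reply.results ?? []);
  });

  return {
    recognize(
      imageData: Uint8Array, width: number, height: number, language: string,
    ): Promise<OcrResult[]> {
      const id = nextId++;
      return new Promise<OcrResult[]>((resolve, reject) => {
        pending.set(id, { resolve, reject });
        channel.send({ id, imageData, width, height, language });
      });
    },
  };
}
```

With a real Worker, send can be (m) => worker.postMessage(m) and onReply can be (h) => worker.addEventListener('message', (e) => h(e.data)). Registering one listener up front, rather than one per call, is what lets replies arrive in any order.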

Configuration Structure

The JsLiteParseConfig struct (lines 37-89 in crates/liteparse-wasm/src/lib.rs) validates camelCase options and maps them to the core Rust LiteParseConfig. Key fields include:

  • ocrEnabled: Boolean to toggle OCR processing
  • ocrLanguage: Language code (e.g., "eng") passed as the fourth argument to your recognize callback
  • ocrEngine: Your JavaScript object implementing the recognize method
  • outputFormat: "json", "text", or "markdown", controlling the serialization of JsParseResult
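
For editor support on the JavaScript side, the fields above can be written down as a TypeScript type. The sketch below is an assumption drawn only from this list: LiteParseOptions and checkConfig are hypothetical names, not exports of the package, the optionality of each field is a guess, and the real validation lives in the Rust JsLiteParseConfig. It can still be useful as a pre-flight check before constructing the parser.

```typescript
type OutputFormat = "json" | "text" | "markdown";

// Hypothetical shape; which fields are optional is an assumption.
type LiteParseOptions = {
  ocrEnabled?: boolean;
  ocrLanguage?: string;
  ocrEngine?: { recognize: (...args: any[]) => unknown };
  outputFormat?: OutputFormat;
};

const OUTPUT_FORMATS: readonly string[] = ["json", "text", "markdown"];

// Returns a list of human-readable problems; an empty list means no issue found.
function checkConfig(options: object): string[] {
  const problems: string[] = [];

  const format: unknown = Reflect.get(options, "outputFormat");
  const formatOk = typeof format === "string" && OUTPUT_FORMATS.indexOf(format) !== -1;
  if (format !== undefined && !formatOk) {
    problems.push(`unsupported outputFormat: ${String(format)}`);
  }

  if (Reflect.get(options, "ocrEnabled") === true) {
    const engine = Reflect.get(options, "ocrEngine") as { recognize?: unknown } | undefined;
    if (typeof engine?.recognize !== "function") {
      problems.push("ocrEnabled is true but ocrEngine.recognize is missing");
    }
  }
  return problems;
}

const example: LiteParseOptions = { ocrEnabled: false, outputFormat: "json" };
console.log(checkConfig(example)); // prints []
```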

Summary

  • LiteParse's WASM build exposes a LiteParse class that accepts a custom ocrEngine callback implementing the recognize method signature.
  • The JsOcrEngine bridge in crates/liteparse-wasm/src/lib.rs converts Rust OCR requests into JavaScript Promise calls, handling PNG serialization via Uint8Array and result deserialization via serde_wasm_bindgen.
  • Your callback receives Uint8Array PNG data and dimensions, returning an array of objects with text, bbox (bounding box array), and confidence properties.
  • The parser returns a JsParseResult containing a pages array (with JsParsedPage items) and a concatenated text string.
  • For browser performance, execute OCR inside a Web Worker and return results to the main thread to keep the UI responsive.

Frequently Asked Questions

What signature must my custom OCR callback implement?

Your OCR object must expose an async recognize method accepting four parameters: imageData (Uint8Array of PNG bytes), width (number), height (number), and language (string). It must return a Promise that resolves to an array of objects, each containing text (string), bbox ([x1, y1, x2, y2] array), and confidence (number) properties. This contract is enforced by the JsOcrEngine implementation in lib.rs.
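
The result shape described above can be checked on the JavaScript side before it reaches the bridge. The following is a minimal sketch, assuming exactly the three properties named in this answer. OcrResult here is a local TypeScript interface and isOcrResult a hypothetical guard; neither is imported from the package.

```typescript
interface OcrResult {
  text: string;
  bbox: [number, number, number, number]; // [x1, y1, x2, y2]
  confidence: number;
}

// Type guard: true only when the value has the three expected properties
// with the expected types and a four-number bounding box.
function isOcrResult(value: unknown): value is OcrResult {
  if (typeof value !== "object" || value === null) return false;
  const { text, bbox, confidence } = value as Record<string, unknown>;
  return (
    typeof text === "string" &&
    typeof confidence === "number" &&
    Array.isArray(bbox) &&
    bbox.length === 4 &&
    bbox.every((n) => typeof n === "number" && Number.isFinite(n))
  );
}

// Throws with the index of the first malformed entry, otherwise returns the list.
function assertOcrResults(values: unknown[]): OcrResult[] {
  values.forEach((v, i) => {
    if (!isOcrResult(v)) throw new TypeError(`OCR result at index ${i} is malformed`);
  });
  return values as OcrResult[];
}
```

Calling assertOcrResults at the end of your recognize method turns a shape mistake into an error that names the offending entry.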

Does the WASM build support the same features as the Node.js version?

Largely. The WASM API mirrors the Node.js wrapper, using camelCase configuration objects and returning the same JsParseResult structure. The exception is OCR: the native Tesseract and HTTP backends are unavailable in WASM, so you must provide a custom OCR callback for text recognition. According to the browser usage guide in docs/src/content/docs/liteparse/guides/browser-usage.md, all core parsing features work identically once the OCR bridge is configured.

Can I use tesseract.js with LiteParse WASM?

Absolutely. Tesseract.js is a common choice for browser OCR. Create a Web Worker running tesseract.js, then implement the recognize callback to post the PNG data to the Worker and await the recognition results. The scripts/browser-compat/wasm-test.html example demonstrates the WASM loading pattern, while your OCR callback handles the tesseract.js-specific integration by mapping the imageData parameter to tesseract.js's expected input format.
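
The mapping step might look like the sketch below. It assumes a tesseract.js word shape of { text, confidence, bbox: { x0, y0, x1, y1 } } with confidence on a 0 to 100 scale, which matches several tesseract.js releases but should be checked against the version you install. It also assumes LiteParse expects confidence between 0 and 1, as the 0.95 in the earlier example suggests. TesseractWordLike and toLiteParseResults are hypothetical names.

```typescript
// Assumed subset of a tesseract.js word; verify against your installed version.
interface TesseractWordLike {
  text: string;
  confidence: number; // assumed to be 0..100
  bbox: { x0: number; y0: number; x1: number; y1: number };
}

interface LiteParseOcrResult {
  text: string;
  bbox: [number, number, number, number]; // [x1, y1, x2, y2] in pixels
  confidence: number; // assumed to be 0..1
}

function toLiteParseResults(words: TesseractWordLike[]): LiteParseOcrResult[] {
  return words
    // Skip whitespace-only words, which carry no text to place on the page.
    .filter((w) => w.text.trim().length > 0)
    .map((w) => ({
      text: w.text,
      bbox: [w.bbox.x0, w.bbox.y0, w.bbox.x1, w.bbox.y1] as [number, number, number, number],
      confidence: w.confidence / 100,
    }));
}
```

Keeping the conversion in a pure function like this makes it easy to unit test without loading tesseract.js or the WASM module.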

How is the configuration validated?

The JsLiteParseConfig implementation in crates/liteparse-wasm/src/lib.rs (lines 37-89) validates fields like outputFormat and converts camelCase JavaScript options to the Rust LiteParseConfig struct. Invalid configurations throw JavaScript errors during the new LiteParse() constructor call before any PDF processing begins, ensuring fail-fast behavior for missing required callbacks or unsupported output formats.
