How to Use LiteParse in a WASM/Browser Environment with Custom OCR Callbacks
LiteParse ships a WebAssembly (WASM) package that runs the full PDF parsing pipeline directly in the browser, delegating OCR through a JavaScript callback bridge that invokes an async recognize(imageData, width, height, language) method on any object you provide.
The run-llama/liteparse repository provides a WASM build that mirrors the Node.js API, allowing you to parse PDFs client-side without native binaries. When running in the browser, the parser uses a JsOcrEngine bridge to forward OCR requests to JavaScript, enabling you to plug in any OCR library or service while keeping the core parsing logic in Rust.
How the WASM OCR Bridge Works
Inside crates/liteparse-wasm/src/lib.rs, the JsOcrEngine struct implements the Rust OcrEngine trait by forwarding calls to a JavaScript object. When the core parser encounters a page requiring OCR, the bridge:
- Packs the raw PNG bytes into a Uint8Array (lines 1999-2003).
- Invokes your JavaScript object's recognize method via Reflect::get and Function::apply (lines 2006-2020).
- Awaits the returned Promise and decodes the result into a Rust Vec<OcrResult> (lines 2025-2439).
This architecture means CoreLiteParse treats your JavaScript callback exactly like a native OCR engine, with the only difference being that execution happens in the main thread or a Web Worker you control.
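The three bridge steps can be mirrored on the JavaScript side to see what your object must tolerate. The following is a minimal TypeScript sketch, not the Rust implementation: callRecognize is a hypothetical helper, and the OcrResult shape is taken from the result format described in this article.

```typescript
// Illustrative analog of the bridge steps; the real bridge lives in Rust.
interface OcrResult {
  text: string;
  bbox: [number, number, number, number]; // [x1, y1, x2, y2] in pixels
  confidence: number;
}

// Hypothetical helper, not part of the LiteParse API.
async function callRecognize(
  engine: object,
  png: ArrayLike<number>,
  width: number,
  height: number,
  language: string,
): Promise<OcrResult[]> {
  // Step 1: pack the PNG bytes into a Uint8Array.
  const imageData = Uint8Array.from(png);

  // Step 2: look up `recognize` dynamically and call it with `engine` as `this`.
  const recognize: unknown = Reflect.get(engine, "recognize");
  if (typeof recognize !== "function") {
    throw new TypeError("ocrEngine.recognize is not a function");
  }
  const pending: unknown = Reflect.apply(recognize, engine, [
    imageData,
    width,
    height,
    language,
  ]);

  // Step 3: await the returned Promise and check the shape of the result.
  const results = await pending;
  if (!Array.isArray(results)) {
    throw new TypeError("recognize must resolve to an array");
  }
  return results as OcrResult[];
}
```

Because the method is looked up by name and applied with the engine as its receiver, an engine written as a class instance or an object literal that relies on this should behave the same way in this sketch.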
Installation and Basic Setup
Install the WASM package from npm:
npm install @llamaindex/liteparse-wasm
Initialize the module and create a parser instance. The init() function loads the WASM module and prepares the runtime:
import init, { LiteParse } from '@llamaindex/liteparse-wasm';

// `file` is a File or Blob, e.g. taken from an <input type="file"> element
async function run(file) {
  // Load the WASM runtime
  await init();

  // Create parser without OCR
  const parser = new LiteParse({
    ocrEnabled: false,
    outputFormat: 'json',
  });

  const bytes = new Uint8Array(await file.arrayBuffer());
  const result = await parser.parse(bytes);
  console.log('Full text:', result.text);
  console.log('Pages:', result.pages);
}
Implementing Custom OCR Callbacks
To enable OCR in the browser, provide a JavaScript object with an async recognize method in your configuration. The method receives PNG-encoded image data and must return an array of recognition results with bounding boxes.
// Custom OCR engine implementation
const ocrEngine = {
  async recognize(
    imageData: Uint8Array,
    width: number,
    height: number,
    language: string
  ) {
    // Example: delegate to tesseract.js or a cloud API
    // Return format matches the OcrResult struct
    return [
      {
        text: 'Recognized text',
        bbox: [10, 20, 100, 40], // [x1, y1, x2, y2] in pixels
        confidence: 0.95
      }
    ];
  }
};

const parser = new LiteParse({
  ocrEnabled: true,
  ocrLanguage: 'eng',
  ocrEngine, // Inject custom implementation
  outputFormat: 'json',
});

const result = await parser.parse(bytes);
The ocrEngine object is stored in the LiteParse instance and invoked whenever the core parser encounters images requiring text recognition.
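Since results from a third-party OCR library cross into Rust for decoding, it can help to sanity-check them before returning. Below is a minimal sketch of such a check. sanitizeResults is a hypothetical helper, and the 0 to 1 confidence range is an assumption based on the 0.95 value in the example above rather than a documented requirement.

```typescript
interface OcrResult {
  text: string;
  bbox: [number, number, number, number]; // [x1, y1, x2, y2] in pixels
  confidence: number;
}

// Hypothetical helper: keeps well-formed entries and clamps them to the page.
function sanitizeResults(raw: unknown, width: number, height: number): OcrResult[] {
  if (!Array.isArray(raw)) return [];
  const clamp = (v: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, v));
  const out: OcrResult[] = [];

  for (const item of raw) {
    if (typeof item !== "object" || item === null) continue;
    const { text, bbox, confidence } = item as Record<string, unknown>;

    // Drop entries with no usable text or a malformed bounding box.
    if (typeof text !== "string" || text.trim() === "") continue;
    if (!Array.isArray(bbox) || bbox.length !== 4) continue;
    if (!bbox.every((n) => typeof n === "number" && Number.isFinite(n))) continue;

    // Order the corners and clamp them to the image bounds.
    const [x1, y1, x2, y2] = bbox as [number, number, number, number];
    const box: OcrResult["bbox"] = [
      clamp(Math.min(x1, x2), 0, width),
      clamp(Math.min(y1, y2), 0, height),
      clamp(Math.max(x1, x2), 0, width),
      clamp(Math.max(y1, y2), 0, height),
    ];

    // Assumption: confidence is a fraction in [0, 1]; missing values become 0.
    const conf =
      typeof confidence === "number" && Number.isFinite(confidence)
        ? clamp(confidence, 0, 1)
        : 0;

    out.push({ text, bbox: box, confidence: conf });
  }
  return out;
}
```

A recognize implementation could end with return sanitizeResults(rawResults, width, height), using the width and height arguments it already receives.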
Browser Integration with Web Workers
For production use, run OCR inside a Web Worker to avoid blocking the main thread. The following pattern demonstrates how to structure the Worker communication, similar to the example in scripts/browser-compat/wasm-test.html:
<script type="module">
  import init, { LiteParse } from '/packages/wasm/pkg/liteparse_wasm.js';

  async function ocrEngineFactory() {
    const worker = new Worker('tesseract-worker.js');
    return {
      async recognize(imageData, width, height, language) {
        // Handlers are reassigned on every call, so this assumes one request in flight at a time
        return new Promise((resolve, reject) => {
          worker.onmessage = (e) => resolve(e.data);
          worker.onerror = (err) => reject(err);
          // Transfer the underlying buffer to the Worker instead of copying it
          worker.postMessage({ imageData, width, height, language }, [imageData.buffer]);
        });
      },
    };
  }

  document.getElementById('pdfInput').addEventListener('change', async (e) => {
    const file = e.target.files[0];
    if (!file) return;

    await init();
    const parser = new LiteParse({
      ocrEnabled: true,
      ocrLanguage: 'eng',
      ocrEngine: await ocrEngineFactory(),
      outputFormat: 'json',
    });

    const bytes = new Uint8Array(await file.arrayBuffer());
    const result = await parser.parse(bytes);
    console.log('Extracted with OCR:', result.text);
  });
</script>
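If recognize can be called again before an earlier call has finished, a single onmessage handler is not enough to tell replies apart. One way to handle this, sketched below, is to tag each request with an id and keep pending promises in a map. The WorkerLike interface, the createWorkerOcrEngine name, and the { id, results, error } reply shape are assumptions made for illustration; your Worker script would need to echo the id back in that shape.

```typescript
interface OcrResult {
  text: string;
  bbox: [number, number, number, number];
  confidence: number;
}

// Minimal subset of the Worker surface used here, so a fake can stand in for tests.
interface WorkerLike {
  postMessage(message: unknown, transfer?: unknown[]): void;
  onmessage: ((event: { data: unknown }) => void) | null;
}

// Assumed reply shape; the Worker script must echo back the request id.
type Reply = { id: number; results?: OcrResult[]; error?: string };

function createWorkerOcrEngine(worker: WorkerLike) {
  let nextId = 0;
  const pending = new Map<
    number,
    { resolve: (r: OcrResult[]) => void; reject: (e: Error) => void }
  >();

  // One shared handler routes each reply to the promise that requested it.
  worker.onmessage = (event) => {
    const reply = event.data as Reply;
    const entry = pending.get(reply.id);
    if (!entry) return; // unknown id or already settled
    pending.delete(reply.id);
    if (reply.error !== undefined) entry.reject(new Error(reply.error));
    else entry.resolve(reply.results ?? []);
  };

  return {
    recognize(
      imageData: Uint8Array,
      width: number,
      height: number,
      language: string,
    ): Promise<OcrResult[]> {
      const id = nextId++;
      return new Promise<OcrResult[]>((resolve, reject) => {
        pending.set(id, { resolve, reject });
        worker.postMessage({ id, imageData, width, height, language }, [imageData.buffer]);
      });
    },
  };
}
```

The object returned by createWorkerOcrEngine has the same recognize signature as the factory above, so it could be passed as ocrEngine in its place. A real Worker may need a type cast to satisfy the simplified WorkerLike interface.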
Configuration Structure
The JsLiteParseConfig struct (lines 37-89 in crates/liteparse-wasm/src/lib.rs) validates camelCase options and maps them to the core Rust LiteParseConfig. Key fields include:
- ocrEnabled: Boolean to toggle OCR processing
- ocrLanguage: Language code (e.g., "eng") passed as the fourth argument to your recognize callback
- ocrEngine: Your JavaScript object implementing the recognize method
- outputFormat: "json", "text", or "markdown", controlling the serialization of JsParseResult
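The fields above can be written down as a TypeScript type, with a small pre-flight check for callers using plain JavaScript. This is a sketch under assumptions: LiteParseOptions and checkOptions are hypothetical names, which fields are optional is a guess, and the real validation happens inside JsLiteParseConfig and may differ.

```typescript
type OutputFormat = "json" | "text" | "markdown";

interface OcrEngine {
  recognize(
    imageData: Uint8Array,
    width: number,
    height: number,
    language: string,
  ): Promise<unknown>;
}

// Assumed shape; optionality of each field is not confirmed by the source.
interface LiteParseOptions {
  ocrEnabled: boolean;
  ocrLanguage?: string;
  ocrEngine?: OcrEngine;
  outputFormat: OutputFormat;
}

// Hypothetical pre-flight check; returns a list of problems (empty when none found).
function checkOptions(options: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const formats = ["json", "text", "markdown"];

  if (typeof options.ocrEnabled !== "boolean") {
    problems.push("ocrEnabled must be a boolean");
  }
  if (
    typeof options.outputFormat !== "string" ||
    formats.indexOf(options.outputFormat) === -1
  ) {
    problems.push('outputFormat must be "json", "text", or "markdown"');
  }
  if (options.ocrLanguage !== undefined && typeof options.ocrLanguage !== "string") {
    problems.push("ocrLanguage must be a string");
  }
  if (options.ocrEnabled === true) {
    const engine = options.ocrEngine as { recognize?: unknown } | null | undefined;
    if (typeof engine?.recognize !== "function") {
      problems.push("ocrEngine.recognize must be a function when ocrEnabled is true");
    }
  }
  return problems;
}
```

Running a check like this before constructing the parser gives readable messages in your own UI, while the constructor remains the authority on what is actually accepted.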
Summary
- LiteParse's WASM build exposes a LiteParse class that accepts a custom ocrEngine callback implementing the recognize method signature.
- The JsOcrEngine bridge in crates/liteparse-wasm/src/lib.rs converts Rust OCR requests into JavaScript Promise calls, handling PNG serialization via Uint8Array and result deserialization via serde_wasm_bindgen.
- Your callback receives Uint8Array PNG data and dimensions, returning an array of objects with text, bbox (bounding box array), and confidence properties.
- The parser returns a JsParseResult containing a pages array (with JsParsedPage items) and a concatenated text string.
- For browser performance, execute OCR inside a Web Worker and return results to the main thread to keep the UI responsive.
Frequently Asked Questions
What signature must my custom OCR callback implement?
Your OCR object must expose an async recognize method accepting four parameters: imageData (Uint8Array of PNG bytes), width (number), height (number), and language (string). It must return a Promise that resolves to an array of objects, each containing text (string), bbox ([x1, y1, x2, y2] array), and confidence (number) properties. This contract is enforced by the JsOcrEngine implementation in lib.rs.
Does the WASM build support the same features as the Node.js version?
Largely, yes. The WASM API mirrors the Node.js wrapper, using camelCase configuration objects and returning the same JsParseResult structure. The exception is OCR: native Tesseract and HTTP backends are unavailable in WASM, so you must provide a custom OCR callback for text recognition. According to the browser usage guide in docs/src/content/docs/liteparse/guides/browser-usage.md, all core parsing features work identically once the OCR bridge is configured.
Can I use tesseract.js with LiteParse WASM?
Absolutely. Tesseract.js is a common choice for browser OCR. Create a Web Worker running tesseract.js, then implement the recognize callback to post the PNG data to the Worker and await the recognition results. The scripts/browser-compat/wasm-test.html example demonstrates the WASM loading pattern, while your OCR callback handles the tesseract.js-specific integration by mapping the imageData parameter to tesseract.js's expected input format.
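The mapping step on the output side can be sketched as a pure function. This assumes tesseract.js reports words with a text string, a confidence between 0 and 100, and a bbox object with x0, y0, x1, y1 fields; verify this against the tesseract.js version you use. TesseractWordLike and wordsToOcrResults are hypothetical names, and scaling confidence to the 0 to 1 range is an assumption based on the 0.95 example earlier in this article.

```typescript
interface OcrResult {
  text: string;
  bbox: [number, number, number, number]; // [x1, y1, x2, y2] in pixels
  confidence: number;
}

// Assumed shape of a tesseract.js word; check it against your installed version.
interface TesseractWordLike {
  text: string;
  confidence: number; // assumed to be on a 0-100 scale
  bbox: { x0: number; y0: number; x1: number; y1: number };
}

// Hypothetical mapper from tesseract.js-style words to the callback's result format.
function wordsToOcrResults(words: TesseractWordLike[]): OcrResult[] {
  return words
    .filter((w) => w.text.trim().length > 0) // skip whitespace-only words
    .map((w) => ({
      text: w.text,
      bbox: [w.bbox.x0, w.bbox.y0, w.bbox.x1, w.bbox.y1] as [number, number, number, number],
      // Rescale to a fraction and clamp in case of out-of-range values.
      confidence: Math.min(1, Math.max(0, w.confidence / 100)),
    }));
}
```

Inside the Worker, the recognized words would be passed through a function like this before being posted back, so the main thread receives results already in the expected shape.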
How is the configuration validated?
The JsLiteParseConfig implementation in crates/liteparse-wasm/src/lib.rs (lines 37-89) validates fields like outputFormat and converts camelCase JavaScript options to the Rust LiteParseConfig struct. Invalid configurations throw JavaScript errors during the new LiteParse() constructor call before any PDF processing begins, ensuring fail-fast behavior for missing required callbacks or unsupported output formats.