How to Implement a Custom OCR Engine in LiteParse with the OcrEngine Trait

Implement the OcrEngine trait defined in crates/liteparse/src/ocr/mod.rs by providing name() and recognize() methods, then register your engine via LiteParse::with_ocr_engine using Arc<dyn OcrEngine> to override the default HTTP or Tesseract backend.

LiteParse, the document parsing library from the run-llama ecosystem, abstracts OCR functionality behind a trait-based interface that supports both native and WebAssembly targets. By implementing the OcrEngine trait, you can integrate custom backends—from proprietary cloud APIs to on-device machine learning models—while preserving the library's core parsing pipeline and configuration system.

Understanding the OcrEngine Trait

The OcrEngine trait is defined in crates/liteparse/src/ocr/mod.rs and serves as the abstraction layer for all OCR operations. The trait requires Send + Sync bounds, but the future returned by recognize has platform-specific constraints:

  • On native targets (#[cfg(not(target_arch = "wasm32"))]), the future must be Send to allow the async runtime to move it across threads.
  • On WebAssembly, the trait remains Send + Sync, but the future does not require the Send bound because the runtime is single-threaded.

The native variant of the trait is shown here:

// crates/liteparse/src/ocr/mod.rs
pub trait OcrEngine: Send + Sync {
    fn name(&self) -> &str;
    fn recognize<'a, 'b: 'a, 'c: 'a>(
        &'a self,
        image_data: &'c [u8],
        width: u32,
        height: u32,
        options: &'b OcrOptions,
    ) -> Pin<
        Box<
            dyn Future<
                Output = Result<Vec<OcrResult>, Box<dyn std::error::Error + Send + Sync>>
            > + Send + '_,
        >,
    >;
}

The recognize method receives raw image bytes (typically PNG), pixel dimensions, and OcrOptions, returning a pinned future that resolves to a Vec<OcrResult> or a boxed error.
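One way to express a platform-dependent Send bound is a pair of cfg-gated type aliases. The sketch below only illustrates that technique and is not taken from the LiteParse source: the alias OcrFuture, the function word_count, and the small std-only block_on driver are hypothetical names introduced for this example.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type BoxError = Box<dyn std::error::Error + Send + Sync>;

// Native targets: the boxed future is Send so a multi-threaded runtime
// may move it between worker threads.
#[cfg(not(target_arch = "wasm32"))]
pub type OcrFuture<'a, T> = Pin<Box<dyn Future<Output = Result<T, BoxError>> + Send + 'a>>;

// wasm32: the runtime is single-threaded, so the Send bound is dropped.
#[cfg(target_arch = "wasm32")]
pub type OcrFuture<'a, T> = Pin<Box<dyn Future<Output = Result<T, BoxError>> + 'a>>;

// Hypothetical function that borrows its input for the life of the future,
// the same way recognize borrows image_data and options.
pub fn word_count<'a>(text: &'a str) -> OcrFuture<'a, usize> {
    Box::pin(async move { Ok(text.split_whitespace().count()) })
}

// Minimal driver for futures that finish without real I/O.
// A real application would use its async runtime instead.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

With an alias like this, a function body stays identical on both targets and only the alias definition changes.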

Step-by-Step Implementation Guide

Follow these steps to create a production-ready custom OCR engine in LiteParse:

  1. Create a module inside crates/liteparse/src/ocr/ (e.g., my_ocr.rs) and import OcrEngine, OcrOptions, and OcrResult.
  2. Define a struct holding your engine's configuration, such as API endpoints or model handles.
  3. Implement OcrEngine for your struct, providing:
    • name(): Returns a static string identifier for logging.
    • recognize(): Returns a pinned future containing your async OCR logic.
  4. Ensure thread safety: the struct must be Send + Sync, and on native targets the returned future must also be Send. Wrap mutable state in Arc<Mutex<T>> if necessary.
  5. Expose a constructor (typically new()) that initializes the engine.
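The steps above can be sketched as a skeleton engine. To keep the example compilable on its own, the trait and the two structs are local stand-ins that mirror the definitions described in this article; inside the crate you would import them with use super::{OcrEngine, OcrOptions, OcrResult} instead. The names CountingEngine, endpoint, and calls are hypothetical, and the f32 field types are an assumption.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};

type BoxError = Box<dyn std::error::Error + Send + Sync>;

// Stand-ins for the real LiteParse types (field types are assumed).
pub struct OcrOptions {
    pub language: String,
}

pub struct OcrResult {
    pub text: String,
    pub bbox: [f32; 4],
    pub confidence: f32,
}

pub trait OcrEngine: Send + Sync {
    fn name(&self) -> &str;
    fn recognize<'a, 'b: 'a, 'c: 'a>(
        &'a self,
        image_data: &'c [u8],
        width: u32,
        height: u32,
        options: &'b OcrOptions,
    ) -> Pin<Box<dyn Future<Output = Result<Vec<OcrResult>, BoxError>> + Send + 'a>>;
}

// Step 2: a struct holding configuration and shared state.
pub struct CountingEngine {
    endpoint: String,        // hypothetical configuration value
    calls: Arc<Mutex<u64>>,  // step 4: mutable state behind Arc<Mutex<T>>
}

impl CountingEngine {
    // Step 5: constructor.
    pub fn new(endpoint: impl Into<String>) -> Self {
        Self { endpoint: endpoint.into(), calls: Arc::new(Mutex::new(0)) }
    }

    pub fn calls(&self) -> u64 {
        *self.calls.lock().unwrap()
    }
}

// Step 3: the trait implementation.
impl OcrEngine for CountingEngine {
    fn name(&self) -> &str {
        "counting"
    }

    fn recognize<'a, 'b: 'a, 'c: 'a>(
        &'a self,
        image_data: &'c [u8],
        width: u32,
        height: u32,
        options: &'b OcrOptions,
    ) -> Pin<Box<dyn Future<Output = Result<Vec<OcrResult>, BoxError>> + Send + 'a>> {
        Box::pin(async move {
            // The guard is dropped at the end of this statement, so it is
            // never held across an await point.
            *self.calls.lock().unwrap() += 1;
            // Placeholder result covering the whole page.
            Ok(vec![OcrResult {
                text: format!(
                    "{} bytes, language={}, endpoint={}",
                    image_data.len(), options.language, self.endpoint
                ),
                bbox: [0.0, 0.0, width as f32, height as f32],
                confidence: 0.0,
            }])
        })
    }
}
```

Keeping the lock scope to a single statement matters on native targets: a std MutexGuard is not Send, so holding one across an .await would make the future fail the Send bound.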

Complete Custom Engine Example

Here is a minimal "Echo" engine implementation that demonstrates the trait contract without external dependencies:

// crates/liteparse/src/ocr/my_ocr.rs
use super::{OcrEngine, OcrOptions, OcrResult};
use std::future::Future;
use std::pin::Pin;

/// A very simple "echo" OCR engine used for demonstration.
#[derive(Debug, Default)]
pub struct EchoEngine;

impl EchoEngine {
    pub fn new() -> Self {
        EchoEngine
    }
}

impl OcrEngine for EchoEngine {
    fn name(&self) -> &str {
        "echo"
    }

    fn recognize<'a, 'b: 'a, 'c: 'a>(
        &'a self,
        _image_data: &'c [u8],
        _width: u32,
        _height: u32,
        options: &'b OcrOptions,
    ) -> Pin<
        Box<
            dyn Future<Output = Result<Vec<OcrResult>, Box<dyn std::error::Error + Send + Sync>>>
                + Send
                + '_,
        >,
    > {
        // In a real engine you would send the image to an OCR service here.
        // This placeholder just returns a single word containing the requested language.
        Box::pin(async move {
            Ok(vec![OcrResult {
                text: format!("language={}", options.language),
                bbox: [0.0, 0.0, 100.0, 20.0],
                confidence: 1.0,
            }])
        })
    }
}

This example returns a single result containing the requested language code, illustrating the expected return format: text (extracted string), bbox ([x1, y1, x2, y2] in pixel coordinates), and confidence (0.0 to 1.0).
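Many OCR backends report boxes as (x, y, width, height) and confidence as a percentage, so a small conversion step is often needed before building an OcrResult. The helpers below are a hypothetical sketch of that conversion. The function names and the use of f32 are assumptions, since the article does not state the numeric type of the bbox and confidence fields.

```rust
/// Convert an (x, y, width, height) box into [x1, y1, x2, y2],
/// clamped to the page image so no corner falls outside it.
pub fn xywh_to_bbox(x: f32, y: f32, w: f32, h: f32, page_w: u32, page_h: u32) -> [f32; 4] {
    let (pw, ph) = (page_w as f32, page_h as f32);
    // max/min are used instead of clamp so odd inputs cannot panic.
    let x1 = x.max(0.0).min(pw);
    let y1 = y.max(0.0).min(ph);
    let x2 = (x + w).max(x1).min(pw);
    let y2 = (y + h).max(y1).min(ph);
    [x1, y1, x2, y2]
}

/// Map a 0..=100 percentage onto the 0.0..=1.0 confidence range.
pub fn normalize_confidence(percent: f32) -> f32 {
    (percent / 100.0).max(0.0).min(1.0)
}
```

For example, a box at (10, 20) that is 30 wide and 40 tall becomes [10.0, 20.0, 40.0, 60.0] on a 100x100 page.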

Wiring Your Engine into LiteParse

The LiteParse struct stores the custom engine in the ocr_engine_override field and selects it inside parse_input (see crates/liteparse/src/parser.rs, lines 43-57 and 71-78). Use the with_ocr_engine method to inject your implementation:

use liteparse::parser::LiteParse;
use liteparse::ocr::my_ocr::EchoEngine;
use std::sync::Arc;

// Build the standard configuration
let cfg = liteparse::config::LiteParseConfig::default();

// Create the parser and inject the custom engine
let parser = LiteParse::new(cfg)
    .with_ocr_engine(Arc::new(EchoEngine::new()));

// Now `parser.parse_input(...)` will use `EchoEngine` instead of HTTP/Tesseract.

When ocr_engine_override is Some(Arc<dyn OcrEngine>), LiteParse bypasses its default selection logic—which normally chooses between HTTP OCR servers and the built-in Tesseract engine—and delegates all OCR operations to your implementation.
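The override-then-fallback behavior can be pictured with a simplified model. This is not the code in parser.rs: Parser, select_engine, and the ocr_server_url flag are hypothetical, and the trait is reduced to name() so the selection logic is the only thing on display.

```rust
use std::sync::Arc;

// Reduced stand-in for the real trait; only name() matters here.
pub trait OcrEngine: Send + Sync {
    fn name(&self) -> &str;
}

pub struct HttpEngine;
pub struct TesseractEngine;
pub struct EchoEngine;

impl OcrEngine for HttpEngine {
    fn name(&self) -> &str { "http" }
}
impl OcrEngine for TesseractEngine {
    fn name(&self) -> &str { "tesseract" }
}
impl OcrEngine for EchoEngine {
    fn name(&self) -> &str { "echo" }
}

pub struct Parser {
    ocr_server_url: Option<String>, // hypothetical configuration flag
    ocr_engine_override: Option<Arc<dyn OcrEngine>>,
}

impl Parser {
    pub fn new(ocr_server_url: Option<String>) -> Self {
        Self { ocr_server_url, ocr_engine_override: None }
    }

    // Builder-style setter, mirroring the shape of with_ocr_engine.
    pub fn with_ocr_engine(mut self, engine: Arc<dyn OcrEngine>) -> Self {
        self.ocr_engine_override = Some(engine);
        self
    }

    pub fn select_engine(&self) -> Arc<dyn OcrEngine> {
        // An override always wins over the built-in selection.
        if let Some(engine) = &self.ocr_engine_override {
            return Arc::clone(engine);
        }
        // Otherwise fall back to a default chosen from configuration.
        match &self.ocr_server_url {
            Some(_) => Arc::new(HttpEngine),
            None => Arc::new(TesseractEngine),
        }
    }
}
```

Storing the override as Arc<dyn OcrEngine> lets the parser hand the same engine to several pages or tasks without cloning the engine itself.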

Platform-Specific Considerations

When implementing OcrEngine, account for these architectural requirements:

  • Thread Safety: On native platforms, ensure your engine is Send + Sync and that the returned future is Send. Use thread-safe HTTP clients like reqwest::Client or protect mutable state with Mutex/RwLock.
  • Future Bounds: The returned future must include + Send for native builds. The #[cfg(target_arch = "wasm32")] implementation in LiteParse drops this bound, allowing the same code to compile for both targets.
  • Bounding Box Format: The bbox field expects [x1, y1, x2, y2] in pixel coordinates matching the rendered page image dimensions passed to recognize.
  • Error Handling: Propagate failures as Box<dyn std::error::Error + Send + Sync>, which LiteParse surfaces as a LiteParseError.
  • Language Option: The OcrOptions struct exposes the language string from LiteParseConfig. Respect this parameter when constructing requests to multilingual OCR backends.
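The error-handling and language points above might look like this in practice. The sketch is hypothetical: EngineError, backend_language, and check_image are illustrative names, and the language-code mapping is invented for the example rather than taken from LiteParse.

```rust
use std::fmt;

type BoxError = Box<dyn std::error::Error + Send + Sync>;

// A small engine-specific error type. Because it is Send + Sync + 'static,
// it converts into the boxed error type with `.into()`.
#[derive(Debug)]
pub enum EngineError {
    EmptyImage,
    UnsupportedLanguage(String),
}

impl fmt::Display for EngineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EngineError::EmptyImage => write!(f, "image data is empty or has zero size"),
            EngineError::UnsupportedLanguage(code) => {
                write!(f, "unsupported OCR language: {}", code)
            }
        }
    }
}

impl std::error::Error for EngineError {}

// Translate the configured language into whatever identifier the backend
// expects. The table here is an assumption for illustration only.
pub fn backend_language(code: &str) -> Result<&'static str, BoxError> {
    match code {
        "en" | "eng" => Ok("eng"),
        "de" | "deu" => Ok("deu"),
        "fr" | "fra" => Ok("fra"),
        other => Err(EngineError::UnsupportedLanguage(other.to_string()).into()),
    }
}

// Reject obviously invalid input before calling the backend.
pub fn check_image(image_data: &[u8], width: u32, height: u32) -> Result<(), BoxError> {
    if image_data.is_empty() || width == 0 || height == 0 {
        return Err(EngineError::EmptyImage.into());
    }
    Ok(())
}
```

Inside recognize, both helpers can be called with the ? operator, since their error type already matches the one in the trait signature.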

Key Reference Files

Study the existing HTTP and Tesseract engines in the run-llama/liteparse repository for production patterns.

Summary

  • Implement the OcrEngine trait from crates/liteparse/src/ocr/mod.rs with name() and recognize() methods.
  • Return a Pin<Box<dyn Future<...>>> with Send bounds for native targets, satisfying the exact lifetime constraints 'b: 'a and 'c: 'a.
  • Register your engine via LiteParse::with_ocr_engine(Arc::new(your_engine)) to override default backends.
  • Ensure thread safety using Send + Sync bounds and thread-safe containers for stateful clients.
  • Format bounding boxes as [x1, y1, x2, y2] pixel coordinates and respect the language field in OcrOptions.

Frequently Asked Questions

What is the exact method signature required for the recognize method?

The recognize method must match the signature in crates/liteparse/src/ocr/mod.rs, including the lifetime bounds 'b: 'a and 'c: 'a to ensure the returned future does not outlive the borrowed image_data and options references. It returns Pin<Box<dyn Future<Output = Result<Vec<OcrResult>, Box<dyn std::error::Error + Send + Sync>>> + Send + '_>> on native platforms, with the Send bound on the future dropped for WebAssembly builds.

How does LiteParse select between the default engine and my custom implementation?

LiteParse checks the ocr_engine_override field (defined in crates/liteparse/src/parser.rs at lines 43-57) inside parse_input. If Some(Arc<dyn OcrEngine>) is present, it uses that engine exclusively. If None, it falls back to built-in logic selecting between HTTP OCR servers and the Tesseract engine based on configuration flags.

Can I use async HTTP clients like reqwest in my custom OCR engine?

Yes. Wrap your reqwest::Client in your engine struct, ensuring the client is Send + Sync. Since recognize returns a pinned future, you can .await async HTTP calls inside the async block while satisfying the trait's thread-safety requirements for native targets.

What image format does the recognize method receive?

The image_data parameter contains raw bytes of the rendered page image, typically in PNG format. The width and height parameters correspond to the pixel dimensions of this image data, allowing you to pass the exact specifications to your OCR backend or perform pre-processing if necessary.
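If your backend depends on the input really being PNG, a cheap sanity check is to read the dimensions from the PNG header and compare them with the width and height arguments. The sketch below relies only on the published PNG layout (an 8-byte signature followed by the IHDR chunk); the function names are hypothetical, and whether LiteParse always passes PNG is not something this example assumes.

```rust
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Returns (width, height) read from the IHDR chunk, or None if the
/// bytes do not start like a PNG file.
pub fn png_dimensions(data: &[u8]) -> Option<(u32, u32)> {
    // Layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR" tag,
    // then big-endian width and height (4 bytes each) = 24 bytes total.
    if data.len() < 24 || data[..8] != PNG_SIGNATURE || &data[12..16] != b"IHDR" {
        return None;
    }
    let width = u32::from_be_bytes([data[16], data[17], data[18], data[19]]);
    let height = u32::from_be_bytes([data[20], data[21], data[22], data[23]]);
    Some((width, height))
}

/// True when the header dimensions agree with the values passed to recognize.
pub fn dimensions_match(data: &[u8], width: u32, height: u32) -> bool {
    png_dimensions(data) == Some((width, height))
}
```

A mismatch or a None result is a reasonable point to return an error, or to fall back to treating the bytes as another format.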
