How to Implement a Custom OCR Engine in LiteParse Using the OcrEngine Trait

Implement the OcrEngine trait defined in crates/liteparse/src/ocr/mod.rs by providing name() and recognize() methods, then inject your engine via LiteParse::with_ocr_engine() to override the default HTTP or Tesseract selection.

LiteParse is a Rust document parsing library that abstracts optical character recognition behind the OcrEngine trait. This design lets you integrate any OCR backend, from cloud APIs to on-device models, without modifying the core parsing logic. By implementing the trait and wiring it in through crates/liteparse/src/parser.rs, you keep full control over text recognition while still using LiteParse's document processing pipeline.

Understanding the OcrEngine Trait Interface

The OcrEngine trait is defined in crates/liteparse/src/ocr/mod.rs. It abstracts OCR operations behind a generic interface that works across both native targets and WebAssembly.

The engine itself must be Send + Sync on every target. On native targets (#[cfg(not(target_arch = "wasm32"))]), the future returned by recognize() must also be Send so the async runtime can move it across threads. On WebAssembly, the engine keeps its Send + Sync requirement, but the returned future does not need to be Send because the runtime is single-threaded.

The trait requires two methods. The definition for native targets is:

// crates/liteparse/src/ocr/mod.rs
pub trait OcrEngine: Send + Sync {
    fn name(&self) -> &str;
    
    fn recognize<'a, 'b: 'a, 'c: 'a>(
        &'a self,
        image_data: &'c [u8],
        width: u32,
        height: u32,
        options: &'b OcrOptions,
    ) -> Pin<
        Box<
            dyn Future<
                Output = Result<Vec<OcrResult>, Box<dyn std::error::Error + Send + Sync>>
            > + Send + '_,
        >,
    >;
}

The recognize() method receives raw image bytes (typically PNG), dimensions, and OcrOptions, returning a future that resolves to a vector of OcrResult structs containing text, bounding boxes, and confidence scores.

Step-by-Step Implementation Guide

Follow these steps to create a custom OCR engine compatible with LiteParse's parser.

1. Create a New Module

Create a Rust file inside crates/liteparse/src/ocr/ (e.g., my_ocr.rs) to house your implementation. Add the module declaration to crates/liteparse/src/ocr/mod.rs.

2. Define Your Engine Struct

Define a struct that holds any required state, such as API clients, authentication tokens, or model handles. Ensure the struct implements Send + Sync for thread safety on native platforms.
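A compile-time check can confirm that a struct satisfies these bounds before it is wired into the parser. The sketch below uses a hypothetical MyEngine struct with illustrative fields (endpoint, api_token); neither the struct nor the assert_send_sync helper is part of LiteParse.

```rust
use std::sync::Arc;

/// Hypothetical engine state; the field names are illustrative only.
pub struct MyEngine {
    pub endpoint: String,
    pub api_token: Arc<str>,
}

/// Compiles only when `T` is both `Send` and `Sync`.
pub fn assert_send_sync<T: Send + Sync>() {}

/// Stops compiling if `MyEngine` ever gains a non-thread-safe field
/// such as `Rc<T>` or `RefCell<T>`.
pub fn check_my_engine() -> bool {
    assert_send_sync::<MyEngine>();
    true
}
```

Because the check happens at compile time, calling check_my_engine() from a unit test is enough to catch a regression early.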

3. Implement the OcrEngine Trait

Provide implementations for both required methods:

  • name(): Return a static string identifier for debugging and logging purposes.
  • recognize(): Return a pinned boxed future containing the OCR logic. The future must resolve to Result<Vec<OcrResult>, Box<dyn std::error::Error + Send + Sync>>.

4. Provide a Constructor

Expose a new() method or builder pattern that constructs your engine with the necessary configuration.
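As one possible shape for the builder option, the sketch below configures a hypothetical RemoteEngine. The type names, the fields, and the 30-second default timeout are assumptions made for illustration, not LiteParse APIs.

```rust
use std::time::Duration;

/// Hypothetical engine configured through a builder.
#[derive(Debug, Clone, PartialEq)]
pub struct RemoteEngine {
    endpoint: String,
    timeout: Duration,
    api_token: Option<String>,
}

/// Collects optional settings before the engine is constructed.
#[derive(Debug, Default)]
pub struct RemoteEngineBuilder {
    endpoint: Option<String>,
    timeout: Option<Duration>,
    api_token: Option<String>,
}

impl RemoteEngine {
    pub fn builder() -> RemoteEngineBuilder {
        RemoteEngineBuilder::default()
    }
    pub fn endpoint(&self) -> &str {
        &self.endpoint
    }
    pub fn timeout(&self) -> Duration {
        self.timeout
    }
    pub fn has_token(&self) -> bool {
        self.api_token.is_some()
    }
}

impl RemoteEngineBuilder {
    pub fn endpoint(mut self, url: impl Into<String>) -> Self {
        self.endpoint = Some(url.into());
        self
    }
    pub fn timeout(mut self, timeout: Duration) -> Self {
        self.timeout = Some(timeout);
        self
    }
    pub fn api_token(mut self, token: impl Into<String>) -> Self {
        self.api_token = Some(token.into());
        self
    }
    /// Fails when the only mandatory setting, the endpoint, is missing.
    pub fn build(self) -> Result<RemoteEngine, String> {
        let endpoint = self
            .endpoint
            .ok_or_else(|| String::from("endpoint is required"))?;
        Ok(RemoteEngine {
            endpoint,
            // Assumed default; pick a value that suits your backend.
            timeout: self.timeout.unwrap_or(Duration::from_secs(30)),
            api_token: self.api_token,
        })
    }
}
```

A plain new() is usually enough for engines with one or two settings; a builder tends to pay off once optional settings accumulate.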

5. Register with LiteParse

Wrap your engine in std::sync::Arc and pass it to LiteParse::with_ocr_engine() before parsing documents.

Complete Custom Engine Example

Here is a minimal implementation of an "Echo" engine that demonstrates the required structure. In crates/liteparse/src/ocr/my_ocr.rs:

use super::{OcrEngine, OcrOptions, OcrResult};
use std::future::Future;
use std::pin::Pin;

/// A demonstration OCR engine that returns the requested language as text.
pub struct EchoEngine;

impl EchoEngine {
    pub fn new() -> Self {
        EchoEngine
    }
}

impl OcrEngine for EchoEngine {
    fn name(&self) -> &str {
        "echo"
    }

    fn recognize<'a, 'b: 'a, 'c: 'a>(
        &'a self,
        _image_data: &'c [u8],
        _width: u32,
        _height: u32,
        options: &'b OcrOptions,
    ) -> Pin<Box<dyn Future<Output = Result<Vec<OcrResult>, Box<dyn std::error::Error + Send + Sync>>> + Send + '_>>
    {
        Box::pin(async move {
            Ok(vec![OcrResult {
                text: format!("language={}", options.language),
                bbox: [0.0, 0.0, 100.0, 20.0],
                confidence: 1.0,
            }])
        })
    }
}

This example returns a single text result containing the requested language code. In production, you would replace the async block with calls to your actual OCR service or local model.
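To illustrate what replacing the async block might look like, here is a self-contained sketch that awaits a stand-in backend call and can be exercised without the parser. Everything in it is an assumption made so the example compiles on its own: the trait and structs are local stand-ins that mirror the shapes shown in this article (with the three lifetimes collapsed into one), and call_backend, BackendEngine, block_on, and run_sketch are hypothetical names. In a real project you would import the actual items from liteparse and drive the future with your async runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Local stand-ins for the LiteParse types (assumed shapes, not the real items).
pub struct OcrOptions {
    pub language: String,
}
#[derive(Debug)]
pub struct OcrResult {
    pub text: String,
    pub bbox: [f32; 4],
    pub confidence: f32,
}
pub type OcrError = Box<dyn std::error::Error + Send + Sync>;
pub type OcrFuture<'a> =
    Pin<Box<dyn Future<Output = Result<Vec<OcrResult>, OcrError>> + Send + 'a>>;

pub trait OcrEngine: Send + Sync {
    fn name(&self) -> &str;
    fn recognize<'a>(
        &'a self,
        image_data: &'a [u8],
        width: u32,
        height: u32,
        options: &'a OcrOptions,
    ) -> OcrFuture<'a>;
}

/// Hypothetical backend call; a real engine would await an HTTP request
/// or a model invocation here.
async fn call_backend(image_data: &[u8], language: &str) -> Result<String, OcrError> {
    if image_data.is_empty() {
        return Err("image data is empty".into());
    }
    Ok(format!("{} bytes, language={}", image_data.len(), language))
}

pub struct BackendEngine;

impl BackendEngine {
    /// Ordinary async fn holding the logic, so `.await` and `?` work as usual.
    async fn recognize_inner(
        &self,
        image_data: &[u8],
        width: u32,
        height: u32,
        options: &OcrOptions,
    ) -> Result<Vec<OcrResult>, OcrError> {
        let text = call_backend(image_data, &options.language).await?;
        Ok(vec![OcrResult {
            text,
            // One box covering the whole image, in pixel units.
            bbox: [0.0, 0.0, width as f32, height as f32],
            confidence: 1.0,
        }])
    }
}

impl OcrEngine for BackendEngine {
    fn name(&self) -> &str {
        "backend-sketch"
    }
    fn recognize<'a>(
        &'a self,
        image_data: &'a [u8],
        width: u32,
        height: u32,
        options: &'a OcrOptions,
    ) -> OcrFuture<'a> {
        // The trait method only boxes and pins the inner future.
        Box::pin(self.recognize_inner(image_data, width, height, options))
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polling executor for demonstration only; use a real runtime for I/O.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(output) = fut.as_mut().poll(&mut cx) {
            return output;
        }
        std::thread::yield_now();
    }
}

/// Runs the sketch engine once on a 64x32 "image".
pub fn run_sketch(image_data: &[u8], language: &str) -> Result<Vec<OcrResult>, OcrError> {
    let engine = BackendEngine;
    let options = OcrOptions { language: language.to_string() };
    let outcome = block_on(engine.recognize(image_data, 64, 32, &options));
    outcome
}
```

Delegating to an inherent async fn keeps the return type fully annotated, which avoids relying on type inference inside a Box::pin(async move { ... }) block when the body uses the ? operator.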

Wiring Your Engine into the Parser

LiteParse selects OCR engines inside LiteParse::parse_input (lines 43-57 and 71-78 in crates/liteparse/src/parser.rs). By default, it chooses between an HTTP OCR server or the built-in Tesseract engine based on configuration. Override this behavior using with_ocr_engine():

use liteparse::parser::LiteParse;
use liteparse::ocr::my_ocr::EchoEngine;
use std::sync::Arc;

// Build the standard configuration
let cfg = liteparse::config::LiteParseConfig::default();

// Create the parser and inject the custom engine
let parser = LiteParse::new(cfg)
    .with_ocr_engine(Arc::new(EchoEngine::new()));

// Subsequent calls to parser.parse_input(...) will use EchoEngine

The with_ocr_engine() method stores your engine in the ocr_engine_override field of the LiteParse struct, bypassing the default selection logic entirely.
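The override behavior can be pictured with a small stand-alone model. This is not LiteParse's code: Engine, Parser, BuiltinEngine, and CustomEngine are hypothetical stand-ins, and the real selection logic in parse_input also weighs the HTTP and Tesseract configuration.

```rust
use std::sync::Arc;

/// Stand-in for the engine trait; only the name matters for this model.
pub trait Engine: Send + Sync {
    fn name(&self) -> &'static str;
}

pub struct BuiltinEngine;
impl Engine for BuiltinEngine {
    fn name(&self) -> &'static str {
        "builtin"
    }
}

pub struct CustomEngine;
impl Engine for CustomEngine {
    fn name(&self) -> &'static str {
        "custom"
    }
}

/// Hypothetical parser holding an optional override, similar in spirit
/// to the ocr_engine_override field described above.
pub struct Parser {
    engine_override: Option<Arc<dyn Engine>>,
}

impl Parser {
    pub fn new() -> Self {
        Parser { engine_override: None }
    }

    /// Consumes and returns the parser so calls can be chained.
    pub fn with_engine(mut self, engine: Arc<dyn Engine>) -> Self {
        self.engine_override = Some(engine);
        self
    }

    /// An override, when present, wins over the default choice.
    pub fn select_engine(&self) -> Arc<dyn Engine> {
        match &self.engine_override {
            Some(engine) => Arc::clone(engine),
            None => Arc::new(BuiltinEngine),
        }
    }
}
```

Storing the engine as Arc<dyn Engine> lets the parser hand out cheap clones to concurrent tasks without knowing the concrete engine type.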

Platform-Specific Considerations and Best Practices

When implementing OcrEngine according to the run-llama/liteparse source code, account for these architectural constraints:

Thread-Safety Requirements: On native platforms, ensure your engine and any internal HTTP clients (such as reqwest::Client) are Send + Sync. Protect mutable state with Mutex or RwLock if necessary.
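As a minimal sketch of the Mutex approach, the hypothetical CountingEngine below keeps a call counter that several threads update safely. The type and the hammer helper are illustrative and not part of LiteParse.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical engine with interior mutable state behind a Mutex,
/// which keeps the struct `Send + Sync`.
pub struct CountingEngine {
    calls: Mutex<u64>,
}

impl CountingEngine {
    pub fn new() -> Self {
        CountingEngine { calls: Mutex::new(0) }
    }

    /// Increments the counter and returns the new value.
    pub fn record_call(&self) -> u64 {
        // Recover the guard even if another thread panicked while holding it.
        let mut calls = self.calls.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
        *calls += 1;
        *calls
    }

    pub fn calls(&self) -> u64 {
        *self.calls.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
    }
}

/// Shares one engine across `threads` workers and returns the final count.
pub fn hammer(threads: usize, per_thread: usize) -> u64 {
    let engine = Arc::new(CountingEngine::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let engine = Arc::clone(&engine);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    engine.record_call();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    engine.calls()
}
```

Inside recognize(), release a std::sync::Mutex guard before any .await: the guard is not Send, so holding it across an await point would make the returned future non-Send.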

Future Send Bounds: The boxed future returned by recognize() must include + Send for native builds. The #[cfg(target_arch = "wasm32")] implementation drops this bound automatically, allowing the same code to compile for both targets.
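One common way to express a target-dependent Send bound is a pair of cfg-gated type aliases. The sketch below shows the general technique only; EngineFuture, text_len_later, and poll_once are hypothetical names, and LiteParse's actual conditional compilation may be structured differently.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical alias: the future must be `Send` everywhere except wasm32.
#[cfg(not(target_arch = "wasm32"))]
pub type EngineFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;
#[cfg(target_arch = "wasm32")]
pub type EngineFuture<'a, T> = Pin<Box<dyn Future<Output = T> + 'a>>;

/// The function body is identical on both targets; only the alias changes.
pub fn text_len_later(text: &str) -> EngineFuture<'_, usize> {
    Box::pin(async move { text.chars().count() })
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the future once; enough here because it never suspends.
pub fn poll_once<T>(mut fut: EngineFuture<'_, T>) -> Option<T> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}
```

With an alias like this, implementations name the alias rather than spelling out the bound, so the same source compiles for both targets.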

Bounding Box Format: The bbox field in OcrResult expects [x1, y1, x2, y2] coordinates in pixel units relative to the input image dimensions. Ensure your OCR backend returns coordinates that match the width and height parameters passed to recognize().
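Backends often report boxes in other conventions. The helpers below sketch two conversions into the [x1, y1, x2, y2] pixel form, assuming a backend that returns either (x, y, width, height) in pixels or corner coordinates normalized to the 0..1 range. Both input formats and both function names are assumptions for illustration.

```rust
/// Converts an (x, y, width, height) box in pixels into [x1, y1, x2, y2],
/// keeping every coordinate inside the image bounds.
pub fn to_corner_bbox(
    x: f32,
    y: f32,
    w: f32,
    h: f32,
    image_width: u32,
    image_height: u32,
) -> [f32; 4] {
    let max_x = image_width as f32;
    let max_y = image_height as f32;
    // `max` then `min` bounds the value without panicking on odd input.
    let x1 = x.max(0.0).min(max_x);
    let y1 = y.max(0.0).min(max_y);
    let x2 = (x + w).max(x1).min(max_x);
    let y2 = (y + h).max(y1).min(max_y);
    [x1, y1, x2, y2]
}

/// Scales normalized corner coordinates (0..1) to pixel units.
pub fn scale_normalized(bbox: [f32; 4], image_width: u32, image_height: u32) -> [f32; 4] {
    let w = image_width as f32;
    let h = image_height as f32;
    [bbox[0] * w, bbox[1] * h, bbox[2] * w, bbox[3] * h]
}
```

For a 100 by 100 image, to_corner_bbox(90.0, 90.0, 30.0, 30.0, 100, 100) yields [90.0, 90.0, 100.0, 100.0], since the far corner is held at the image edge.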

Error Handling: Propagate failures as Box<dyn std::error::Error + Send + Sync>. LiteParse surfaces these as LiteParseError variants, so include descriptive error messages for debugging.
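A small custom error type makes those messages easy to produce. The sketch below defines a hypothetical MyOcrError enum and two helper checks; the variant names, messages, and helpers are illustrative, and only the boxed error type comes from the trait signature shown earlier.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type for a custom engine.
#[derive(Debug)]
pub enum MyOcrError {
    EmptyImage,
    Backend { status: u16, message: String },
}

impl fmt::Display for MyOcrError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MyOcrError::EmptyImage => write!(f, "image data is empty"),
            MyOcrError::Backend { status, message } => {
                write!(f, "OCR backend returned {}: {}", status, message)
            }
        }
    }
}

impl Error for MyOcrError {}

/// The error type used in the recognize() signature.
pub type BoxedError = Box<dyn Error + Send + Sync>;

/// Rejects empty input before any backend call is made.
pub fn validate_image(image_data: &[u8]) -> Result<(), BoxedError> {
    if image_data.is_empty() {
        // `.into()` boxes any `Error + Send + Sync` value.
        return Err(MyOcrError::EmptyImage.into());
    }
    Ok(())
}

/// Turns a non-2xx status code into a descriptive error.
pub fn check_status(status: u16, body: &str) -> Result<(), BoxedError> {
    if (200..300).contains(&status) {
        Ok(())
    } else {
        Err(MyOcrError::Backend { status, message: body.to_string() }.into())
    }
}
```

Because the type implements std::error::Error and contains only Send + Sync data, the ? operator converts it into the boxed error automatically inside recognize().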

Language Configuration: Access the requested language through options.language (populated from LiteParseConfig.ocr_language). Respect this setting when constructing requests to external OCR APIs.

Reference Implementations: Study crates/liteparse/src/ocr/tesseract.rs for a local OCR implementation and crates/liteparse/src/ocr/http_simple.rs for a cloud-based HTTP client pattern.

Summary

  • Implement the OcrEngine trait (defined in crates/liteparse/src/ocr/mod.rs) in your own module by defining name() and the async recognize() method with proper lifetime bounds.
  • Ensure thread-safety with Send + Sync bounds on native targets, using + Send on the returned future.
  • Inject custom engines via LiteParse::with_ocr_engine() (in crates/liteparse/src/parser.rs) to override the default HTTP or Tesseract selection.
  • Return structured results using OcrResult with pixel-coordinate bounding boxes and confidence scores.
  • Study built-in implementations in tesseract.rs and http_simple.rs for production-ready patterns.

Frequently Asked Questions

What is the exact return type required for the recognize() method?

The recognize() method must return Pin<Box<dyn Future<Output = Result<Vec<OcrResult>, Box<dyn std::error::Error + Send + Sync>>> + Send + '_>> on native targets. This pinned boxed future must resolve to a vector of OcrResult structs or a boxed error trait object. The + Send bound is required for native platforms but omitted for WebAssembly builds.

How do I handle platform differences between native and WASM targets?

The OcrEngine trait always requires Send + Sync for the engine itself. However, on native targets (#[cfg(not(target_arch = "wasm32"))]), the returned future must also be Send. The LiteParse codebase uses conditional compilation to handle this difference automatically, so your implementation should compile for both targets without modification if you include the + Send bound on the future.

Can I use async/await syntax inside my recognize() implementation?

Yes. Use Box::pin(async move { ... }) to convert an async block into the required pinned boxed future type. Inside the async block, you can use .await to call other async functions, such as HTTP requests to OCR cloud services. Ensure any awaited futures are also Send when compiling for native targets.

How does LiteParse select my custom engine over the built-in options?

The LiteParse struct stores your engine in the ocr_engine_override field (defined in crates/liteparse/src/parser.rs). When this field contains Some(Arc<dyn OcrEngine>), the parse_input method skips the default selection logic (which chooses between HTTP OCR and Tesseract) and uses your provided engine exclusively. Always wrap your engine in Arc::new() before passing it to with_ocr_engine().
