How to Implement a Custom OCR Engine in LiteParse with the OcrEngine Trait
Implement the OcrEngine trait defined in crates/liteparse/src/ocr/mod.rs by providing name() and recognize() methods, then register your engine via LiteParse::with_ocr_engine using Arc<dyn OcrEngine> to override the default HTTP or Tesseract backend.
LiteParse, the document parsing library from the run-llama ecosystem, abstracts OCR functionality behind a trait-based interface that supports both native and WebAssembly targets. By implementing the OcrEngine trait, you can integrate custom backends—from proprietary cloud APIs to on-device machine learning models—while preserving the library's core parsing pipeline and configuration system.
Understanding the OcrEngine Trait
The OcrEngine trait is defined in crates/liteparse/src/ocr/mod.rs and serves as the abstraction layer for all OCR operations. The trait requires Send + Sync bounds, but the future returned by recognize has platform-specific constraints:
- On native targets (#[cfg(not(target_arch = "wasm32"))]), the future must be Send to allow the async runtime to move it across threads.
- On WebAssembly, the trait remains Send + Sync, but the future does not require the Send bound because the runtime is single-threaded.
// crates/liteparse/src/ocr/mod.rs
pub trait OcrEngine: Send + Sync {
    fn name(&self) -> &str;
    fn recognize<'a, 'b: 'a, 'c: 'a>(
        &'a self,
        image_data: &'c [u8],
        width: u32,
        height: u32,
        options: &'b OcrOptions,
    ) -> Pin<
        Box<
            dyn Future<
                Output = Result<Vec<OcrResult>, Box<dyn std::error::Error + Send + Sync>>
            > + Send + '_,
        >,
    >;
}
The recognize method receives raw image bytes (typically PNG), pixel dimensions, and OcrOptions, returning a pinned future that resolves to a Vec<OcrResult> or a boxed error.
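The platform split described above can be pictured as a cfg-gated type alias. This is a sketch of the general pattern, not LiteParse's actual source: the `OcrFuture` and `OcrOutput` aliases, the `empty_recognition` helper, and the `f32` field types in the stand-in `OcrResult` are all assumptions made for illustration.

```rust
use std::error::Error;
use std::future::Future;
use std::pin::Pin;

// Hypothetical stand-in for the crate's result type (field names from the article,
// numeric types assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct OcrResult {
    pub text: String,
    pub bbox: [f32; 4],
    pub confidence: f32,
}

pub type OcrOutput = Result<Vec<OcrResult>, Box<dyn Error + Send + Sync>>;

// Native: the boxed future carries `Send` so a multi-threaded runtime can move it.
#[cfg(not(target_arch = "wasm32"))]
pub type OcrFuture<'a> = Pin<Box<dyn Future<Output = OcrOutput> + Send + 'a>>;

// WebAssembly: the runtime is single-threaded, so `Send` is not required.
#[cfg(target_arch = "wasm32")]
pub type OcrFuture<'a> = Pin<Box<dyn Future<Output = OcrOutput> + 'a>>;

/// Builds a future of the aliased type that resolves to an empty result list.
pub fn empty_recognition<'a>() -> OcrFuture<'a> {
    Box::pin(async {
        let out: OcrOutput = Ok(Vec::new());
        out
    })
}

/// Compiles only when `T` is `Send`; handy for checking the native bound.
pub fn assert_send<T: Send>(_: &T) {}
```

On a native target, `assert_send(&empty_recognition())` compiles, which is the property a multi-threaded executor relies on.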
Step-by-Step Implementation Guide
Follow these steps to create a production-ready custom OCR engine in LiteParse:
- Create a module inside crates/liteparse/src/ocr/ (e.g., my_ocr.rs) and import OcrEngine, OcrOptions, and OcrResult.
- Define a struct holding your engine's configuration, such as API endpoints or model handles.
- Implement OcrEngine for your struct, providing:
  - name(): Returns a static string identifier for logging.
  - recognize(): Returns a pinned future containing your async OCR logic.
- Ensure thread safety by using Send + Sync types for native targets; wrap mutable state in Arc<Mutex<T>> if necessary.
- Expose a constructor (typically new()) that initializes the engine.
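The steps above can be sketched end to end with a stateful engine. To keep the block self-contained it declares simplified stand-ins for `OcrEngine`, `OcrOptions`, and `OcrResult` (a single lifetime instead of three, assumed `f32` fields); `CountingEngine` and the busy-polling `block_on` helper are hypothetical and exist only for this illustration.

```rust
use std::error::Error;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Simplified, hypothetical stand-ins for the crate's types.
pub struct OcrOptions { pub language: String }
#[derive(Debug)]
pub struct OcrResult { pub text: String, pub bbox: [f32; 4], pub confidence: f32 }
pub type OcrOutput = Result<Vec<OcrResult>, Box<dyn Error + Send + Sync>>;
pub type OcrFuture<'a> = Pin<Box<dyn Future<Output = OcrOutput> + Send + 'a>>;

pub trait OcrEngine: Send + Sync {
    fn name(&self) -> &str;
    fn recognize<'a>(&'a self, image_data: &'a [u8], width: u32, height: u32,
                     options: &'a OcrOptions) -> OcrFuture<'a>;
}

/// Step 2: a struct holding configuration plus shared mutable state.
pub struct CountingEngine {
    label: String,
    calls: Arc<Mutex<u64>>, // Step 4: mutable state behind Arc<Mutex<T>>
}

impl CountingEngine {
    /// Step 5: a constructor that initializes the engine.
    pub fn new(label: &str) -> Self {
        Self { label: label.to_string(), calls: Arc::new(Mutex::new(0)) }
    }
    pub fn calls(&self) -> u64 { *self.calls.lock().unwrap() }
}

/// Step 3: the trait implementation.
impl OcrEngine for CountingEngine {
    fn name(&self) -> &str { "counting" }

    fn recognize<'a>(&'a self, image_data: &'a [u8], width: u32, height: u32,
                     options: &'a OcrOptions) -> OcrFuture<'a> {
        Box::pin(async move {
            // The guard is dropped at the end of this statement, so it is never
            // held across an await point and the future stays `Send`.
            *self.calls.lock().unwrap() += 1;
            let text = format!("{} [{}]: {} bytes",
                               self.label, options.language, image_data.len());
            let out: OcrOutput = Ok(vec![OcrResult {
                text,
                bbox: [0.0, 0.0, width as f32, height as f32],
                confidence: 1.0,
            }]);
            out
        })
    }
}

// Minimal executor for the demo: polls in a loop with a waker that does nothing.
struct NoopWaker;
impl Wake for NoopWaker { fn wake(self: Arc<Self>) {} }

pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) { return value; }
    }
}
```

A real application would use its async runtime instead of `block_on`; the helper is only here so the sketch can be driven without extra dependencies.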
Complete Custom Engine Example
Here is a minimal "Echo" engine implementation that demonstrates the trait contract without external dependencies:
// crates/liteparse/src/ocr/my_ocr.rs
use super::{OcrEngine, OcrOptions, OcrResult};
use std::future::Future;
use std::pin::Pin;

/// A very simple "echo" OCR engine used for demonstration.
pub struct EchoEngine;

impl EchoEngine {
    pub fn new() -> Self {
        EchoEngine
    }
}

impl OcrEngine for EchoEngine {
    fn name(&self) -> &str {
        "echo"
    }

    fn recognize<'a, 'b: 'a, 'c: 'a>(
        &'a self,
        _image_data: &'c [u8],
        _width: u32,
        _height: u32,
        options: &'b OcrOptions,
    ) -> Pin<Box<dyn Future<Output = Result<Vec<OcrResult>, Box<dyn std::error::Error + Send + Sync>>> + Send + '_>>
    {
        // In a real engine you would send the image to an OCR service here.
        // This placeholder just returns a single word containing the requested language.
        Box::pin(async move {
            Ok(vec![OcrResult {
                text: format!("language={}", options.language),
                bbox: [0.0, 0.0, 100.0, 20.0],
                confidence: 1.0,
            }])
        })
    }
}
This example returns a single result containing the requested language code, illustrating the expected return format: text (extracted string), bbox ([x1, y1, x2, y2] in pixel coordinates), and confidence (0.0 to 1.0).
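Many OCR backends report boxes and scores in a different shape than the one described above. As a rough sketch, assuming a hypothetical backend that returns normalized `(x, y, width, height)` fractions and confidence as a percentage, and assuming `f32` values, the conversion to `[x1, y1, x2, y2]` pixels and a 0.0 to 1.0 confidence might look like this:

```rust
/// Converts a box given as normalized (x, y, width, height) fractions of the image
/// into `[x1, y1, x2, y2]` pixel coordinates, limited to the image bounds.
pub fn normalized_to_pixel_bbox(norm: [f32; 4], width: u32, height: u32) -> [f32; 4] {
    let (w, h) = (width as f32, height as f32);
    let x1 = (norm[0] * w).max(0.0).min(w);
    let y1 = (norm[1] * h).max(0.0).min(h);
    // The far corner may never fall before the near corner or outside the image.
    let x2 = ((norm[0] + norm[2]) * w).min(w).max(x1);
    let y2 = ((norm[1] + norm[3]) * h).min(h).max(y1);
    [x1, y1, x2, y2]
}

/// Maps a confidence given in percent onto the 0.0..=1.0 range.
pub fn percent_to_confidence(percent: f32) -> f32 {
    (percent / 100.0).max(0.0).min(1.0)
}
```

For a 200 x 100 image, `normalized_to_pixel_bbox([0.25, 0.5, 0.5, 0.25], 200, 100)` yields `[50.0, 50.0, 150.0, 75.0]`.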
Wiring Your Engine into LiteParse
The LiteParse struct stores the custom engine in the ocr_engine_override field and selects it inside parse_input (see crates/liteparse/src/parser.rs, lines 43-57 and 71-78). Use the with_ocr_engine method to inject your implementation:
use liteparse::parser::LiteParse;
use liteparse::ocr::my_ocr::EchoEngine;
use std::sync::Arc;
// Build the standard configuration
let cfg = liteparse::config::LiteParseConfig::default();
// Create the parser and inject the custom engine
let parser = LiteParse::new(cfg)
.with_ocr_engine(Arc::new(EchoEngine::new()));
// Now `parser.parse_input(...)` will use `EchoEngine` instead of HTTP/Tesseract.
When ocr_engine_override is Some(Arc<dyn OcrEngine>), LiteParse bypasses its default selection logic—which normally chooses between HTTP OCR servers and the built-in Tesseract engine—and delegates all OCR operations to your implementation.
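The override behavior can be modeled in a few lines. The sketch below is not the code in parser.rs: `Parser`, `select_engine`, `with_ocr_server`, and the `ocr_server_url` field are hypothetical names, and the trait is reduced to `name()` so the selection order is easy to follow.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in: the real trait also has an async `recognize`.
pub trait OcrEngine: Send + Sync {
    fn name(&self) -> &str;
}

pub struct HttpEngine;
pub struct TesseractEngine;
pub struct EchoEngine;

impl OcrEngine for HttpEngine { fn name(&self) -> &str { "http" } }
impl OcrEngine for TesseractEngine { fn name(&self) -> &str { "tesseract" } }
impl OcrEngine for EchoEngine { fn name(&self) -> &str { "echo" } }

#[derive(Default)]
pub struct Parser {
    ocr_server_url: Option<String>, // assumed configuration field
    ocr_engine_override: Option<Arc<dyn OcrEngine>>,
}

impl Parser {
    /// Builder-style setter that stores the custom engine.
    pub fn with_ocr_engine(mut self, engine: Arc<dyn OcrEngine>) -> Self {
        self.ocr_engine_override = Some(engine);
        self
    }

    pub fn with_ocr_server(mut self, url: &str) -> Self {
        self.ocr_server_url = Some(url.to_string());
        self
    }

    /// The override wins; otherwise fall back to the configured default.
    pub fn select_engine(&self) -> Arc<dyn OcrEngine> {
        if let Some(engine) = &self.ocr_engine_override {
            return Arc::clone(engine);
        }
        let default: Arc<dyn OcrEngine> = match &self.ocr_server_url {
            Some(_url) => Arc::new(HttpEngine),
            None => Arc::new(TesseractEngine),
        };
        default
    }
}
```

Storing the engine as `Arc<dyn OcrEngine>` lets the parser hand out cheap clones to concurrent page tasks without knowing the concrete type.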
Platform-Specific Considerations
When implementing OcrEngine, account for these architectural requirements:
- Thread Safety: On native platforms, ensure your engine and returned futures are Send + Sync. Use thread-safe HTTP clients like reqwest::Client or protect mutable state with Mutex/RwLock.
- Future Bounds: The returned future must include + Send for native builds. The #[cfg(target_arch = "wasm32")] implementation in LiteParse drops this bound, allowing the same code to compile for both targets.
- Bounding Box Format: The bbox field expects [x1, y1, x2, y2] in pixel coordinates matching the rendered page image dimensions passed to recognize.
- Error Handling: Propagate failures as Box<dyn std::error::Error + Send + Sync>, which LiteParse surfaces as a LiteParseError.
- Language Option: The OcrOptions struct exposes the language string from LiteParseConfig. Respect this parameter when constructing requests to multilingual OCR backends.
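For the error-handling point, a small sketch of how engine-specific failures can be turned into the boxed error type. `BackendError`, `validate_request`, `parse_confidence`, and the supported-language list are hypothetical; only the `Box<dyn Error + Send + Sync>` target type comes from the trait signature.

```rust
use std::error::Error;
use std::fmt;

type BoxError = Box<dyn Error + Send + Sync>;

/// Hypothetical error type for a custom backend.
#[derive(Debug)]
pub enum BackendError {
    EmptyImage,
    UnsupportedLanguage(String),
}

impl fmt::Display for BackendError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BackendError::EmptyImage => write!(f, "image buffer is empty"),
            BackendError::UnsupportedLanguage(lang) => {
                write!(f, "unsupported language: {}", lang)
            }
        }
    }
}

impl Error for BackendError {}

/// Rejects requests the backend cannot serve; `.into()` boxes the custom error.
pub fn validate_request(image_data: &[u8], language: &str) -> Result<(), BoxError> {
    const SUPPORTED: [&str; 2] = ["eng", "deu"]; // assumed language list
    if image_data.is_empty() {
        return Err(BackendError::EmptyImage.into());
    }
    if !SUPPORTED.contains(&language) {
        return Err(BackendError::UnsupportedLanguage(language.to_string()).into());
    }
    Ok(())
}

/// `?` converts standard-library errors (here `ParseIntError`) into the boxed type.
pub fn parse_confidence(raw: &str) -> Result<f32, BoxError> {
    let percent: u8 = raw.trim().parse()?;
    Ok(f32::from(percent.min(100)) / 100.0)
}
```

Because any `Error + Send + Sync` type converts into the boxed form, `recognize` can use `?` on mixed error sources without a dedicated wrapper enum.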
Key Reference Files
Study these existing engines in the run-llama/liteparse repository for production patterns:
- crates/liteparse/src/ocr/tesseract.rs: Built-in Tesseract wrapper demonstrating local model execution.
- crates/liteparse/src/ocr/http_simple.rs: HTTP client implementation showing async request handling and error mapping.
- crates/liteparse/src/ocr/mod.rs: Trait definition and OcrResult/OcrOptions struct specifications.
- crates/liteparse/src/parser.rs: Integration point showing how with_ocr_engine stores the override at lines 43-57 and how parse_input selects the engine at lines 71-78.
Summary
- Implement the OcrEngine trait from crates/liteparse/src/ocr/mod.rs with name() and recognize() methods.
- Return a Pin<Box<dyn Future<...>>> with Send bounds for native targets, satisfying the exact lifetime constraints 'b: 'a and 'c: 'a.
- Register your engine via LiteParse::with_ocr_engine(Arc::new(your_engine)) to override default backends.
- Ensure thread safety using Send + Sync bounds and thread-safe containers for stateful clients.
- Format bounding boxes as [x1, y1, x2, y2] pixel coordinates and respect the language field in OcrOptions.
Frequently Asked Questions
What is the exact method signature required for the recognize method?
The recognize method must match the signature in crates/liteparse/src/ocr/mod.rs, including the lifetime bounds 'b: 'a and 'c: 'a to ensure the returned future does not outlive the borrowed image_data and options references. It returns Pin<Box<dyn Future<Output = Result<Vec<OcrResult>, Box<dyn std::error::Error + Send + Sync>>> + Send + '_>> on native platforms, with the Send bound on the future dropped for WebAssembly builds.
How does LiteParse select between the default engine and my custom implementation?
LiteParse checks the ocr_engine_override field (defined in crates/liteparse/src/parser.rs at lines 43-57) inside parse_input. If Some(Arc<dyn OcrEngine>) is present, it uses that engine exclusively. If None, it falls back to built-in logic selecting between HTTP OCR servers and the Tesseract engine based on configuration flags.
Can I use async HTTP clients like reqwest in my custom OCR engine?
Yes. Wrap your reqwest::Client in your engine struct, ensuring the client is Send + Sync. Since recognize returns a pinned future, you can .await async HTTP calls inside the async block while satisfying the trait's thread-safety requirements for native targets.
What image format does the recognize method receive?
The image_data parameter contains raw bytes of the rendered page image, typically in PNG format. The width and height parameters correspond to the pixel dimensions of this image data, allowing you to pass the exact specifications to your OCR backend or perform pre-processing if necessary.
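Since the bytes are only "typically" PNG, an engine may want to check before forwarding them. The sketch below uses the PNG file layout (an 8-byte signature followed by the IHDR chunk, whose first two fields are big-endian width and height); the helper names `looks_like_png` and `png_dimensions` are hypothetical.

```rust
use std::convert::TryInto;

/// The 8-byte signature that starts every PNG file.
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Returns true when the buffer begins with the PNG signature.
pub fn looks_like_png(image_data: &[u8]) -> bool {
    image_data.starts_with(&PNG_SIGNATURE)
}

/// Reads width and height from the IHDR chunk, or `None` if the buffer is not a
/// PNG or is too short. Layout: signature (8), chunk length (4), "IHDR" (4),
/// width (4, big-endian), height (4, big-endian).
pub fn png_dimensions(image_data: &[u8]) -> Option<(u32, u32)> {
    if !looks_like_png(image_data) {
        return None;
    }
    if image_data.get(12..16)? != b"IHDR" {
        return None;
    }
    let width = u32::from_be_bytes(image_data.get(16..20)?.try_into().ok()?);
    let height = u32::from_be_bytes(image_data.get(20..24)?.try_into().ok()?);
    Some((width, height))
}
```

Comparing `png_dimensions(image_data)` with the `width` and `height` arguments is a cheap sanity check before sending the image to a remote backend.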