How to Set Up an HTTP OCR Server and Integrate It with LiteParse

To set up an HTTP OCR server with LiteParse, configure the ocr_server_url field in LiteParseConfig to point to any endpoint implementing the LiteParse OCR API Specification, and LiteParse will automatically route image recognition through HttpOcrEngine instead of local Tesseract.

Setting up an HTTP OCR server allows LiteParse to offload optical character recognition to specialized external services rather than relying solely on local Tesseract installations. This approach decouples OCR processing from the core parsing engine, enabling GPU-accelerated recognition, custom model hosting, or language-specific optimizations. According to the run-llama/liteparse source code, integration requires configuring the ocr_server_url field and ensuring your endpoint conforms to the specification defined in OCR_API_SPEC.md.

Architecture Overview

LiteParse selects between local and remote OCR engines based solely on configuration. When you provide an HTTP OCR server URL, the parser instantiates HttpOcrEngine and sends each page image to your service as a PNG file in a multipart/form-data POST request.

Configuration Layer

The integration entry point is LiteParseConfig in crates/liteparse/src/config.rs. This struct contains the optional ocr_server_url field (defaulting to None) that triggers HTTP mode when populated:

pub struct LiteParseConfig {
    pub ocr_enabled: bool,
    pub ocr_server_url: Option<String>,  // HTTP endpoint trigger
    pub ocr_language: String,
    // ... additional fields
}

When ocr_server_url is present, LiteParse ignores local Tesseract bindings and prepares for network-based recognition.

Engine Selection Logic

In crates/liteparse/src/parser.rs, the LiteParse::parse_input method implements the selection logic:

let ocr_engine: Arc<dyn OcrEngine> = if let Some(ref url) = self.config.ocr_server_url {
    std::sync::Arc::new(HttpOcrEngine::new(url.clone()))
} else {
    // Fallback to Tesseract when tesseract feature is enabled
    std::sync::Arc::new(TesseractOcrEngine::new())
};

This conditional instantiation determines whether the parser uses HttpOcrEngine or local processing for every PDF page requiring OCR.

HTTP OCR Engine Implementation

HttpOcrEngine (defined in crates/liteparse/src/ocr/http_simple.rs) implements the OcrEngine trait. Its recognize method performs three critical operations:

  1. Converts raw RGB page renders to PNG format
  2. Constructs a multipart/form-data POST request to the configured ocr_server_url
  3. Deserializes the JSON response into OcrResult objects containing text, bounding boxes, and confidence scores

The engine handles HTTP client lifecycle management and error propagation, ensuring failed requests do not crash the parsing pipeline.
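The request and response steps above can be sketched from the client side in Python. This illustrates the wire format only and is not the Rust code in http_simple.rs: the function names, the page.png filename, and the 30-second timeout are assumptions, while the file and language field names follow the OCR API contract described in this article. The RGB-to-PNG step is omitted, so the sketch assumes you already hold PNG bytes.

```python
import json
import urllib.request
import uuid


def build_multipart(png_bytes: bytes, language: str, boundary: str = "") -> tuple:
    """Encode a PNG and a language code as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="language"\r\n\r\n'
        f"{language}\r\n"
        f"--{boundary}\r\n"
        # "page.png" is an arbitrary filename chosen for this sketch.
        'Content-Disposition: form-data; name="file"; filename="page.png"\r\n'
        "Content-Type: image/png\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + png_bytes + tail, f"multipart/form-data; boundary={boundary}"


def parse_ocr_response(payload: bytes) -> list:
    """Turn the JSON response into a list of plain result dictionaries."""
    data = json.loads(payload)
    return [
        {
            "text": item["text"],
            "bbox": tuple(item["bbox"]),
            "confidence": float(item["confidence"]),
        }
        for item in data["results"]
    ]


def recognize(url: str, png_bytes: bytes, language: str = "en", timeout: float = 30.0) -> list:
    """POST one PNG to the OCR endpoint and return the parsed results."""
    body, content_type = build_multipart(png_bytes, language)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_ocr_response(response.read())
```

Assuming one of the reference servers is running, recognize("http://localhost:8828/ocr", png_bytes) should return one dictionary per detected text region.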

OCR API Contract

Any server you set up must implement the LiteParse OCR API Specification documented in OCR_API_SPEC.md. The contract requires:

  • Endpoint: POST /ocr
  • Request: Multipart form with a binary file field (PNG image) and optional language string
  • Response: JSON object with a results array containing objects with text, bbox (array of four floats), and confidence (float between 0 and 1)

Example response structure:

{
  "results": [
    { 
      "text": "Extracted text", 
      "bbox": [10.5, 20.0, 150.5, 40.0], 
      "confidence": 0.97 
    }
  ]
}

Bounding boxes use the format [x1, y1, x2, y2] representing top-left and bottom-right coordinates.
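When building a server, it can help to check responses against this contract before pointing LiteParse at it. The following is a minimal sketch of such a check, written from the description above rather than from OCR_API_SPEC.md itself. The function names are hypothetical, and the ordering check assumes image coordinates in which x and y grow toward the bottom-right corner.

```python
import math


def _is_number(value) -> bool:
    """True for finite ints and floats, excluding booleans."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isfinite(value)
    )


def validate_ocr_response(data) -> list:
    """Return a list of problems found; an empty list means the payload looks compliant."""
    if not isinstance(data, dict) or not isinstance(data.get("results"), list):
        return ["top level must be an object with a 'results' array"]

    problems = []
    for i, item in enumerate(data["results"]):
        if not isinstance(item, dict):
            problems.append(f"results[{i}]: must be an object")
            continue
        if not isinstance(item.get("text"), str):
            problems.append(f"results[{i}].text: must be a string")

        bbox = item.get("bbox")
        if not (isinstance(bbox, list) and len(bbox) == 4 and all(_is_number(v) for v in bbox)):
            problems.append(f"results[{i}].bbox: must be four numbers [x1, y1, x2, y2]")
        elif bbox[0] > bbox[2] or bbox[1] > bbox[3]:
            # Assumption: (x1, y1) is top-left and (x2, y2) is bottom-right.
            problems.append(f"results[{i}].bbox: top-left must not exceed bottom-right")

        confidence = item.get("confidence")
        if not (_is_number(confidence) and 0.0 <= confidence <= 1.0):
            problems.append(f"results[{i}].confidence: must be a number between 0 and 1")
    return problems
```

Running the validator on the example response above yields an empty list, while a three-element bbox or a confidence of 1.5 each produce one message.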

Reference Server Implementations

The repository provides two production-ready HTTP OCR servers that conform to the specification out of the box.

EasyOCR Server

Located at ocr/easyocr/server.py, this FastAPI wrapper exposes the EasyOCR library via HTTP:

cd ocr/easyocr
uv run server.py  # Starts on http://0.0.0.0:8828

The server automatically detects available GPUs and includes a health check endpoint at GET /health for load balancer integration.
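A caller can probe the health endpoint before submitting documents. The sketch below treats any HTTP 200 from GET /health as healthy; the response body format is not described here, so it is ignored. The is_healthy name, the two-second timeout, and the injectable opener parameter are assumptions made for illustration and testing.

```python
import http.client
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0, opener=urllib.request.urlopen) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200.

    `opener` defaults to urllib's urlopen and can be replaced in tests.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses,
        # since urllib's URLError and HTTPError both derive from OSError.
        return False
```

For the EasyOCR server started above, is_healthy("http://localhost:8828") would be the corresponding call.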

PaddleOCR Server

The PaddleOCR implementation in ocr/paddleocr/server.py offers alternative recognition models optimized for multilingual documents:

cd ocr/paddleocr
uv run server.py  # Starts on http://0.0.0.0:8829

Enable GPU acceleration by modifying PaddleOCRServer.__init__ to set use_gpu=True before starting the server.

Integration Methods

Once your HTTP OCR server is running, integrate it with LiteParse using either CLI flags or programmatic configuration.

Command Line Interface

Pass the server URL via the --ocr-server-url flag:


# Using EasyOCR server
lit parse document.pdf --ocr-server-url http://localhost:8828/ocr

# Using PaddleOCR with Chinese language support
lit parse document.pdf --ocr-server-url http://localhost:8829/ocr --ocr-language zh

The --ocr-language parameter maps to the language form field sent in the multipart request.

Node.js / TypeScript SDK

Configure the LiteParse class with the ocrServerUrl option:

import { LiteParse } from 'liteparse';

const parser = new LiteParse({
  ocrServerUrl: 'http://localhost:8828/ocr',
  ocrLanguage: 'en',
});

const result = await parser.parse('document.pdf');
console.log(result.text);

Rust Library

Instantiate LiteParseConfig with the ocr_server_url field:

use liteparse::{LiteParse, LiteParseConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = LiteParseConfig {
        ocr_enabled: true,
        ocr_server_url: Some("http://localhost:8828/ocr".into()),
        ..Default::default()
    };
    
    let parser = LiteParse::new(cfg);
    let res = parser.parse_input(
        liteparse::PdfInput::Path("document.pdf".into())
    ).await?;
    
    println!("{}", res.text);
    Ok(())
}

Building a Custom HTTP OCR Server

If neither EasyOCR nor PaddleOCR meets your requirements, you can implement a minimal compliant server in any framework. Here is a FastAPI skeleton, in which your_custom_ocr_library is a placeholder for your own OCR backend:

from fastapi import FastAPI, File, Form, UploadFile
from pydantic import BaseModel
import your_custom_ocr_library

app = FastAPI()

class OcrResponse(BaseModel):
    results: list[dict]

@app.post("/ocr")
async def ocr_endpoint(
    file: UploadFile = File(...), 
    language: str = Form(default="en")
) -> OcrResponse:
    image_bytes = await file.read()
    
    # Your OCR logic here
    ocr_data = your_custom_ocr_library.process(image_bytes, language)
    
    return OcrResponse(results=[{
        "text": ocr_data.text,
        "bbox": ocr_data.bbox,
        "confidence": ocr_data.confidence
    }])

Deploy this server and point LiteParse to it using the configuration methods described above. The HttpOcrEngine will handle image encoding and result parsing automatically.

Frequently Asked Questions

What image format does LiteParse send to the HTTP OCR server?

LiteParse converts each rendered PDF page to a PNG image before transmission. The HttpOcrEngine::recognize method in crates/liteparse/src/ocr/http_simple.rs handles the RGB-to-PNG conversion automatically, ensuring consistent input format regardless of PDF source.

Can I use GPU acceleration with the HTTP OCR server?

Yes. Both reference implementations support GPU acceleration. In the PaddleOCR server (ocr/paddleocr/server.py), set use_gpu=True in the PaddleOCRServer constructor. For EasyOCR (ocr/easyocr/server.py), the underlying library automatically detects CUDA devices. Custom servers can leverage any hardware acceleration available to their OCR backend.

How does LiteParse handle HTTP OCR server failures?

When HttpOcrEngine encounters network errors or non-200 HTTP responses, it propagates the error through the OcrEngine trait implementation. The parser treats these as fatal errors for the specific page being processed. You should implement retry logic and health checks (the reference servers provide GET /health) to minimize downtime.
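One way to add the suggested retry logic on the caller's side is a small wrapper with exponential backoff. This is a generic sketch, not part of LiteParse: with_retries and its parameters are hypothetical, and it assumes the wrapped call raises an OSError subclass (such as ConnectionError or TimeoutError) on transient failures.

```python
import time


def with_retries(call, attempts: int = 3, base_delay: float = 0.5,
                 sleep=time.sleep, retry_on=(OSError,)):
    """Run `call()` up to `attempts` times, doubling the delay after each failure.

    The last failure is re-raised so the caller still sees the original error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # Delays follow base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
```

A caller would wrap whatever function submits a page or document to the OCR-backed parser, for example with_retries(lambda: submit_page(png_bytes)), where submit_page stands in for your own code.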

Is the HTTP OCR mode slower than local Tesseract?

Network latency adds overhead proportional to image size and round-trip time. However, HTTP mode enables GPU-accelerated OCR engines (like PaddleOCR) that often outperform CPU-bound Tesseract on complex documents. For high-throughput scenarios, deploy the OCR server on the same LAN or use connection pooling in your server implementation.
