# How to Set Up an HTTP OCR Server and Integrate It with LiteParse

> Set up an HTTP OCR server and integrate with LiteParse. Configure LiteParseConfig's ocr_server_url to use HttpOcrEngine for image recognition, bypassing local Tesseract.

- Repository: [LlamaIndex/liteparse](https://github.com/run-llama/liteparse)
- Tags: how-to-guide
- Published: 2026-05-31

---

**To set up an HTTP OCR server with LiteParse, configure the `ocr_server_url` field in `LiteParseConfig` to point to any endpoint implementing the LiteParse OCR API Specification, and LiteParse will automatically route image recognition through `HttpOcrEngine` instead of local Tesseract.**

Setting up an HTTP OCR server allows LiteParse to offload optical character recognition to specialized external services rather than relying solely on local Tesseract installations. This approach decouples OCR processing from the core parsing engine, enabling GPU-accelerated recognition, custom model hosting, or language-specific optimizations. According to the run-llama/liteparse source code, integration requires configuring the `ocr_server_url` field and ensuring your endpoint conforms to the specification defined in [`OCR_API_SPEC.md`](https://github.com/run-llama/liteparse/blob/main/OCR_API_SPEC.md).

## Architecture Overview

LiteParse selects between local and remote OCR engines based solely on configuration. When you set up an HTTP OCR server and provide its URL, the parser instantiates `HttpOcrEngine` and sends each rendered page to your service as a PNG image in a multipart/form-data request.

### Configuration Layer

The integration entry point is `LiteParseConfig` in [`crates/liteparse/src/config.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/config.rs). This struct contains the optional `ocr_server_url` field (defaulting to **None**) that triggers HTTP mode when populated:

```rust
pub struct LiteParseConfig {
    pub ocr_enabled: bool,
    pub ocr_server_url: Option<String>,  // HTTP endpoint trigger
    pub ocr_language: String,
    // ... additional fields
}
```

When `ocr_server_url` is present, LiteParse ignores local Tesseract bindings and prepares for network-based recognition.

### Engine Selection Logic

In [`crates/liteparse/src/parser.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/parser.rs), the `LiteParse::parse_input` method implements the selection logic:

```rust
let ocr_engine: Arc<dyn OcrEngine> = if let Some(ref url) = self.config.ocr_server_url {
    std::sync::Arc::new(HttpOcrEngine::new(url.clone()))
} else {
    // Fallback to Tesseract when tesseract feature is enabled
    std::sync::Arc::new(TesseractOcrEngine::new())
};
```

This conditional instantiation determines whether the parser uses `HttpOcrEngine` or local processing for every PDF page requiring OCR.

### HTTP OCR Engine Implementation

`HttpOcrEngine` (defined in [`crates/liteparse/src/ocr/http_simple.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/ocr/http_simple.rs)) implements the `OcrEngine` trait. Its `recognize` method performs three critical operations:

1. Converts raw RGB page renders to PNG format
2. Constructs a multipart/form-data POST request to the configured `ocr_server_url`
3. Deserializes the JSON response into `OcrResult` objects containing text, bounding boxes, and confidence scores

The engine handles HTTP client lifecycle management and error propagation, ensuring failed requests do not crash the parsing pipeline.
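The three steps above can be sketched in Python using only the standard library. This is an illustrative approximation of the request flow, not the Rust code in `http_simple.rs`: the helper names (`rgb_to_png`, `build_multipart`, `recognize`), the `page.png` filename, and the 30-second timeout are assumptions made for the example.

```python
import json
import struct
import urllib.request
import uuid
import zlib


def rgb_to_png(rgb: bytes, width: int, height: int) -> bytes:
    """Step 1: encode a raw 8-bit RGB buffer as a minimal, non-interlaced PNG."""
    if len(rgb) != width * height * 3:
        raise ValueError("buffer size does not match width * height * 3")

    def chunk(tag: bytes, data: bytes) -> bytes:
        # A PNG chunk is: length, type, data, CRC32 over type + data.
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    stride = width * 3
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + rgb[y * stride:(y + 1) * stride] for y in range(height))
    # Bit depth 8, colour type 2 (RGB), default compression/filter, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))


def build_multipart(png: bytes, language: str) -> tuple[bytes, str]:
    """Step 2: build a multipart/form-data body with `file` and `language` fields."""
    boundary = uuid.uuid4().hex
    language_part = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="language"\r\n\r\n'
        f"{language}\r\n"
    ).encode()
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        'filename="page.png"\r\nContent-Type: image/png\r\n\r\n'
    ).encode()
    body = language_part + file_header + png + b"\r\n" + f"--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"


def recognize(url: str, rgb: bytes, width: int, height: int, language: str = "en") -> list[dict]:
    """Step 3: POST the page and return the `results` array from the JSON response."""
    body, content_type = build_multipart(rgb_to_png(rgb, width, height), language)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())["results"]
```

With one of the reference servers running, `recognize("http://localhost:8828/ocr", rgb, width, height)` would return a list of dictionaries, assuming the server follows the contract described in the next section.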

## OCR API Contract

Any server you set up must implement the **LiteParse OCR API Specification** documented in [`OCR_API_SPEC.md`](https://github.com/run-llama/liteparse/blob/main/OCR_API_SPEC.md). The contract requires:

- **Endpoint**: `POST /ocr`
- **Request**: Multipart form with a binary `file` field (PNG image) and optional `language` string
- **Response**: JSON object with a `results` array containing objects with `text`, `bbox` (array of four floats), and `confidence` (float between 0 and 1)

Example response structure:

```json
{
  "results": [
    { 
      "text": "Extracted text", 
      "bbox": [10.5, 20.0, 150.5, 40.0], 
      "confidence": 0.97 
    }
  ]
}
```

Bounding boxes use the format `[x1, y1, x2, y2]` representing top-left and bottom-right coordinates.
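When developing a server, it can help to check responses against this contract before pointing LiteParse at it. The following is a minimal sketch of such a check; `OcrItem` and `parse_ocr_response` are hypothetical names, and the requirement that `x2 >= x1` and `y2 >= y1` is an assumption inferred from the top-left/bottom-right description rather than something the specification is known to enforce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrItem:
    text: str
    bbox: tuple[float, float, float, float]  # (x1, y1, x2, y2)
    confidence: float


def parse_ocr_response(payload: dict) -> list[OcrItem]:
    """Validate a decoded JSON response and convert it to typed items."""
    items = []
    for index, raw in enumerate(payload["results"]):
        bbox = raw["bbox"]
        if len(bbox) != 4:
            raise ValueError(f"result {index}: bbox must have four values")
        x1, y1, x2, y2 = (float(value) for value in bbox)
        # Assumed ordering: top-left corner first, bottom-right corner second.
        if x2 < x1 or y2 < y1:
            raise ValueError(f"result {index}: bbox corners are out of order")
        confidence = float(raw["confidence"])
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"result {index}: confidence must be between 0 and 1")
        items.append(OcrItem(str(raw["text"]), (x1, y1, x2, y2), confidence))
    return items


example = {
    "results": [
        {"text": "Extracted text", "bbox": [10.5, 20.0, 150.5, 40.0], "confidence": 0.97}
    ]
}
item = parse_ocr_response(example)[0]
width, height = item.bbox[2] - item.bbox[0], item.bbox[3] - item.bbox[1]
print(item.text, width, height)  # prints: Extracted text 140.0 20.0
```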

## Reference Server Implementations

The repository provides two production-ready HTTP OCR servers that conform to the specification out of the box.

### EasyOCR Server

Located at [`ocr/easyocr/server.py`](https://github.com/run-llama/liteparse/blob/main/ocr/easyocr/server.py), this FastAPI wrapper exposes the EasyOCR library via HTTP:

```bash
cd ocr/easyocr
uv run server.py  # Starts on http://0.0.0.0:8828
```

The server automatically detects available GPUs and includes a health check endpoint at `GET /health` for load balancer integration.
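A script can poll the health endpoint before submitting documents. The sketch below assumes that an HTTP 200 status means the server is ready; the shape of the response body is not described here, so it is ignored. The function name `is_healthy` and the two-second default timeout are choices made for this example.

```python
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts all derive from OSError.
        return False
```

For example, `is_healthy("http://localhost:8828")` could gate a batch job so that it only starts once the EasyOCR server is accepting requests.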

### PaddleOCR Server

The PaddleOCR implementation in [`ocr/paddleocr/server.py`](https://github.com/run-llama/liteparse/blob/main/ocr/paddleocr/server.py) offers alternative recognition models optimized for multilingual documents:

```bash
cd ocr/paddleocr
uv run server.py  # Starts on http://0.0.0.0:8829
```

Enable GPU acceleration by modifying `PaddleOCRServer.__init__` to set `use_gpu=True` before starting the server.

## Integration Methods

Once your HTTP OCR server is running, integrate it with LiteParse using either CLI flags or programmatic configuration.

### Command Line Interface

Pass the server URL via the `--ocr-server-url` flag:

```bash
# Using EasyOCR server
lit parse document.pdf --ocr-server-url http://localhost:8828/ocr

# Using PaddleOCR with Chinese language support
lit parse document.pdf --ocr-server-url http://localhost:8829/ocr --ocr-language zh
```

The `--ocr-language` parameter maps to the `language` form field sent in the multipart request.

### Node.js / TypeScript SDK

Configure the `LiteParse` class with the `ocrServerUrl` option:

```typescript
import { LiteParse } from 'liteparse';

const parser = new LiteParse({
  ocrServerUrl: 'http://localhost:8828/ocr',
  ocrLanguage: 'en',
});

const result = await parser.parse('document.pdf');
console.log(result.text);
```

### Rust Library

Instantiate `LiteParseConfig` with the `ocr_server_url` field:

```rust
use liteparse::{LiteParse, LiteParseConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cfg = LiteParseConfig {
        ocr_enabled: true,
        ocr_server_url: Some("http://localhost:8828/ocr".into()),
        ..Default::default()
    };
    
    let parser = LiteParse::new(cfg);
    let res = parser.parse_input(
        liteparse::PdfInput::Path("document.pdf".into())
    ).await?;
    
    println!("{}", res.text);
    Ok(())
}
```

## Building a Custom HTTP OCR Server

If neither EasyOCR nor PaddleOCR meets your requirements, you can implement a minimal compliant server in any framework. Here is a FastAPI skeleton:

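```python
# FastAPI needs the python-multipart package installed to parse File/Form fields.
from fastapi import FastAPI, File, Form, UploadFile
from pydantic import BaseModel

import your_custom_ocr_library  # placeholder: substitute your own OCR backend

app = FastAPI()


class OcrResult(BaseModel):
    text: str
    bbox: list[float]  # [x1, y1, x2, y2]
    confidence: float  # between 0 and 1


class OcrResponse(BaseModel):
    results: list[OcrResult]


@app.post("/ocr")
async def ocr_endpoint(
    file: UploadFile = File(...),
    language: str = Form(default="en"),
) -> OcrResponse:
    image_bytes = await file.read()

    # Placeholder call: replace with your OCR logic. A real backend will
    # usually return one entry per detected text region.
    ocr_data = your_custom_ocr_library.process(image_bytes, language)

    return OcrResponse(results=[
        OcrResult(text=ocr_data.text, bbox=ocr_data.bbox, confidence=ocr_data.confidence)
    ])
```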

Deploy this server and point LiteParse to it using the configuration methods described above. The `HttpOcrEngine` will handle image encoding and result parsing automatically.

## Summary

- **Configuration**: Set `ocr_server_url` in `LiteParseConfig` (located in [`crates/liteparse/src/config.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/config.rs)) to enable HTTP mode
- **Engine Selection**: `LiteParse::parse_input` in [`crates/liteparse/src/parser.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/parser.rs) automatically chooses `HttpOcrEngine` when a URL is provided
- **Implementation**: `HttpOcrEngine` in [`crates/liteparse/src/ocr/http_simple.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/ocr/http_simple.rs) sends PNG images as multipart/form-data to your endpoint
- **Compliance**: Servers must implement the contract in [`OCR_API_SPEC.md`](https://github.com/run-llama/liteparse/blob/main/OCR_API_SPEC.md), specifically the `POST /ocr` endpoint returning `text`, `bbox`, and `confidence`
- **Reference**: Use [`ocr/easyocr/server.py`](https://github.com/run-llama/liteparse/blob/main/ocr/easyocr/server.py) or [`ocr/paddleocr/server.py`](https://github.com/run-llama/liteparse/blob/main/ocr/paddleocr/server.py) as working implementations
- **Integration**: Works via CLI flag `--ocr-server-url`, or programmatically in Rust and Node.js through the configuration structs

## Frequently Asked Questions

### What image format does LiteParse send to the HTTP OCR server?

LiteParse converts each rendered PDF page to a **PNG image** before transmission. The `HttpOcrEngine::recognize` method in [`crates/liteparse/src/ocr/http_simple.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/ocr/http_simple.rs) handles the RGB-to-PNG conversion automatically, ensuring consistent input format regardless of PDF source.

### Can I use GPU acceleration with the HTTP OCR server?

Yes. Both reference implementations support GPU acceleration. In the PaddleOCR server ([`ocr/paddleocr/server.py`](https://github.com/run-llama/liteparse/blob/main/ocr/paddleocr/server.py)), set `use_gpu=True` in the `PaddleOCRServer` constructor. For EasyOCR ([`ocr/easyocr/server.py`](https://github.com/run-llama/liteparse/blob/main/ocr/easyocr/server.py)), the underlying library automatically detects CUDA devices. Custom servers can leverage any hardware acceleration available to their OCR backend.

### How does LiteParse handle HTTP OCR server failures?

When `HttpOcrEngine` encounters network errors or non-200 HTTP responses, it propagates the error through the `OcrEngine` trait implementation. The parser treats these as fatal errors for the specific page being processed. You should implement retry logic and health checks (the reference servers provide `GET /health`) to minimize downtime.
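One way to add retry logic on the caller's side is a small wrapper with exponential backoff. This is a generic sketch rather than a LiteParse feature: `with_retries` is a hypothetical helper, it retries only on `OSError` (which covers connection failures and timeouts in Python's standard networking stack), and the injectable `sleep` parameter exists so the delays can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on OSError with delays of base_delay * 2**n seconds."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A caller would wrap whatever function submits a page or document to the OCR server, for example `with_retries(lambda: submit_page(page))`, where `submit_page` stands in for your own code.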

### Is the HTTP OCR mode slower than local Tesseract?

Network latency adds overhead that grows with image size and round-trip time. However, HTTP mode enables GPU-accelerated OCR engines (like PaddleOCR) that often outperform CPU-bound Tesseract on complex documents. For high-throughput scenarios, deploy the OCR server on the same LAN as LiteParse and reuse HTTP connections where your setup allows it.