# How to Configure the num_workers Parameter for Concurrent OCR Processing in LiteParse

> Configure the `num_workers` parameter to control how many OCR tasks LiteParse runs concurrently, using the CLI, Python, Node.js, or Rust.

- Repository: [run-llama/liteparse](https://github.com/run-llama/liteparse)
- Tags: how-to-guide
- Published: 2026-05-31

---

**The `num_workers` parameter controls how many OCR tasks LiteParse runs concurrently. Set it with the `--num-workers` CLI flag, the `num_workers` constructor argument in Python or Node.js, or the `LiteParseConfig` struct in Rust. The default value is `max(1, num_cpus() - 1)`.**

LiteParse, the open-source document parsing library maintained at run-llama/liteparse, uses optical character recognition (OCR) only when native PDF text extraction fails. Because OCR is CPU-intensive, the **`num_workers`** parameter is the main control over how much of the machine that work consumes. This setting determines how many pages are processed simultaneously across the library's Rust core, bindings, and deployment targets.

## Understanding the num_workers Parameter

The `num_workers` setting defines the size of the internal worker pool that executes OCR operations when LiteParse encounters scanned pages or image-based content. According to the source code in [`crates/liteparse/src/config.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/config.rs), the `LiteParseConfig` struct exposes this value as a public field:

```rust
pub struct LiteParseConfig {
    pub num_workers: usize,
    // ... additional configuration fields
}
```

When you do not specify a value, LiteParse automatically calculates the default as **`max(1, num_cpus() - 1)`**, reserving one CPU core for system operations while utilizing the remainder for OCR tasks. This logic is implemented in the `default_num_workers()` helper function within lines 59-60 of the configuration file.
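
The default formula can be illustrated with a short Python analog. This is a sketch of the arithmetic only, not the library's Rust helper; the function name is reused for readability, and `os.cpu_count()` stands in for the Rust `num_cpus()` call as an assumption.

```python
import os


def default_num_workers(cpu_count=None):
    """Mirror the documented default: max(1, num_cpus - 1)."""
    if cpu_count is None:
        # os.cpu_count() can return None when the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    # Leave one core free, but never drop below a single worker.
    return max(1, cpu_count - 1)


if __name__ == "__main__":
    for cores in (1, 2, 8):
        print(cores, "cores ->", default_num_workers(cores), "workers")
```

On a single-core or dual-core machine the formula yields one worker; on an eight-core machine it yields seven.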

## Configuration Methods Across Runtimes

### Rust Core and CLI

In the Rust implementation, you can configure concurrency programmatically or via command-line arguments. The binary entry point in [`crates/liteparse/src/main.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/main.rs) accepts an optional `--num-workers` flag that overrides the default (lines 89-101):

```bash
# Use default (CPU cores - 1)
liteparse input.pdf --output output.json

# Limit to 2 concurrent workers
liteparse input.pdf --output output.json --num-workers 2
```
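
When the CLI is driven from a batch script, it can help to build the argument list in one place. The following is a minimal sketch that uses only the flags from the commands above; the helper name `build_liteparse_command` and its guard against values below one are illustrative assumptions, not part of LiteParse.

```python
def build_liteparse_command(input_path, output_path, num_workers=None):
    """Build the argv list for the liteparse CLI invocation."""
    command = ["liteparse", input_path, "--output", output_path]
    if num_workers is not None:
        # Guard added by this sketch; the CLI's own validation may differ.
        if num_workers < 1:
            raise ValueError("num_workers must be at least 1")
        command += ["--num-workers", str(num_workers)]
    return command


if __name__ == "__main__":
    # Omitting num_workers leaves the flag off, so the CLI default applies.
    print(build_liteparse_command("input.pdf", "output.json"))
    print(build_liteparse_command("input.pdf", "output.json", num_workers=2))
```

The resulting list can be passed to `subprocess.run(command, check=True)` if the `liteparse` binary is on the `PATH`.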

For library consumers, modify the `LiteParseConfig` struct before instantiating the parser:

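```rust
use liteparse::{LiteParse, LiteParseConfig};

// `.await` and `?` need an async function that returns a Result. This wrapper
// assumes the parser's error type converts into `Box<dyn std::error::Error>`.
async fn parse_with_four_workers() -> Result<(), Box<dyn std::error::Error>> {
    let mut cfg = LiteParseConfig::default();
    cfg.num_workers = 4; // Explicit concurrency limit
    let parser = LiteParse::new("document.pdf", cfg);
    let _result = parser.parse().await?;
    Ok(())
}
```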

### Python Bindings

The Python API exposes `num_workers` as a constructor parameter in [`crates/liteparse-python/src/lib.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse-python/src/lib.rs) (lines 216-333). Pass an integer to the `LiteParse` class to override the default:

```python
from liteparse import LiteParse

# Default behavior (CPU cores - 1)
parser = LiteParse("input.pdf")
result = parser.parse()

# Custom worker count
parser = LiteParse("input.pdf", num_workers=3)
result = parser.parse()
```

### Node.js Bindings

For JavaScript and TypeScript applications using the N-API bindings defined in [`crates/liteparse-napi/src/types.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse-napi/src/types.rs) (lines 36-40), supply the `num_workers` option in the constructor:

```javascript
const { LiteParse } = require("liteparse");

// Default concurrency
const parser = new LiteParse("input.pdf");

// Constrained processing
const parserLimited = new LiteParse("input.pdf", { num_workers: 2 });
```

### WebAssembly Constraints

When using LiteParse in browser environments via WebAssembly, concurrency is forcibly limited to a single worker regardless of the configuration value. The WASM shim in [`crates/liteparse-wasm/src/lib.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse-wasm/src/lib.rs) (lines 98-100) explicitly sets this restriction due to browser threading limitations and SharedArrayBuffer constraints.

## Internal Concurrency Implementation

The limit on concurrent OCR tasks is enforced in [`crates/liteparse/src/ocr_merge.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/ocr_merge.rs) using a Tokio semaphore initialized with the `num_workers` value (lines 78-84). When the parser in [`crates/liteparse/src/parser.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/parser.rs) (lines 170-182) encounters pages requiring OCR, it acquires permits from this semaphore before spawning OCR tasks, so the number of OCR tasks in flight never exceeds the configured limit.
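
The semaphore pattern described above can be sketched in Python with `asyncio.Semaphore`. This is an analog of the technique, not LiteParse's Rust code: `run_bounded` and `fake_ocr` are hypothetical names, the OCR step is simulated with a short sleep, and each task acquires its permit on entry rather than before being spawned.

```python
import asyncio


async def run_bounded(pages, num_workers, ocr_page):
    """Run ocr_page over pages with at most num_workers tasks in flight."""
    semaphore = asyncio.Semaphore(num_workers)
    in_flight = 0
    peak = 0

    async def worker(page):
        nonlocal in_flight, peak
        async with semaphore:  # wait for a free permit before doing OCR
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await ocr_page(page)
            finally:
                in_flight -= 1

    # gather preserves input order in its result list.
    results = await asyncio.gather(*(worker(page) for page in pages))
    return results, peak


async def fake_ocr(page):
    # Stand-in for real OCR work; the sleep lets tasks overlap.
    await asyncio.sleep(0.01)
    return f"text-{page}"


if __name__ == "__main__":
    results, peak = asyncio.run(run_bounded(range(6), 2, fake_ocr))
    print(results)
    print("peak concurrency:", peak)
```

With six pages and two permits, the recorded peak never exceeds two, which is the guarantee the semaphore provides regardless of how many pages need OCR.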

## Performance Considerations

Adjusting the `num_workers` value involves balancing speed against resource utilization:

- **Higher values** increase throughput on multi-core machines but raise memory consumption and CPU saturation risk. This configuration suits dedicated servers processing large document batches.
- **Lower values** provide predictable resource usage and prevent system thrashing, making them ideal for containerized environments with CPU quotas or CI/CD pipelines.
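
One way to reason about this trade-off is to cap the CPU-based default by the number of workers that fit in a memory budget. The sketch below is a heuristic under stated assumptions: `pick_num_workers` is a hypothetical helper, and the per-worker memory figure is a placeholder that would need to be measured for your documents and OCR backend.

```python
import os


def pick_num_workers(memory_budget_mb, per_worker_mb, cpu_count=None):
    """Cap the CPU-based default by how many workers fit in memory."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    cpu_limit = max(1, cpus - 1)  # same shape as the documented default
    memory_limit = max(1, memory_budget_mb // per_worker_mb)
    return min(cpu_limit, memory_limit)


if __name__ == "__main__":
    # Hypothetical figures: a 2 GiB budget and 512 MiB per OCR worker.
    print(pick_num_workers(2048, 512, cpu_count=16))
```

The result can then be passed as `num_workers` through whichever interface you use. In a container with a CPU quota, pass the quota as `cpu_count` explicitly, since the host's core count may overstate what the container can use.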

## Summary

- The `num_workers` parameter controls OCR concurrency in LiteParse, defaulting to `max(1, num_cpus() - 1)` as implemented in [`crates/liteparse/src/config.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/config.rs).
- Configuration varies by interface: `--num-workers` for CLI, constructor arguments for Python and Node.js, and the `LiteParseConfig` struct for Rust.
- The runtime enforces limits via a Tokio semaphore in [`ocr_merge.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/ocr_merge.rs), which the parser in [`parser.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/parser.rs) acquires permits from before spawning OCR tasks.
- WebAssembly builds ignore this setting and always use a single worker due to browser limitations.
- Adjust this value based on available CPU cores and memory constraints to optimize document processing throughput.

## Frequently Asked Questions

### What is the default value of num_workers in LiteParse?

The default value is automatically calculated as the number of logical CPU cores minus one, with a minimum of one worker. This calculation occurs in [`crates/liteparse/src/config.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/config.rs) and ensures LiteParse leaves one core available for system operations while maximizing OCR throughput.

### Can I use multiple OCR workers in the browser with LiteParse WASM?

No. The WebAssembly build in [`crates/liteparse-wasm/src/lib.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse-wasm/src/lib.rs) hardcodes the worker count to one due to browser threading limitations and the lack of true parallelism in standard WASM environments, regardless of the `num_workers` value passed in configuration.

### Why does LiteParse show high memory usage when I increase num_workers?

Each concurrent OCR worker maintains independent memory buffers for image processing and text recognition. Increasing `num_workers` creates more simultaneous OCR contexts, which multiplies memory consumption proportionally. Reduce this value if you encounter out-of-memory errors in resource-constrained environments.

### How do I check the actual number of workers being used at runtime?

When initializing LiteParse programmatically in Rust, inspect the `config.num_workers` field after construction. For CLI usage, verbose logging modes may expose configuration details, or you can monitor system process activity to observe the actual concurrency level during PDF processing.