# How to Configure LiteParse's `TESSDATA_PREFIX` for Offline OCR Environments

> Configure LiteParse's `TESSDATA_PREFIX` for offline OCR. Enable OCR in air-gapped environments by setting the environment variable or passing the `tessdata_path` option so Tesseract loads language data from a local directory.

- Repository: [run-llama/liteparse](https://github.com/run-llama/liteparse)
- Tags: how-to-guide
- Published: 2026-05-30

---

**Set the `TESSDATA_PREFIX` environment variable to a local directory containing Tesseract `.traineddata` files, or pass the `tessdata_path` configuration option, to enable OCR in air‑gapped environments without network access.**

LiteParse, the document parsing library from `run-llama/liteparse`, embeds Tesseract for optical character recognition. Tesseract needs language model files (`.traineddata`) at run time, and without further configuration those models are typically obtained over the internet, which fails in offline or secure environments. Configuring the `TESSDATA_PREFIX` environment variable (or the equivalent `tessdata_path` option) makes the engine load language data from a local filesystem path instead.

## How `TESSDATA_PREFIX` Resolution Works in LiteParse

LiteParse resolves the tessdata directory through a three‑step hierarchy implemented in [`crates/liteparse/src/ocr/tesseract.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/ocr/tesseract.rs):

1. **Explicit configuration** – If you supply a `tessdata_path` (via CLI flag, constructor option, or config struct), that path is used immediately.
2. **Environment variable** – If no explicit path is set, LiteParse reads `std::env::var("TESSDATA_PREFIX")`.
3. **Fallback default** – When neither source is present, the library calls `default_tessdata_dir()` from the `tesseract-rs` crate, which resolves to `~/.tesseract-rs/tessdata` on Linux or `~/Library/Application Support/tesseract-rs/tessdata` on macOS.

The chosen directory is passed directly to `TesseractAPI::init(path, language)`. If the directory lacks the requested language’s `.traineddata` file, Tesseract returns an error.
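The three-step lookup can be mirrored in a few lines of Python, which is handy for checking which directory LiteParse should end up using on a given machine. This is a hypothetical sketch, not LiteParse's code: the function name `resolve_tessdata_dir` is illustrative, the default paths are the ones listed above, and treating every non-macOS platform like Linux is an assumption.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def resolve_tessdata_dir(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    platform: str = sys.platform,
    home: Optional[Path] = None,
) -> Path:
    """Pick a tessdata directory: explicit path > TESSDATA_PREFIX > default."""
    # 1. An explicit tessdata_path (CLI flag, constructor option, config field) wins.
    if explicit is not None:
        return Path(explicit)

    # 2. Otherwise use TESSDATA_PREFIX if it is set in the environment.
    environ = os.environ if env is None else env
    prefix = environ.get("TESSDATA_PREFIX")
    if prefix is not None:
        return Path(prefix)

    # 3. Fall back to the tesseract-rs default directory listed above.
    #    Treating every non-macOS platform like Linux is an assumption.
    base = Path.home() if home is None else home
    if platform == "darwin":
        return base / "Library" / "Application Support" / "tesseract-rs" / "tessdata"
    return base / ".tesseract-rs" / "tessdata"
```

For example, `resolve_tessdata_dir(env={"TESSDATA_PREFIX": "/opt/tessdata"})` returns `Path("/opt/tessdata")`, while passing an explicit path as the first argument overrides the variable, matching the precedence described above.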

## Preparing Your Local Tessdata Directory

Before running offline OCR, populate your local directory with the required language packs:

- Download `.traineddata` files from the official Tesseract tessdata repository or your organization’s approved mirror.
- Place files in a directory such as `/opt/tessdata` or `C:\tessdata`.
- Verify the directory contains at minimum `eng.traineddata` for English, or the specific language codes you intend to use (e.g., `fra.traineddata` for French).

This setup guarantees **offline safety** (no network calls during OCR) and **predictable versioning** (you control exactly which model versions are active).
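The last check in the list above can be scripted as a quick pre-flight step. This is a minimal sketch, not part of LiteParse: the helper name `missing_traineddata` is hypothetical, and it only confirms that a `<code>.traineddata` file exists for each language, not that the file is valid or compatible with your Tesseract version.

```python
from pathlib import Path
from typing import Iterable, List


def missing_traineddata(tessdata_dir: str, languages: Iterable[str]) -> List[str]:
    """Return the language codes that have no <code>.traineddata file in tessdata_dir."""
    root = Path(tessdata_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"tessdata directory not found: {root}")
    # Tesseract expects one file per language, named after its code (eng -> eng.traineddata).
    return [lang for lang in languages if not (root / f"{lang}.traineddata").is_file()]
```

Run it on the target machine before going offline, for example `missing_traineddata("/opt/tessdata", ["eng", "fra"])`, and copy in any language packs it reports as missing.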

## Configuration Methods by Interface

### Shell Environment and CLI

Set the environment variable in your shell before invoking the `lit` CLI:

```bash
export TESSDATA_PREFIX=/opt/tessdata
lit parse document.pdf --ocr-enabled
```

To override the environment variable for a single invocation, use the `--tessdata-path` flag defined in [`crates/liteparse/src/main.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/main.rs):

```bash
lit parse document.pdf --tessdata-path /custom/tessdata --ocr-enabled
```

The CLI flag takes precedence over `TESSDATA_PREFIX`.

### Node.js and TypeScript

When constructing the parser in Node.js or TypeScript, pass `tessdataPath` in the options object:

```typescript
import { LiteParse } from "liteparse";

const parser = new LiteParse({
  ocrEnabled: true,
  ocrLanguage: "fra",
  tessdataPath: "/opt/tessdata"
});

const result = await parser.parse("document.pdf");
console.log(result.text);
```

### Python

In Python, use the `tessdata_path` parameter when instantiating `LiteParse`:

```python
from liteparse import LiteParse

parser = LiteParse(
    ocr_enabled=True,
    ocr_language="eng",
    tessdata_path="/opt/tessdata"
)

result = parser.parse("document.pdf")
print(result.text)
```

### Rust Core

For Rust applications using `liteparse` directly, set the `tessdata_path` field on `LiteParseConfig` from [`crates/liteparse/src/config.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/config.rs):

```rust
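use liteparse::{LiteParse, LiteParseConfig};

// parse() is async, so it needs a runtime; tokio is assumed here.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Point OCR at a local tessdata directory; all other fields keep their defaults.
    let config = LiteParseConfig {
        tessdata_path: Some("/opt/tessdata".to_string()),
        ..Default::default()
    };

    let parser = LiteParse::new(config);
    let result = parser.parse("document.pdf").await?;
    println!("{}", result.text);
    Ok(())
}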
```

If you omit `tessdata_path`, the library relies on the `TESSDATA_PREFIX` environment variable or the built‑in default.

## Verifying Offline Operation

To confirm that OCR works without network access:

1. Disconnect from the network or run in an isolated container.
2. Ensure your `TESSDATA_PREFIX` directory contains the target `.traineddata` files.
3. Execute OCR on a scanned PDF.

If the operation succeeds without timeout errors or download attempts, LiteParse is correctly using the offline tessdata store.
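Step 3 can be scripted so that a hang shows up as a clear failure rather than a stalled job. This is a minimal sketch under stated assumptions: you run it inside the isolated environment from step 1 (the script itself does not block network access), `lit` is on `PATH`, `document.pdf` is a placeholder for your scanned file, and the function name `run_offline_ocr_check` and the 120-second timeout are arbitrary choices. The default command simply reuses the CLI invocation shown earlier.

```python
import os
import subprocess
from typing import Sequence


def run_offline_ocr_check(
    tessdata_dir: str,
    command: Sequence[str] = ("lit", "parse", "document.pdf", "--ocr-enabled"),
    timeout_s: float = 120.0,
) -> subprocess.CompletedProcess:
    """Run an OCR command with TESSDATA_PREFIX pointing at a local store.

    Raises subprocess.TimeoutExpired if the command does not finish in time
    (for example, because it is stuck waiting on the network) and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    # Inherit the current environment, overriding only TESSDATA_PREFIX.
    env = dict(os.environ, TESSDATA_PREFIX=tessdata_dir)
    return subprocess.run(
        list(command),
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
```

A `TimeoutExpired` suggests something is still trying to reach the network, while a `CalledProcessError` may point at missing or unreadable `.traineddata` files; inspect the captured `stderr` in either case.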

## Summary

- **Resolution order**: Explicit `tessdata_path` > `TESSDATA_PREFIX` environment variable > default user directory (`~/.tesseract-rs/tessdata` or the macOS equivalent).
- **Cross-binding consistency**: The same environment variable works across the Rust, Python, Node.js, and CLI interfaces because all bindings delegate to the core logic in [`tesseract.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/ocr/tesseract.rs).
- **Offline requirement**: The specified directory must contain valid `.traineddata` files for your target languages; otherwise `TesseractAPI::init` returns an initialization error.
- **Override capability**: CLI flags and constructor options always take precedence over environment variables.

## Frequently Asked Questions

### What happens if I don't set `TESSDATA_PREFIX`?

If neither `TESSDATA_PREFIX` nor an explicit `tessdata_path` is provided, LiteParse falls back to the default location provided by the `tesseract-rs` crate. On Linux this is typically `~/.tesseract-rs/tessdata`, and on macOS it is `~/Library/Application Support/tesseract-rs/tessdata`. If this directory lacks the required language data, OCR will fail.

### Where can I download the `.traineddata` files for offline use?

Download language models from the official Tesseract tessdata repository hosted on GitHub, or from your organization’s internal artifact store. Place the files in the directory referenced by `TESSDATA_PREFIX` before running LiteParse.

### Does the CLI `--tessdata-path` flag override the environment variable?

Yes. According to the implementation in [`crates/liteparse/src/main.rs`](https://github.com/run-llama/liteparse/blob/main/crates/liteparse/src/main.rs), the `--tessdata-path` argument takes precedence over the `TESSDATA_PREFIX` environment variable. The CLI flag is one form of explicit configuration, so the hierarchy is: explicit `tessdata_path` (CLI flag, constructor option, or config field) > environment variable > default directory.

### Is `TESSDATA_PREFIX` supported on Windows?

Yes. LiteParse reads the environment variable using `std::env::var`, which works on all platforms. On Windows, set the variable with `set TESSDATA_PREFIX=C:\path\to\tessdata` in Command Prompt, or with `$env:TESSDATA_PREFIX = "C:\path\to\tessdata"` in PowerShell, before running your LiteParse application.