# Thread Safety Considerations for TessBaseAPI in Tesseract OCR

> Understand TessBaseAPI thread safety in Tesseract OCR. Learn how independent objects ensure safety and avoid race conditions with SetVariable or ClearPersistentCache.

- Repository: [tesseract-ocr/tesseract](https://github.com/tesseract-ocr/tesseract)
- Tags: internals
- Published: 2026-03-02

---

**TessBaseAPI instances are thread-safe when each thread maintains its own independent object, but calling `SetVariable` or `ClearPersistentCache` creates race conditions due to process-wide shared state.**

The `TessBaseAPI` class in the [tesseract-ocr/tesseract](https://github.com/tesseract-ocr/tesseract) repository enables high-performance parallel OCR processing, yet thread safety depends entirely on how you manage global configuration parameters and static caches. While the modern architecture isolates most engine state per instance, certain legacy parameters remain globally shared for backward compatibility. Understanding these boundary conditions is essential for building reliable multithreaded document processing pipelines.

## What Makes TessBaseAPI Thread-Safe by Design

Independent `TessBaseAPI` objects are fully isolated and safe for concurrent use across multiple threads.

Each instance owns its own **Tesseract engine**, **ImageThresholder**, page-layout structures, and result containers. According to the class documentation in [`src/ccmain/tesseractclass.h`](https://github.com/tesseract-ocr/tesseract/blob/main/src/ccmain/tesseractclass.h), the design explicitly moved all global variables into the `Tesseract` class to enable safe parallel execution: the comments at lines 5-9 state that this architecture makes it "safe to run multiple Tesseracts in different threads in parallel".

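Query and result calls are isolated after initialization. Once an instance has been successfully initialized via `Init()`, the following methods touch only that instance's data when called from the thread that owns it:

- `Version()`
- `GetInitLanguagesAsString()`
- `GetUTF8Text()`
- `MeanTextConf()`

None of them imposes locking requirements on the calling code, provided the instance is not shared. Two details are worth knowing: `Version()` is a static method that reads no instance state, and `GetUTF8Text()` is not strictly read-only, because it runs recognition on first use if recognition has not already been run. That work still stays within the instance.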

## Global Operations That Break Thread Isolation

Two specific operations violate the instance isolation model and require external synchronization when used in multithreaded applications.

### SetVariable Modifies Process-Wide State

The `SetVariable` method changes parameters in the **classify** and **textord** modules through a process-wide static table. When one thread calls `SetVariable` on any `TessBaseAPI` instance, the new value immediately becomes visible to **all** active instances regardless of which thread created them.

As documented in [`include/tesseract/baseapi.h`](https://github.com/tesseract-ocr/tesseract/blob/main/include/tesseract/baseapi.h) at lines 161-166: "instances are now mostly thread-safe ... **unless** you use `SetVariable` on some of the Params in classify and textord. If you do, then the effect will be to change it for all your instances".
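
The effect is easier to see in a toy model. The sketch below is not Tesseract code: `ToyEngine` and its members are hypothetical names that only mimic the split between per-instance members and a single process-wide static, which is the arrangement the header comment describes for some classify and textord parameters.

```cpp
// Toy stand-in for an OCR engine. NOT Tesseract code: it only mimics the
// difference between a per-instance parameter and a process-wide one.
class ToyEngine {
 public:
  // Per-instance parameter: every object has its own copy.
  void set_local_mode(int v) { local_mode_ = v; }
  int local_mode() const { return local_mode_; }

  // Process-wide parameter: one storage location shared by all objects.
  void set_shared_mode(int v) { shared_mode() = v; }
  int read_shared_mode() const { return shared_mode(); }

 private:
  static int &shared_mode() {
    static int value = 0;  // single copy for the whole process
    return value;
  }
  int local_mode_ = 0;
};
```

Writing through one object changes what every other object reads for the shared value, while the per-instance value stays private. The toy uses a plain `int`, so concurrent writes from several threads would also be a data race, which is the reason the shared case needs external synchronization.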

### ClearPersistentCache Affects All Instances

The static method `ClearPersistentCache()` clears data shared across every `TessBaseAPI` object in the process. If one thread clears the cache while another thread is actively using data originating from it, the second thread may encounter stale pointers or forced re-loading overhead.

This method is declared in [`include/tesseract/baseapi.h`](https://github.com/tesseract-ocr/tesseract/blob/main/include/tesseract/baseapi.h) at lines 675-676 and exposed through the C API as `TessBaseAPIClearPersistentCache` in [`include/tesseract/capi.h`](https://github.com/tesseract-ocr/tesseract/blob/main/include/tesseract/capi.h) at line 495.

## Safe Multithreading Patterns for TessBaseAPI

Follow these specific patterns to maintain thread safety when scaling OCR across multiple threads:

- **Create independent workers per thread.** Construct a separate `TessBaseAPI` object in each worker thread, call `Init()` once, and restrict all subsequent API calls to that thread only. Never share a single instance pointer across thread boundaries.

- **Avoid runtime configuration changes.** Do not call `SetVariable` on classify or textord parameters while other instances in the process are active. If you must adjust such a parameter, protect the entire operation with an external mutex and re-initialize each affected instance afterward.

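- **Apply per-instance configuration at initialization.** Pass config files or variable overrides to `Init()` itself through its `configs` and `vars_vec`/`vars_values` arguments, or call `ReadConfigFile()` on the thread's own instance immediately after `Init()` and before any recognition; the header notes that `ReadConfigFile()` sets only non-init parameters. Either way, finish configuration before other threads start work, because parameters that live in the shared classify and textord tables remain process-wide however they are set.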

- **Synchronize cache clearing.** Call `ClearPersistentCache()` only when you can guarantee no other thread is using a `TessBaseAPI` instance, or guard the call with a process-wide lock to prevent use-after-free scenarios.

- **Respect the End() lifecycle.** After calling `End()`, you may only invoke `Init()` or the few pre-initialization methods documented in the API. Calling recognition methods like `GetUTF8Text()` on an ended instance produces undefined behavior.
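
The first pattern generalizes to a fixed pool of workers that pull pages from a shared index. The following is a minimal sketch under stated assumptions: `run_ocr_pool`, `make_engine`, and `recognize` are hypothetical names, and the engine type is left as a template parameter so the threading logic can be read (and tested) without linking Tesseract.

```cpp
#include <atomic>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Runs recognize(engine, path) for every path using n_threads workers.
// make_engine is called once per worker, on that worker's own thread, so
// each engine is created, used, and destroyed by exactly one thread.
// Exceptions thrown inside a worker are not handled in this sketch.
template <typename MakeEngine, typename Recognize>
std::vector<std::string> run_ocr_pool(const std::vector<std::string> &paths,
                                      unsigned n_threads,
                                      MakeEngine make_engine,
                                      Recognize recognize) {
  if (n_threads == 0) n_threads = 1;
  std::vector<std::string> results(paths.size());
  std::atomic<std::size_t> next{0};  // index of the next unclaimed page

  auto worker = [&]() {
    auto engine = make_engine();  // never shared with another thread
    for (;;) {
      const std::size_t i = next.fetch_add(1);
      if (i >= paths.size()) break;
      results[i] = recognize(engine, paths[i]);  // one writer per slot
    }
  };

  std::vector<std::thread> threads;
  for (unsigned t = 0; t < n_threads; ++t) threads.emplace_back(worker);
  for (auto &th : threads) th.join();
  return results;
}
```

With Tesseract, `make_engine` might return a `std::unique_ptr<tesseract::TessBaseAPI>` that has already been through `Init()`, and `recognize` might load the image, call `SetImage()` and `GetUTF8Text()`, and copy the text into a `std::string`. Creating the engine inside the worker keeps initialization on the same thread that later uses it.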

## Code Examples

### Basic Multithreaded Usage (Thread-Safe)

Create one `TessBaseAPI` per thread and keep all operations local to that thread:

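```cpp
#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>

#include <iostream>
#include <string>
#include <thread>
#include <vector>

void ocr_worker(const std::string &image_path, const std::string &lang) {
  tesseract::TessBaseAPI api;  // owned and used by this thread only
  if (api.Init(nullptr, lang.c_str()) != 0) {
    std::cerr << "Could not initialize tesseract for " << lang << "\n";
    return;
  }
  // SetImage() takes a Leptonica Pix* (or a raw pixel buffer), not a path.
  Pix *image = pixRead(image_path.c_str());
  if (image == nullptr) {
    std::cerr << "Could not read " << image_path << "\n";
    api.End();
    return;
  }
  api.SetImage(image);
  char *out = api.GetUTF8Text();  // nullptr on failure
  if (out != nullptr) {
    // Output from different threads may interleave on std::cout.
    std::cout << "Result (" << lang << "): " << out << "\n";
    delete[] out;
  }
  api.End();
  pixDestroy(&image);
}

int main() {
  std::vector<std::thread> workers;
  workers.emplace_back(ocr_worker, "page1.png", "eng");
  workers.emplace_back(ocr_worker, "page2.png", "deu");
  workers.emplace_back(ocr_worker, "page3.png", "fra");

  for (auto &t : workers) t.join();
  return 0;
}
```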

### Unsafe Pattern: Modifying Global Parameters

This code creates a race condition by modifying shared classifier state:

```cpp
#include <tesseract/baseapi.h>

void unsafe_change() {
  tesseract::TessBaseAPI api;
  if (api.Init(nullptr, "eng") != 0) return;
  // DANGER: Modifies global parameter table for all instances
  api.SetVariable("classify_bln_numeric_mode", "1");
}
```

If two threads call `SetVariable` on a shared parameter concurrently, for example with different values, every OCR session sees whichever write lands last. The result can be non-deterministic recognition behavior, even in threads that never called `SetVariable` themselves.

### Protecting Global Changes with a Mutex

When you must change global parameters, serialize access and manage instance lifecycles carefully:

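```cpp
#include <tesseract/baseapi.h>

#include <mutex>

std::mutex g_param_mutex;

// Serializes writers only. OCR threads that read the parameter must be idle
// (or take the same mutex) while this runs.
bool safe_change_global(const char *name, const char *value) {
  std::lock_guard<std::mutex> lock(g_param_mutex);
  tesseract::TessBaseAPI dummy;
  if (dummy.Init(nullptr, "eng") != 0) return false;
  const bool ok = dummy.SetVariable(name, value);  // false if lookup failed
  // Recreate other instances here if they need the new value
  return ok;
}
```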

### Clearing the Persistent Cache Safely

Protect the static cache clear operation with a process-wide lock:

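```cpp
#include <tesseract/baseapi.h>

#include <mutex>

std::mutex g_cache_mutex;

// The lock only helps if OCR threads also take g_cache_mutex around their
// work; otherwise call this when no recognition is in flight.
void clear_cache_once() {
  std::lock_guard<std::mutex> lock(g_cache_mutex);
  tesseract::TessBaseAPI::ClearPersistentCache();
}
```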

## Key Source Files

Understanding thread safety requires examining these specific files in the tesseract-ocr/tesseract repository:

- **[`include/tesseract/baseapi.h`](https://github.com/tesseract-ocr/tesseract/blob/main/include/tesseract/baseapi.h)** — Declares the `TessBaseAPI` class, contains the critical thread-safety comment block at lines 161-166 regarding `SetVariable`, and declares the static `ClearPersistentCache()` method at lines 675-676.

- **[`src/ccmain/tesseractclass.h`](https://github.com/tesseract-ocr/tesseract/blob/main/src/ccmain/tesseractclass.h)** — Documents the architectural design goal at lines 5-9, explaining that all global variables were moved into the `Tesseract` class to enable parallel thread execution.

- **[`src/ccmain/tesseractclass.cpp`](https://github.com/tesseract-ocr/tesseract/blob/main/src/ccmain/tesseractclass.cpp)** — Implements the `Tesseract` class container; review this to understand which data structures remain process-wide versus per-instance.

- **[`include/tesseract/capi.h`](https://github.com/tesseract-ocr/tesseract/blob/main/include/tesseract/capi.h)** — Exposes C-API equivalents including `TessBaseAPIClearPersistentCache` at line 495 and `TessBaseAPISetVariable`, which share the same thread-safety constraints as their C++ counterparts.

## Summary

- **Independent `TessBaseAPI` instances are thread-safe** when each thread creates, initializes, and uses its own object without sharing pointers.
- **`SetVariable` is not thread-safe** because it modifies global static parameter tables in the classify and textord modules, affecting all instances immediately.
- **`ClearPersistentCache` is a static method** that impacts every active instance and requires external locking to prevent race conditions.
- **Query and result methods** like `GetUTF8Text()` and `Version()` touch no state shared with other instances and require no synchronization after successful initialization, provided each instance stays on one thread.
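- **Configuration changes** belong at initialization time, via the arguments to `Init()` or a `ReadConfigFile()` call on the thread's own instance before recognition starts; if a later `SetVariable` on a shared parameter is unavoidable, protect it with a global mutex.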

## Frequently Asked Questions

### Can I share a single TessBaseAPI instance across multiple threads?

No. Each thread must create and manage its own `TessBaseAPI` instance. While the underlying `Tesseract` class encapsulates most state, the `TessBaseAPI` wrapper maintains internal iterators and result buffers that are not synchronized. Sharing one instance across threads without external locking causes data races and corrupted OCR results.

### Why does SetVariable affect all threads even when called on one instance?

The parameters in the **classify** and **textord** modules are stored in process-wide static lookup tables for performance and backward compatibility. When `SetVariable` updates these specific parameters, it writes to shared memory that every `TessBaseAPI` instance reads from. As noted in [`include/tesseract/baseapi.h`](https://github.com/tesseract-ocr/tesseract/blob/main/include/tesseract/baseapi.h) at lines 161-166, this is the primary exception to the otherwise thread-safe design.

### Is it safe to call ClearPersistentCache from a worker thread?

Only if you ensure no other thread is actively using a `TessBaseAPI` instance. Because `ClearPersistentCache()` is a static method that frees shared caches used by all instances, calling it while another thread is performing OCR may cause that thread to access freed memory or force expensive model reloads. Always guard this call with a process-wide mutex or execute it only during single-threaded initialization or shutdown phases.

### What happens if I call methods after End() in a multithreaded context?

Calling methods other than `Init()` on an instance after `End()` produces undefined behavior in any context, but in multithreaded applications it becomes particularly dangerous. The `End()` method releases internal buffers and resets the object state; subsequent calls may crash the thread or corrupt memory in the shared process heap. Always treat `End()` as a terminal operation for that instance's lifecycle.