
How to Use ResultIterator to Extract Bounding Boxes and Word Confidence in Tesseract OCR

March 2, 2026 · tesseract-ocr/tesseract

Use the ResultIterator class obtained via TessBaseAPI::GetIterator() to iterate over OCR results at the word level, calling BoundingBox() for each word's pixel coordinates and Confidence() for its 0-100 confidence score.

The Tesseract OCR engine provides structured access to recognition results through the ResultIterator class defined in include/tesseract/resultiterator.h. This iterator lets developers extract precise geometric data and confidence scores for individual words while traversing the document's hierarchical layout, which the plain text returned by TessBaseAPI::GetUTF8Text() cannot provide.

ResultIterator Architecture and Hierarchy

Tesseract exposes layout-aware results through a three-tier inheritance structure that separates geometry from text processing.

Class inheritance chain:

PageIterator (base class in include/tesseract/pageiterator.h) – Provides hierarchical navigation (page → block → paragraph → line → word → symbol) and geometric helpers like BoundingBox()
LTRResultIterator (middle layer in include/tesseract/ltrresultiterator.h) – Adds text-oriented methods including Confidence() and GetUTF8Text()
ResultIterator (concrete class in include/tesseract/resultiterator.h) – Inherits from LTRResultIterator and implements BiDi (bidirectional) reading order support

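The iterator operates on Tesseract's internal PAGE_RES structure. When you call TessBaseAPI::GetIterator() in include/tesseract/baseapi.h, you receive a pointer to a ResultIterator, or nullptr if no layout analysis or recognition results exist yet. The iterator points into data owned by the TessBaseAPI, so it is valid only while that object exists and only until a call such as Init(), SetImage(), Recognize(), Clear(), or End() changes the internal results.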

Step-by-Step Workflow for Word Extraction

To extract bounding boxes and confidence scores, follow this sequence:

Initialize the API – Create a TessBaseAPI object and load your language model
Set the image – Pass a Leptonica Pix pointer via SetImage()
Run OCR – Call Recognize() to populate the internal result structure
Obtain the iterator – Use GetIterator() to retrieve a ResultIterator pointer
Iterate at word level – Loop using Next(RIL_WORD) to advance through each word in reading order
Extract data – For each word, call GetUTF8Text(), Confidence(), and BoundingBox()

Complete C++ Implementation Example

The following implementation demonstrates the full workflow for extracting word text, confidence percentages, and pixel coordinates:

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <cstdio>

int main() {
  // Initialize the API with English language model
  tesseract::TessBaseAPI api;
  if (api.Init(nullptr, "eng") != 0) {
    fprintf(stderr, "Could not initialize tesseract.\n");
    return 1;
  }

  // Load image (replace with your image path)
  Pix* image = pixRead("sample.png");
  if (!image) {
    fprintf(stderr, "Could not load image.\n");
    return 1;
  }
  api.SetImage(image);

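  // Run OCR; a nonzero return value signals failure
  if (api.Recognize(nullptr) != 0) {
    fprintf(stderr, "Recognition failed.\n");
    pixDestroy(&image);
    return 1;
  }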

  // Obtain ResultIterator
  tesseract::ResultIterator* ri = api.GetIterator();
  if (!ri) {
    fprintf(stderr, "Failed to get ResultIterator.\n");
    pixDestroy(&image);
    return 1;
  }

  // Iterate over words
  const tesseract::PageIteratorLevel level = tesseract::RIL_WORD;
  do {
    char* word = ri->GetUTF8Text(level);
    if (word) {
      // Get confidence score (0-100%)
      float conf = ri->Confidence(level);

      // Get bounding box coordinates
      int left, top, right, bottom;
      bool valid_box = ri->BoundingBox(level, &left, &top, &right, &bottom);

      // Output results
      printf("Word: \"%s\" | Confidence: %.2f%% | BBox: (%d,%d)-(%d,%d)\n",
             word, conf,
             valid_box ? left : -1,
             valid_box ? top : -1,
             valid_box ? right : -1,
             valid_box ? bottom : -1);

      delete[] word;  // Critical: free the allocated string
    }
  } while (ri->Next(level));  // Advance to next word

  // Cleanup
  delete ri;
  pixDestroy(&image);
  api.End();
  return 0;
}

Key implementation details:

Memory management – GetUTF8Text() returns a heap-allocated C-string that must be freed with delete[]
Iterator lifecycle – The ResultIterator pointer must be deleted with delete, but it does not own the underlying PAGE_RES data (managed by TessBaseAPI)
Level constant – RIL_WORD targets individual words; the other levels are RIL_BLOCK, RIL_PARA, RIL_TEXTLINE, and RIL_SYMBOL
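Because an early return or exception between GetUTF8Text() and delete[] leaks the string, one option is to hand the buffer straight to a small owning helper. The sketch below assumes only what the text states (the buffer comes from new[] and may be nullptr); TakeUtf8 is a hypothetical name, not a Tesseract function.

```cpp
#include <memory>
#include <string>

// Hypothetical helper (not part of Tesseract): takes ownership of a
// new[]-allocated C-string such as the one returned by GetUTF8Text(),
// copies it into a std::string, and frees the buffer with delete[].
// A nullptr input (no word at the current position) yields "".
std::string TakeUtf8(char* raw) {
  std::unique_ptr<char[]> owned(raw);  // unique_ptr<char[]> calls delete[]
  return owned ? std::string(owned.get()) : std::string();
}
```

Inside the loop this would read std::string word = TakeUtf8(ri->GetUTF8Text(level)); the same idea applies to the iterator itself, since a std::unique_ptr<tesseract::ResultIterator> calls plain delete when it goes out of scope.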

Working with Bounding Boxes and Confidence Scores

Coordinate System and Geometry

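The BoundingBox() method, declared in include/tesseract/pageiterator.h, returns pixel coordinates relative to the original input image, with the origin (0,0) at the top-left corner. It returns false when there is no object at the requested level at the current position, which is why the example above checks its return value. The method signature accepts pointers to four integers: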

bool BoundingBox(PageIteratorLevel level, int *left, int *top, int *right, int *bottom) const;

These coordinates represent the tight bounding box around the recognized word. To control whether diacritical marks (dots above or below characters) are included in the calculation, use PageIterator::SetBoundingBoxComponents(bool include_upper_dots, bool include_lower_dots) before iterating.
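Word boxes are often combined, for example to outline a phrase. The sketch below assumes the crack-coordinate convention described in the header comment of include/tesseract/pageiterator.h (the box of the single top-left pixel is (0,0)-(1,1)), so width is right - left with no +1. WordBox, Width, Height, and Union are hypothetical names for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical plain struct holding the four values written by BoundingBox().
struct WordBox {
  int left, top, right, bottom;
};

// With crack coordinates, right/bottom sit one past the last covered pixel,
// so no +1 correction is needed.
int Width(const WordBox& b) { return b.right - b.left; }
int Height(const WordBox& b) { return b.bottom - b.top; }

// Smallest box enclosing every input box, e.g. to outline a run of words.
// Returns an all-zero box for an empty input.
WordBox Union(const std::vector<WordBox>& boxes) {
  if (boxes.empty()) return WordBox{0, 0, 0, 0};
  WordBox u = boxes.front();
  for (const WordBox& b : boxes) {
    u.left = std::min(u.left, b.left);
    u.top = std::min(u.top, b.top);
    u.right = std::max(u.right, b.right);
    u.bottom = std::max(u.bottom, b.bottom);
  }
  return u;
}
```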

Alternative: Bulk Confidence Extraction

If you only need confidence values without geometric data or iteration logic, TessBaseAPI::AllWordConfidences() in include/tesseract/baseapi.h provides a shortcut:

int* confidences = api.AllWordConfidences();
// Array contains one int (0-100) per word, terminated by -1; may be nullptr
if (confidences != nullptr) {
  for (int i = 0; confidences[i] != -1; ++i) {
    printf("Word %d confidence: %d%%\n", i, confidences[i]);
  }
  delete[] confidences;  // The caller owns the returned array
}
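To keep ownership in one place, the array can be copied into a std::vector immediately. This is a minimal sketch assuming the contract described in the text (values terminated by -1, released with delete[]) plus a nullptr check for the case where no results are available; TakeConfidences and MeanConfidence are hypothetical helpers.

```cpp
#include <memory>
#include <vector>

// Hypothetical helper: copies a -1-terminated confidence array (as returned
// by TessBaseAPI::AllWordConfidences()) into a vector and frees the array
// with delete[]. A nullptr input yields an empty vector.
std::vector<int> TakeConfidences(int* raw) {
  std::unique_ptr<int[]> owned(raw);  // releases the array on every path
  std::vector<int> out;
  if (!owned) return out;
  for (int i = 0; owned[i] != -1; ++i) out.push_back(owned[i]);
  return out;
}

// Arithmetic mean of the values, or 0.0 for an empty vector.
double MeanConfidence(const std::vector<int>& confs) {
  if (confs.empty()) return 0.0;
  double sum = 0.0;
  for (int c : confs) sum += c;
  return sum / confs.size();
}
```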

Thread Safety Considerations

Neither TessBaseAPI nor the ResultIterator it returns is thread-safe. Do not use an iterator while another thread is using it or calling any method on the same TessBaseAPI; if several threads must share one instance, serialize every access with external synchronization such as a mutex.

Summary

ResultIterator inherits from LTRResultIterator and PageIterator, combining text extraction with geometric navigation
Use TessBaseAPI::GetIterator() to obtain an iterator after calling Recognize()
Call Confidence(RIL_WORD) to retrieve recognition probability as a percentage (0-100)
Call BoundingBox(RIL_WORD, ...) to get pixel coordinates in the original image space
Always delete[] the string returned by GetUTF8Text() and delete the iterator itself when finished
For simple confidence arrays without bounding boxes, use AllWordConfidences() instead of iteration

Frequently Asked Questions

What is the difference between ResultIterator and LTRResultIterator?

LTRResultIterator walks results in strict left-to-right order and provides the text-oriented methods, including Confidence() and GetUTF8Text(), while ResultIterator extends it to handle bidirectional (BiDi) text and right-to-left languages. According to the Tesseract source in include/tesseract/resultiterator.h, ResultIterator overrides navigation methods to traverse mixed-direction text in reading order. For most applications requiring bounding boxes and confidence scores, use the concrete ResultIterator class obtained via GetIterator().

How do I interpret the confidence score returned by ResultIterator?

The Confidence() method returns a float between 0 and 100 representing the OCR engine's certainty about the recognized text, where higher values indicate greater confidence. The declaration in include/tesseract/ltrresultiterator.h describes it as the mean confidence of the current object at the given level, to be interpreted as a percent probability. The score is computed for whichever hierarchy level you pass (word, symbol, etc.) from the classifier's certainty for the recognized characters. Low scores often point to recognition errors, but the right cutoff depends on your image quality and language model, so calibrate it on samples of your own data instead of assuming a fixed value such as 60 or 70.
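Since the cutoff is application-specific, a simple approach is to copy words out of the iterator into owned records and flag those under a caller-chosen threshold. WordRecord and LowConfidenceWords are hypothetical; the records would be filled from GetUTF8Text() and Confidence(RIL_WORD) inside the iteration loop.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one word copied out of the iterator, so it stays
// valid after the TessBaseAPI results are cleared.
struct WordRecord {
  std::string text;
  float confidence;  // value from Confidence(RIL_WORD), 0-100
};

// Returns the words whose confidence is strictly below `threshold`, e.g. to
// flag them for manual review. The threshold is chosen by the caller.
std::vector<WordRecord> LowConfidenceWords(const std::vector<WordRecord>& words,
                                           float threshold) {
  std::vector<WordRecord> flagged;
  for (const WordRecord& w : words) {
    if (w.confidence < threshold) flagged.push_back(w);
  }
  return flagged;
}
```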

Can I use ResultIterator across multiple threads?

No, ResultIterator is not thread-safe. The iterator maintains internal state pointing into the TessBaseAPI's PAGE_RES structure, and concurrent access from multiple threads causes undefined behavior. If you need parallel processing, create separate TessBaseAPI instances (and their respective iterators) for each thread, or implement external locking mechanisms to serialize access to the iterator.
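The one-instance-per-thread advice can be sketched with std::thread. To keep the sketch self-contained, WorkerEngine is a hypothetical stand-in whose Process() only returns a placeholder string; in real code it would own its own tesseract::TessBaseAPI and run SetImage(), Recognize(), and the iterator loop for each image.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-thread OCR engine. Real code would wrap a
// tesseract::TessBaseAPI here; this version returns a placeholder result.
class WorkerEngine {
 public:
  std::string Process(const std::string& image_path) {
    return "ocr:" + image_path;
  }
};

// Splits the inputs round-robin across `num_threads` workers. Each thread
// owns its own engine and writes only to its own result slots, so no lock
// is needed.
std::vector<std::string> ProcessInParallel(const std::vector<std::string>& paths,
                                           unsigned num_threads) {
  std::vector<std::string> results(paths.size());
  std::vector<std::thread> threads;
  for (unsigned t = 0; t < num_threads; ++t) {
    threads.emplace_back([&paths, &results, t, num_threads] {
      WorkerEngine engine;  // one engine per thread, never shared
      for (std::size_t i = t; i < paths.size(); i += num_threads) {
        results[i] = engine.Process(paths[i]);
      }
    });
  }
  for (std::thread& th : threads) th.join();
  return results;
}
```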

What coordinate system does BoundingBox use?

The coordinates returned by BoundingBox() are expressed in pixel units relative to the original input image, with (0,0) at the top-left corner, as documented in include/tesseract/pageiterator.h. If you called SetRectangle() to restrict OCR to a region of the image, the bounding box coordinates are still relative to the full image, not the subset rectangle. Integer coordinates lie on the cracks between pixels: left and top mark the top-left corner of the first covered pixel, while right and bottom mark the bottom-right corner of the last covered pixel. The box of only the top-left pixel is therefore (0,0)-(1,1), and a box's width is simply right - left.
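If downstream code wants coordinates relative to the region passed to SetRectangle(left, top, width, height), subtracting the region's origin is enough. A minimal sketch; RegionRect, PixelBox, and ToRegionCoords are hypothetical names, not Tesseract types.

```cpp
// Hypothetical plain structs for illustration.
struct RegionRect {
  int left, top, width, height;  // same order as SetRectangle()'s arguments
};

struct PixelBox {
  int left, top, right, bottom;  // as written by BoundingBox()
};

// BoundingBox() reports full-image coordinates even after SetRectangle(),
// so shifting by the region's origin gives region-relative coordinates.
PixelBox ToRegionCoords(const PixelBox& full, const RegionRect& region) {
  return PixelBox{full.left - region.left, full.top - region.top,
                  full.right - region.left, full.bottom - region.top};
}
```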
