How to Use ResultIterator to Extract Bounding Boxes and Word Confidence in Tesseract OCR
Use the ResultIterator class obtained via TessBaseAPI::GetIterator() to iterate over OCR results at the word level, calling BoundingBox() for coordinates and Confidence() for recognition probability on each word.
The Tesseract OCR engine provides structured access to recognition results through the ResultIterator class defined in include/tesseract/resultiterator.h. This iterator allows developers to extract precise geometric data and confidence scores for individual words while traversing the document's hierarchical layout. Understanding how to properly initialize and navigate this API is essential for applications requiring fine-grained text analysis beyond simple plain-text extraction.
ResultIterator Architecture and Hierarchy
Tesseract exposes layout-aware results through a three-tier inheritance structure that separates geometry from text processing.
Class inheritance chain:
- PageIterator (base class in include/tesseract/pageiterator.h) – Provides hierarchical navigation (page → block → paragraph → line → word → symbol) and geometric helpers like BoundingBox()
- LTRResultIterator (middle layer in include/tesseract/ltrresultiterator.h) – Adds text-oriented methods including Confidence() and GetUTF8Text()
- ResultIterator (concrete class in include/tesseract/resultiterator.h) – Inherits from LTRResultIterator and implements BiDi (bidirectional) reading order support
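The iterator reads directly from Tesseract's internal PAGE_RES structure. TessBaseAPI::GetIterator(), declared in include/tesseract/baseapi.h, returns a pointer to a ResultIterator that remains valid only while the TessBaseAPI instance exists and its results are unchanged. Calling Init(), SetImage(), Recognize(), Clear(), End(), or anything else that replaces the internal PAGE_RES invalidates the iterator.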
Step-by-Step Workflow for Word Extraction
To extract bounding boxes and confidence scores, follow this sequence:
- Initialize the API – Create a TessBaseAPI object and load your language model
- Set the image – Pass a Leptonica Pix pointer via SetImage()
- Run OCR – Call Recognize() to populate the internal result structure
- Obtain the iterator – Use GetIterator() to retrieve a ResultIterator pointer
- Iterate at word level – Loop using Next(RIL_WORD) to advance through each word in reading order
- Extract data – For each word, call GetUTF8Text(), Confidence(), and BoundingBox()
Complete C++ Implementation Example
The following implementation demonstrates the full workflow for extracting word text, confidence percentages, and pixel coordinates:
```cpp
#include <tesseract/baseapi.h>
#include <tesseract/resultiterator.h>
#include <leptonica/allheaders.h>
#include <cstdio>

int main() {
  // Initialize the API with the English language model
  tesseract::TessBaseAPI api;
  if (api.Init(nullptr, "eng") != 0) {
    fprintf(stderr, "Could not initialize tesseract.\n");
    return 1;
  }

  // Load image (replace with your image path)
  Pix* image = pixRead("sample.png");
  if (!image) {
    fprintf(stderr, "Could not load image.\n");
    api.End();
    return 1;
  }
  api.SetImage(image);

  // Run OCR; Recognize() returns 0 on success
  if (api.Recognize(nullptr) != 0) {
    fprintf(stderr, "Recognition failed.\n");
    pixDestroy(&image);
    api.End();
    return 1;
  }

  // Obtain ResultIterator (nullptr if no results are available)
  tesseract::ResultIterator* ri = api.GetIterator();
  if (!ri) {
    fprintf(stderr, "Failed to get ResultIterator.\n");
    pixDestroy(&image);
    api.End();
    return 1;
  }

  // Iterate over words in reading order
  const tesseract::PageIteratorLevel level = tesseract::RIL_WORD;
  do {
    char* word = ri->GetUTF8Text(level);
    if (word) {
      // Confidence score in the range 0-100
      float conf = ri->Confidence(level);

      // Bounding box in full-image pixel coordinates
      int left, top, right, bottom;
      bool valid_box = ri->BoundingBox(level, &left, &top, &right, &bottom);

      printf("Word: \"%s\" | Confidence: %.2f%% | BBox: (%d,%d)-(%d,%d)\n",
             word, conf,
             valid_box ? left : -1,
             valid_box ? top : -1,
             valid_box ? right : -1,
             valid_box ? bottom : -1);
      delete[] word;  // Critical: GetUTF8Text() allocates with new[]
    }
  } while (ri->Next(level));  // Advance to the next word

  // Cleanup: the iterator first, then the API, then the image
  delete ri;
  api.End();
  pixDestroy(&image);
  return 0;
}
```
Key implementation details:
- Memory management – GetUTF8Text() returns a heap-allocated C string that must be freed with delete[]
- Iterator lifecycle – The ResultIterator pointer must be freed with delete, but it does not own the underlying PAGE_RES data (managed by TessBaseAPI)
- Level constant – RIL_WORD targets individual words; the other PageIteratorLevel values are RIL_BLOCK, RIL_PARA, RIL_TEXTLINE, and RIL_SYMBOL
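If you would rather not call delete[] by hand, the buffer returned by GetUTF8Text() can be handed to a std::unique_ptr<char[]>, whose default deleter calls delete[]. The sketch below is a minimal, Tesseract-independent version of that idea: TakeUtf8Text is a hypothetical helper name, and it assumes only the new[] allocation contract described above.

```cpp
#include <memory>
#include <string>

// Hypothetical helper: takes ownership of a buffer allocated with new[]
// (as GetUTF8Text() documents) and returns a std::string copy.
// std::unique_ptr<char[]> calls delete[] when it goes out of scope,
// even if the std::string constructor throws.
std::string TakeUtf8Text(char* raw) {
  std::unique_ptr<char[]> owner(raw);
  return owner ? std::string(owner.get()) : std::string();
}
```

Inside the word loop this would become std::string word = TakeUtf8Text(ri->GetUTF8Text(level)); with an emptiness check replacing the null check and no explicit delete[]. The cost is one extra copy per word, which is likely negligible compared with recognition itself.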
Working with Bounding Boxes and Confidence Scores
Coordinate System and Geometry
The BoundingBox() method, declared in include/tesseract/pageiterator.h, writes pixel coordinates relative to the original input image, with the origin (0,0) at the top-left corner, into four output pointers. It returns false if there is no element at the requested level at the current position:

```cpp
bool BoundingBox(PageIteratorLevel level, int *left, int *top, int *right, int *bottom) const;
```

These coordinates describe the tight bounding box around the recognized word. To control whether diacritical marks (dots above or below characters) are included in the calculation, call PageIterator::SetBoundingBoxComponents(bool include_upper_dots, bool include_lower_dots) before iterating.
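The coordinate-system notes in pageiterator.h describe these as crack coordinates: integers lie on the boundaries between pixels, so the box around the single top-left pixel is (0,0)-(1,1). Width and height are therefore plain differences with no +1. The sketch below assumes a hypothetical WordBox struct (not a Tesseract type) holding the four values BoundingBox() writes.

```cpp
// Hypothetical holder for the four values written by
// PageIterator::BoundingBox(level, &left, &top, &right, &bottom).
struct WordBox {
  int left, top, right, bottom;
};

// Crack coordinates: right/bottom sit one past the last pixel, so no +1.
int Width(const WordBox& b) { return b.right - b.left; }
int Height(const WordBox& b) { return b.bottom - b.top; }
int Area(const WordBox& b) { return Width(b) * Height(b); }

// True if the pixel in column x, row y lies inside the box.
bool ContainsPixel(const WordBox& b, int x, int y) {
  return x >= b.left && x < b.right && y >= b.top && y < b.bottom;
}
```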
Alternative: Bulk Confidence Extraction
If you only need confidence values without geometric data or iteration logic, TessBaseAPI::AllWordConfidences() in include/tesseract/baseapi.h provides a shortcut:
```cpp
int* confidences = api.AllWordConfidences();  // nullptr if recognition could not run
if (confidences) {
  // One int (0-100) per word, terminated by -1
  for (int i = 0; confidences[i] != -1; ++i) {
    printf("Word %d confidence: %d%%\n", i, confidences[i]);
  }
  delete[] confidences;  // The caller owns the returned array
}
```
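Because the array's length is only known from its -1 terminator, it can be convenient to copy it into a std::vector immediately and then release the original. ConfidencesToVector and MeanConfidence below are hypothetical helpers that rely only on the -1 terminator convention; the plain average they compute is one possible page-level summary, not a Tesseract-defined metric.

```cpp
#include <vector>

// Copies a -1-terminated confidence array (the format AllWordConfidences()
// returns) into a vector. A null pointer yields an empty vector.
std::vector<int> ConfidencesToVector(const int* confs) {
  std::vector<int> out;
  if (confs == nullptr) return out;
  for (int i = 0; confs[i] != -1; ++i) out.push_back(confs[i]);
  return out;
}

// Arithmetic mean of the collected values; 0.0 when there are no words.
double MeanConfidence(const std::vector<int>& confs) {
  if (confs.empty()) return 0.0;
  long long sum = 0;
  for (int c : confs) sum += c;
  return static_cast<double>(sum) / confs.size();
}
```

In practice: int* raw = api.AllWordConfidences(); std::vector<int> confs = ConfidencesToVector(raw); delete[] raw; (delete[] on a null pointer is a no-op).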
Thread Safety Considerations
A ResultIterator instance is not thread-safe: it keeps mutable position state and reads directly from the results held by its parent TessBaseAPI object. Use an iterator only on the thread that uses its parent TessBaseAPI, and do not share iterators across threads without external synchronization.
Summary
- ResultIterator inherits from LTRResultIterator and PageIterator, combining text extraction with geometric navigation
- Use TessBaseAPI::GetIterator() to obtain an iterator after calling Recognize()
- Call Confidence(RIL_WORD) to retrieve recognition confidence as a percentage (0-100)
- Call BoundingBox(RIL_WORD, ...) to get pixel coordinates in the original image space
- Always delete[] the string returned by GetUTF8Text() and delete the iterator itself when finished
- For simple confidence arrays without bounding boxes, use AllWordConfidences() instead of iteration
Frequently Asked Questions
What is the difference between ResultIterator and LTRResultIterator?
LTRResultIterator provides left-to-right text processing methods including Confidence() and GetUTF8Text(), while ResultIterator extends it to handle bidirectional (BiDi) text and right-to-left languages. According to the Tesseract source in include/tesseract/resultiterator.h, ResultIterator overrides navigation methods to properly traverse mixed-direction text. For most applications requiring bounding boxes and confidence scores, use the concrete ResultIterator class obtained via GetIterator().
How do I interpret the confidence score returned by ResultIterator?
The Confidence() method returns a float between 0 and 100 that include/tesseract/ltrresultiterator.h describes as a percent probability; higher values indicate greater confidence. The score is computed per hierarchy level (word, symbol, etc.) from the character classifier's outputs. As a rough rule of thumb, scores below 60-70% often indicate potential recognition errors, but the right threshold depends on your image quality and language model.
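A common use of word-level scores is flagging words for manual review. This sketch assumes each word's text and Confidence() value were copied into a hypothetical WordResult struct during the RIL_WORD loop; the default threshold of 60 is an assumption taken from the low end of the rule of thumb above, not a Tesseract constant.

```cpp
#include <string>
#include <vector>

// Hypothetical record filled in from GetUTF8Text() and Confidence()
// while iterating at RIL_WORD.
struct WordResult {
  std::string text;
  float confidence;  // 0-100, as returned by Confidence()
};

// Returns the words whose confidence is below `threshold`, in input order.
// The default of 60 is only a starting point; tune it on your own images.
std::vector<WordResult> LowConfidenceWords(const std::vector<WordResult>& words,
                                           float threshold = 60.0f) {
  std::vector<WordResult> flagged;
  for (const WordResult& w : words) {
    if (w.confidence < threshold) flagged.push_back(w);
  }
  return flagged;
}
```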
Can I use ResultIterator across multiple threads?
No, ResultIterator is not thread-safe. The iterator maintains internal state pointing into the TessBaseAPI's PAGE_RES structure, and concurrent access from multiple threads causes undefined behavior. If you need parallel processing, create separate TessBaseAPI instances (and their respective iterators) for each thread, or implement external locking mechanisms to serialize access to the iterator.
What coordinate system does BoundingBox use?
The coordinates returned by BoundingBox() are in pixel units relative to the original input image, with (0,0) at the top-left corner, as documented in include/tesseract/pageiterator.h. If you called SetRectangle() to restrict OCR to a region of the image, the bounding box coordinates are still relative to the full image dimensions, not the subset rectangle. The returned left, top, right, and bottom values lie on the boundaries between pixels, so right and bottom are exclusive: the box around the single top-left pixel is (0,0)-(1,1), and a box's width is simply right - left.
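When you do want coordinates relative to the region passed to SetRectangle(left, top, width, height), subtracting the rectangle's origin is enough. Box and ToRectRelative are hypothetical names used only for this sketch.

```cpp
// Hypothetical box type holding BoundingBox() output (full-image coordinates).
struct Box {
  int left, top, right, bottom;
};

// Shifts a full-image box into the coordinate space of a rectangle whose
// top-left corner is (rect_left, rect_top). Width and height are unchanged.
Box ToRectRelative(const Box& full, int rect_left, int rect_top) {
  return Box{full.left - rect_left, full.top - rect_top,
             full.right - rect_left, full.bottom - rect_top};
}
```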