How to Use Tesseract's Sparse Text Mode for Unstructured Documents

Tesseract's sparse text mode (PSM 11) treats images as collections of isolated text fragments, bypassing traditional page layout analysis to extract text from unstructured documents like receipts and business cards.

When processing documents that lack clear column structures or reading order—such as forms, invoices, or photographed signs—standard page segmentation often fails. The Tesseract OCR engine provides a specialized sparse text mode designed specifically for these scenarios. This mode is implemented in the tesseract-ocr/tesseract repository through the PageSegMode enumeration and activated via the PSM_SPARSE_TEXT constant.

What Is Tesseract's Sparse Text Mode?

Sparse text mode is a page segmentation mode that instructs Tesseract to find text anywhere on the image without attempting to organize it into blocks, lines, or reading order. According to the source code in include/tesseract/publictypes.h, the mode is defined as PSM_SPARSE_TEXT with the numeric value 11, while PSM_SPARSE_TEXT_OSD (value 12) adds automatic orientation and script detection.

The architectural components that enable this functionality include:

| Component | Role in Sparse Text Mode | Source Location |
| --- | --- | --- |
| PageSegMode enum | Defines PSM_SPARSE_TEXT (11) and PSM_SPARSE_TEXT_OSD (12) | include/tesseract/publictypes.h (lines 71-74) |
| PSM_SPARSE helper | Inline function that checks whether a mode is a sparse variant; used to bypass block/line finding | include/tesseract/publictypes.h (lines 95-97) |
| CLI argument mapping | Translates --psm 11 to PSM_SPARSE_TEXT | src/tesseract.cpp (lines 58-61) |
| SetPageSegMode | Public API method that stores the mode as an IntParam for the layout engine | include/tesseract/baseapi.h (lines 56-61) |
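
To make the numbering concrete, here is a small Python mirror of the enum and the sparse check. It is an illustrative sketch, not Tesseract code: the class and the psm_sparse function are stand-ins written for this article, and the values follow the mode numbers shown by the --psm option.

```python
from enum import IntEnum


class PageSegMode(IntEnum):
    """Illustrative Python mirror of Tesseract's page segmentation modes."""
    PSM_OSD_ONLY = 0
    PSM_AUTO_OSD = 1
    PSM_AUTO_ONLY = 2
    PSM_AUTO = 3
    PSM_SINGLE_COLUMN = 4
    PSM_SINGLE_BLOCK_VERT_TEXT = 5
    PSM_SINGLE_BLOCK = 6
    PSM_SINGLE_LINE = 7
    PSM_SINGLE_WORD = 8
    PSM_CIRCLE_WORD = 9
    PSM_SINGLE_CHAR = 10
    PSM_SPARSE_TEXT = 11
    PSM_SPARSE_TEXT_OSD = 12
    PSM_RAW_LINE = 13


def psm_sparse(mode: int) -> bool:
    """Stand-in for the PSM_SPARSE helper: true only for the two sparse modes."""
    return mode in (PageSegMode.PSM_SPARSE_TEXT, PageSegMode.PSM_SPARSE_TEXT_OSD)


if __name__ == "__main__":
    # Print every mode number alongside whether it counts as sparse.
    for mode in PageSegMode:
        print(f"{mode.value:2d} {mode.name:28s} sparse={psm_sparse(mode)}")
```

Because the members are plain integers, the same check works whether the caller passes an enum member or the raw number taken from a --psm argument.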

How Sparse Text Mode Works Internally

When PSM_SPARSE_TEXT is active, Tesseract's layout analysis pipeline undergoes significant simplification. The PSM_SPARSE inline function (defined in publictypes.h) returns true for modes 11 and 12, triggering conditional logic throughout src/textord/* and src/ccmain/* that disables heavyweight page analysis.

Specifically, the engine:

  1. Skips column and paragraph detection: the PSM_BLOCK_FIND_ENABLED helper (an inline function in publictypes.h, not a macro) returns false for modes 11 and 12, preventing the block-finding algorithms from executing.
  2. Enables minimal blob detection: instead of organizing text into reading order, the sparse text finder scans the entire image for character blobs without structural grouping.
  3. Optionally runs OSD: if using PSM_SPARSE_TEXT_OSD (12), the orientation and script detection module analyzes the detected text to determine rotation angles and language scripts.

This architecture makes sparse text mode ideal for unstructured inputs such as receipts, business cards, forms with scattered fields, or any image where text does not follow a conventional reading order.
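
The gating described above can be modeled in a few lines of Python. This is a rough sketch under stated assumptions: the range checks reflect how the helpers in publictypes.h are commonly written and should be verified against your Tesseract version, and the step names returned by layout_plan are descriptive labels invented here, not Tesseract identifiers.

```python
# Mode numbers from the PageSegMode enum that the checks below depend on.
PSM_AUTO_OSD = 1
PSM_SINGLE_COLUMN = 4
PSM_SINGLE_BLOCK = 6
PSM_SPARSE_TEXT = 11
PSM_SPARSE_TEXT_OSD = 12


def psm_osd_enabled(mode):
    """Assumed check: OSD runs for modes 0, 1 and the sparse OSD variant."""
    return mode <= PSM_AUTO_OSD or mode == PSM_SPARSE_TEXT_OSD


def psm_block_find_enabled(mode):
    """Assumed check: block finding covers modes 1 through 4."""
    return PSM_AUTO_OSD <= mode <= PSM_SINGLE_COLUMN


def psm_line_find_enabled(mode):
    """Assumed check: line finding covers modes 1 through 6."""
    return PSM_AUTO_OSD <= mode <= PSM_SINGLE_BLOCK


def psm_sparse(mode):
    """True only for the two sparse modes."""
    return mode in (PSM_SPARSE_TEXT, PSM_SPARSE_TEXT_OSD)


def layout_plan(mode):
    """Return the (hypothetical) list of stages a mode would run, in order."""
    steps = []
    if psm_osd_enabled(mode):
        steps.append("orientation and script detection")
    if psm_block_find_enabled(mode):
        steps.append("block finding")
    if psm_line_find_enabled(mode):
        steps.append("line finding")
    if psm_sparse(mode):
        steps.append("sparse text search")
    return steps


if __name__ == "__main__":
    for mode in (3, 11, 12):
        print(mode, layout_plan(mode))
```

Under these assumptions, mode 3 runs block and line finding, mode 11 runs only the sparse search, and mode 12 adds OSD in front of it.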

Using Sparse Text Mode: Code Examples

Command-Line Usage

Activate sparse text mode via the --psm flag followed by 11 for basic sparse text or 12 to include orientation and script detection:


# Basic sparse text mode for receipts or unstructured documents
tesseract receipt.png output --psm 11 -l eng

# Sparse text with automatic orientation and script detection
tesseract business_card.png output --psm 12 -l eng

The mapping from 11 to PSM_SPARSE_TEXT occurs in src/tesseract.cpp, where the integer argument is converted to the corresponding PageSegMode enum value.
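
If you want to drive the same command from a script without a wrapper library, one option is to call the binary through subprocess. The sketch below assumes the tesseract executable is on PATH; build_tesseract_command and run_sparse_ocr are hypothetical helper names, and "stdout" is used as the output base so the text is printed instead of written to a file.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, psm=11, lang="eng"):
    """Return the argv list for a sparse-text run that prints text to stdout."""
    if psm not in (11, 12):
        raise ValueError("expected 11 (sparse) or 12 (sparse + OSD)")
    # Passing "stdout" as the output base prints the result instead of
    # creating an output .txt file.
    return ["tesseract", image_path, "stdout", "--psm", str(psm), "-l", lang]


def run_sparse_ocr(image_path, psm=11, lang="eng"):
    """Run the command and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, psm, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


if __name__ == "__main__":
    print(build_tesseract_command("receipt.png"))
```

Keeping command construction separate from execution makes the argument list easy to log or test without having Tesseract installed.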

C++ API Implementation

When integrating Tesseract directly into applications, use the SetPageSegMode method exposed in include/tesseract/baseapi.h:

#include <cstdio>  // fprintf, printf

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main() {
  tesseract::TessBaseAPI api;
  
  // Initialize with English language data
  if (api.Init(nullptr, "eng")) {
    fprintf(stderr, "Could not initialize tesseract.\n");
    return 1;
  }

  // Set sparse text mode (PSM_SPARSE_TEXT = 11)
  api.SetPageSegMode(tesseract::PSM_SPARSE_TEXT);
  
  // Alternative: Enable orientation and script detection
  // api.SetPageSegMode(tesseract::PSM_SPARSE_TEXT_OSD);

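  // Load image using Leptonica; pixRead returns nullptr on failure
  Pix *image = pixRead("receipt.png");
  if (image == nullptr) {
    fprintf(stderr, "Could not read receipt.png.\n");
    api.End();
    return 1;
  }
  api.SetImage(image);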

  // Extract text
  char *outText = api.GetUTF8Text();
  printf("OCR Output:\n%s\n", outText);

  // Cleanup
  delete[] outText;
  pixDestroy(&image);
  api.End();
  return 0;
}

The SetPageSegMode method stores the mode as an IntParam, making it available to the internal layout engine in src/textord/ and src/ccmain/.

Python with Pytesseract

The Python wrapper forwards the --psm configuration directly to the Tesseract CLI:

import pytesseract
from PIL import Image

# Open unstructured document (receipt, business card, etc.)
img = Image.open('receipt.png')

# Configure for sparse text mode
custom_config = r'--psm 11'

# Perform OCR
text = pytesseract.image_to_string(
    img,
    config=custom_config,
    lang='eng'
)

print(text)

For orientation detection alongside sparse text, use --psm 12 in the configuration string.
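
Sparse mode tends to pick up stray marks as text, so it can help to filter on per-word confidence. The following sketch uses pytesseract's image_to_data with dictionary output; confident_words and ocr_confident_words are hypothetical helper names, and the threshold of 60 is an arbitrary starting point rather than a recommended value.

```python
def confident_words(data, min_conf=60.0):
    """Keep (text, conf) pairs from an image_to_data dict at or above a threshold."""
    kept = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # conf may arrive as a string or a number
        # Entries that are empty or whitespace describe blocks and lines, not words.
        if text.strip() and conf >= min_conf:
            kept.append((text, conf))
    return kept


def ocr_confident_words(image_path, min_conf=60.0):
    """Run sparse-text OCR and filter the words.

    Requires pytesseract, Pillow, and the tesseract binary; the imports are
    local so the filtering helper above can be used without them.
    """
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path),
        config="--psm 11",
        output_type=pytesseract.Output.DICT,
    )
    return confident_words(data, min_conf)


if __name__ == "__main__":
    # Synthetic data in the shape image_to_data returns (only two keys shown).
    sample = {"text": ["", "TOTAL", "12.50", " "], "conf": ["-1", "96", 41.5, "-1"]}
    print(confident_words(sample))
```

Lowering min_conf keeps more fragments at the cost of more noise, so it is worth tuning against a few representative images.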

Key Source Files and Implementation Details

Understanding the implementation helps debug segmentation issues and optimize OCR pipelines for unstructured data:

| File | Purpose |
| --- | --- |
| include/tesseract/publictypes.h | Defines PageSegMode enum including PSM_SPARSE_TEXT (11) and PSM_SPARSE_TEXT_OSD (12); contains PSM_SPARSE inline helper function (lines 95-97) that identifies sparse modes |
| src/tesseract.cpp | Maps command-line --psm arguments to enum values; handles the translation of 11 to sparse text mode |
| include/tesseract/baseapi.h | Declares SetPageSegMode() method (lines 56-61) used to programmatically configure segmentation |
| src/ccmain/tesseractclass.cpp | Contains help text displaying available page segmentation modes |
| src/textord/* (e.g., tablefind.cpp, colpartitiongrid.cpp) | Layout analysis modules that check PSM_SPARSE to bypass block finding and column detection when sparse mode is active |

The PSM_SPARSE inline function serves as the gatekeeper throughout the codebase. When it returns true, the engine skips expensive layout analysis stages and proceeds directly to character blob detection.

Summary

  • Sparse text mode (PSM_SPARSE_TEXT, value 11) treats images as collections of isolated text fragments rather than structured pages, making it ideal for receipts, business cards, and forms.
  • The mode is defined in include/tesseract/publictypes.h and activated via the --psm 11 CLI flag or api.SetPageSegMode(tesseract::PSM_SPARSE_TEXT) in C++.
  • When active, the PSM_SPARSE helper function triggers bypass logic in src/textord/* that disables block/line finding and runs minimal blob detection instead.
  • Use PSM_SPARSE_TEXT_OSD (value 12) to add automatic orientation and script detection for rotated or multilingual unstructured documents.

Frequently Asked Questions

What is the difference between PSM 11 and PSM 12?

PSM 11 (SPARSE_TEXT) performs basic sparse text detection without orientation analysis, while PSM 12 (SPARSE_TEXT_OSD) adds Orientation and Script Detection (OSD). According to include/tesseract/publictypes.h, PSM 12 runs the sparse text finder then analyzes the detected text to determine page rotation angles and script types, making it suitable for documents that may be rotated or contain multiple languages.

When should I use sparse text mode instead of automatic page segmentation?

Use sparse text mode when processing unstructured documents where text appears as isolated fragments rather than flowing paragraphs. The PSM_SPARSE check in the layout engine (found in src/textord/) disables column and paragraph detection, making the mode ideal for receipts, business cards, screenshots with scattered UI text, or photographed forms with fields in non-linear arrangements. For standard books or articles with clear reading order, use fully automatic segmentation (PSM 3, the default) or, for a single uniform block of text, PSM 6.

Does sparse text mode affect OCR accuracy?

Sparse text mode does not by itself degrade character recognition accuracy, but it changes how text is located and organized in the output. Because the mode bypasses block and line finding (as controlled by the PSM_SPARSE inline function in publictypes.h), Tesseract treats each text fragment independently. This eliminates reading-order errors that occur when the engine incorrectly groups scattered text into artificial paragraphs, but it also means the output may not reflect the spatial relationships between text elements unless you use bounding box data (for example, the BoundingBox method of the iterator returned by GetIterator in the C++ API).
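
When you do have bounding boxes, a simple post-processing pass can recover an approximate reading order. The sketch below is one possible heuristic, not something Tesseract does itself: group_into_rows is a hypothetical helper that takes (text, left, top) tuples, such as the text, left, and top fields from pytesseract's image_to_data, and groups fragments whose top edges fall within a pixel tolerance.

```python
def group_into_rows(fragments, tolerance=10):
    """Group (text, left, top) fragments into rows, top to bottom, left to right.

    A fragment joins the current row when its top edge is within `tolerance`
    pixels of the first fragment placed in that row.
    """
    rows = []
    # Visit fragments from the top of the image downwards.
    for text, left, top in sorted(fragments, key=lambda f: (f[2], f[1])):
        if rows and abs(top - rows[-1]["top"]) <= tolerance:
            rows[-1]["items"].append((left, text))
        else:
            rows.append({"top": top, "items": [(left, text)]})
    # Within each row, order fragments by their left edge.
    return [" ".join(t for _, t in sorted(row["items"])) for row in rows]


if __name__ == "__main__":
    fragments = [
        ("12.50", 300, 102),
        ("Coffee", 20, 100),
        ("TOTAL", 20, 200),
        ("15.00", 300, 198),
    ]
    for line in group_into_rows(fragments):
        print(line)
```

The tolerance needs to scale with text height and image resolution; skewed or curved text would need a more careful grouping rule than this one.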

How do I enable sparse text mode in Python?

In Python, use the pytesseract wrapper to pass the --psm 11 configuration flag:

import pytesseract
from PIL import Image

img = Image.open('document.png')
config = r'--psm 11'
text = pytesseract.image_to_string(img, config=config)

The wrapper forwards this argument to the Tesseract CLI, which maps 11 to PSM_SPARSE_TEXT in src/tesseract.cpp. For C++ API users, call api.SetPageSegMode(tesseract::PSM_SPARSE_TEXT) as defined in include/tesseract/baseapi.h.
