
Page Segmentation Modes (PSM) in Tesseract OCR: The Complete Guide

March 2, 2026 tesseract-ocr/tesseract

Tesseract OCR provides 14 Page Segmentation Modes (PSM), defined in include/tesseract/publictypes.h, that control how the engine analyzes document layout, from fully automatic page segmentation down to single-character recognition.

Tesseract uses the selected mode to decide which layout analysis algorithms run before character recognition. Choosing the right PSM is critical for accuracy when documents have a specific structure, such as single lines, sparse text, or circular labels.

What Are Page Segmentation Modes?

Page Segmentation Modes tell Tesseract how to divide an input image into text blocks before performing OCR. The PageSegMode enum in include/tesseract/publictypes.h (lines 158-177) defines these behaviors as integer constants that activate different combinations of column detection, line finding, and orientation analysis.

Each mode acts as a switch that enables or disables specific sub-algorithms through helper predicates like PSM_OSD_ENABLED, PSM_ORIENTATION_ENABLED, and PSM_BLOCK_FIND_ENABLED defined later in the same header (lines 186-206).

The 14 Tesseract PSM Values Explained

The following modes control layout analysis from broad automatic detection to specialized single-element recognition:

PSM_OSD_ONLY (0) – Perform only orientation and script detection without OCR.
PSM_AUTO_OSD (1) – Automatic page segmentation with orientation and script detection enabled.
PSM_AUTO_ONLY (2) – Automatic page segmentation without OSD and without OCR (layout analysis only).
PSM_AUTO (3) – Fully automatic page segmentation without orientation/script detection.
PSM_SINGLE_COLUMN (4) – Assume a single column of text with variable font sizes.
PSM_SINGLE_BLOCK_VERT_TEXT (5) – Assume a single uniform block of vertically-oriented text.
PSM_SINGLE_BLOCK (6) – Assume a single uniform block of text (the default in the library API).
PSM_SINGLE_LINE (7) – Treat the image as a single text line.
PSM_SINGLE_WORD (8) – Treat the image as a single word.
PSM_CIRCLE_WORD (9) – Treat the image as a single word inside a circle.
PSM_SINGLE_CHAR (10) – Treat the image as a single character.
PSM_SPARSE_TEXT (11) – Find as much text as possible, in no particular order, without orientation/script detection.
PSM_SPARSE_TEXT_OSD (12) – Find sparse text with orientation and script detection enabled.
PSM_RAW_LINE (13) – Treat the image as a single text line, bypassing Tesseract-specific hacks.
PSM_COUNT (14) – Internal constant representing the number of enum entries.
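Because the modes are consecutive integers, a numeric mode (for example one read from a --psm option) can be validated against PSM_COUNT. The following is a minimal standalone sketch, assuming C++17: it mirrors the enum locally so it compiles without the Tesseract headers, and the namespace psm_sketch and the helpers ToPageSegMode() and PsmName() are hypothetical names, not Tesseract API. Real code should include <tesseract/publictypes.h> and use tesseract::PageSegMode directly.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

namespace psm_sketch {

// Local mirror of tesseract::PageSegMode (include/tesseract/publictypes.h),
// copied here only so the sketch compiles without the Tesseract headers.
enum PageSegMode : int {
  PSM_OSD_ONLY,                // 0
  PSM_AUTO_OSD,                // 1
  PSM_AUTO_ONLY,               // 2
  PSM_AUTO,                    // 3
  PSM_SINGLE_COLUMN,           // 4
  PSM_SINGLE_BLOCK_VERT_TEXT,  // 5
  PSM_SINGLE_BLOCK,            // 6
  PSM_SINGLE_LINE,             // 7
  PSM_SINGLE_WORD,             // 8
  PSM_CIRCLE_WORD,             // 9
  PSM_SINGLE_CHAR,             // 10
  PSM_SPARSE_TEXT,             // 11
  PSM_SPARSE_TEXT_OSD,         // 12
  PSM_RAW_LINE,                // 13
  PSM_COUNT                    // 14: number of entries, not a usable mode
};

// Enumerator names in enum order; the static_assert catches a missing entry.
constexpr std::string_view kPsmNames[] = {
    "PSM_OSD_ONLY",        "PSM_AUTO_OSD",      "PSM_AUTO_ONLY",
    "PSM_AUTO",            "PSM_SINGLE_COLUMN", "PSM_SINGLE_BLOCK_VERT_TEXT",
    "PSM_SINGLE_BLOCK",    "PSM_SINGLE_LINE",   "PSM_SINGLE_WORD",
    "PSM_CIRCLE_WORD",     "PSM_SINGLE_CHAR",   "PSM_SPARSE_TEXT",
    "PSM_SPARSE_TEXT_OSD", "PSM_RAW_LINE"};
static_assert(sizeof(kPsmNames) / sizeof(kPsmNames[0]) ==
              static_cast<std::size_t>(PSM_COUNT));

// Hypothetical helper: accept only values in [0, PSM_COUNT).
constexpr std::optional<PageSegMode> ToPageSegMode(int value) {
  if (value < 0 || value >= PSM_COUNT) return std::nullopt;
  return static_cast<PageSegMode>(value);
}

// Hypothetical helper: enumerator name for a valid mode, "invalid" otherwise.
constexpr std::string_view PsmName(int value) {
  return ToPageSegMode(value) ? kPsmNames[value] : std::string_view("invalid");
}

}  // namespace psm_sketch
```

For example, PsmName(7) yields "PSM_SINGLE_LINE", while ToPageSegMode(14) yields an empty optional because PSM_COUNT is a bound rather than a mode.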

How PSM Controls Internal Layout Algorithms

According to include/tesseract/publictypes.h, Tesseract uses inline helper functions to check which capabilities a given mode requires. Layout code such as src/textord/strokewidth.cpp receives the active mode and consults these predicates to decide which analysis steps to run.

PSM_OSD_ENABLED returns true for modes that require orientation and script detection.
PSM_ORIENTATION_ENABLED returns true for modes that analyze text orientation.
PSM_BLOCK_FIND_ENABLED returns true for modes that divide the page into text blocks.

When you call SetPageSegMode() (implemented in src/api/baseapi.cpp), the API stores your selection in the tessedit_pageseg_mode parameter, and later recognition steps query these predicates to decide which analysis modules to run.
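To make the predicates concrete, here is a sketch that re-implements the three named above as constexpr functions over plain ints, following the comparisons in publictypes.h as we read them (the originals also take an int and depend on the enum's ordering). The psm_sketch namespace and the k-prefixed constants are our own; treat this as an illustration and check the header of the Tesseract version you build against.

```cpp
namespace psm_sketch {

// Mode numbers from the PageSegMode enum (subset used below).
constexpr int kOsdOnly = 0, kAutoOsd = 1, kAuto = 3, kSingleColumn = 4,
              kSingleLine = 7, kSparseText = 11, kSparseTextOsd = 12;

// Sketch of PSM_OSD_ENABLED: true for modes 0, 1 and 12.
constexpr bool PsmOsdEnabled(int mode) {
  return mode <= kAutoOsd || mode == kSparseTextOsd;
}

// Sketch of PSM_ORIENTATION_ENABLED: true for modes 0-3 and 12.
constexpr bool PsmOrientationEnabled(int mode) {
  return mode <= kAuto || mode == kSparseTextOsd;
}

// Sketch of PSM_BLOCK_FIND_ENABLED: true for modes 1-4.
constexpr bool PsmBlockFindEnabled(int mode) {
  return mode >= kAutoOsd && mode <= kSingleColumn;
}

// Compile-time checks of distinctions used in this article.
static_assert(PsmOsdEnabled(kOsdOnly) && !PsmBlockFindEnabled(kOsdOnly));
static_assert(PsmOsdEnabled(kAutoOsd) && !PsmOsdEnabled(kAuto));
static_assert(!PsmBlockFindEnabled(kSingleLine) &&
              !PsmBlockFindEnabled(kSparseText));

}  // namespace psm_sketch
```

The static_asserts encode the contrasts drawn elsewhere: PSM_AUTO_OSD runs orientation and script detection while PSM_AUTO does not, and the single-line and sparse-text modes skip block finding.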

Setting Page Segmentation Modes in Code

You can configure PSM through both the C++ and C APIs.

C++ API Example

To set a specific mode using the C++ interface:

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <cstdio>

int main() {
  tesseract::TessBaseAPI api;
  if (api.Init(nullptr, "eng") != 0) return 1;     // Load English language data
  api.SetPageSegMode(tesseract::PSM_SINGLE_LINE);  // Choose single line mode
  Pix* image = pixRead("line.png");
  if (image == nullptr) return 1;
  api.SetImage(image);
  char* text = api.GetUTF8Text();                  // Caller owns the returned buffer
  printf("%s", text);
  delete[] text;
  api.End();
  pixDestroy(&image);
  return 0;
}

C API Example

The C wrapper provides equivalent functionality:

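#include <stdio.h>
#include <tesseract/capi.h>
#include <leptonica/allheaders.h>
int main(void) {
  TessBaseAPI* api = TessBaseAPICreate();
  if (TessBaseAPIInit3(api, NULL, "eng") != 0) { TessBaseAPIDelete(api); return 1; }
  TessBaseAPISetPageSegMode(api, PSM_AUTO);  // Automatic segmentation
  PIX* image = pixRead("page.png");           // Load with Leptonica, then pass the PIX
  if (image == NULL) { TessBaseAPIDelete(api); return 1; }
  TessBaseAPISetImage2(api, image);
  char* out = TessBaseAPIGetUTF8Text(api);
  printf("%s", out);
  TessDeleteText(out);                        // Free text allocated by Tesseract
  TessBaseAPIEnd(api);
  TessBaseAPIDelete(api);
  pixDestroy(&image);
  return 0;
}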

Running Orientation and Script Detection Only

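To detect page orientation and script without performing full OCR, initialize the API with the osd language (osd.traineddata) and call DetectOrientationScript(). Its output parameters are an int angle, a float orientation confidence, a script name, and a float script confidence:

// Assumes api.Init(nullptr, "osd") succeeded and an image was passed to SetImage()
api.SetPageSegMode(tesseract::PSM_OSD_ONLY);
int orient_deg = 0;
float orient_conf = 0.0f, script_conf = 0.0f;
const char* script_name = nullptr;
if (api.DetectOrientationScript(&orient_deg, &orient_conf, &script_name, &script_conf)) {
  printf("orientation %d deg (conf %.2f), script %s\n", orient_deg, orient_conf, script_name);
}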
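The angle returned by DetectOrientationScript() (its first output parameter, named orient_deg in baseapi.h) is documented as the detected clockwise rotation of the input image: 0, 90, 180, or 270 degrees. Making the page upright therefore takes the complementary clockwise turn. The hypothetical helper below sketches that arithmetic; the name UprightRotationClockwise() is ours, not Tesseract's.

```cpp
namespace psm_sketch {

// Hypothetical helper: given the detected clockwise rotation of the input
// (0, 90, 180 or 270), return the clockwise rotation that makes it upright.
constexpr int UprightRotationClockwise(int orient_deg) {
  return (360 - orient_deg % 360) % 360;  // 0->0, 90->270, 180->180, 270->90
}

}  // namespace psm_sketch
```

One way to apply the result, assuming a Leptonica PIX, is pixRotateOrth(pix, UprightRotationClockwise(orient_deg) / 90), since pixRotateOrth() rotates clockwise in 90-degree steps. As far as we can tell this matches the "Rotate:" value the command-line tool prints with --psm 0, but verify that against your build.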

Summary

Page Segmentation Modes in tesseract-ocr/tesseract are defined in include/tesseract/publictypes.h as the PageSegMode enum.
14 operational modes exist (0-13), plus PSM_COUNT for internal bounds checking.
The library default is PSM_SINGLE_BLOCK (6), which assumes one uniform text block; the tesseract command-line program defaults to PSM_AUTO (3) unless --psm is given.
Helper predicates like PSM_OSD_ENABLED control which sub-algorithms execute during layout analysis.
API methods SetPageSegMode() (C++) and TessBaseAPISetPageSegMode() (C) configure the mode before image processing.

Frequently Asked Questions

What is the default Page Segmentation Mode in Tesseract?

In the library API, the default mode is PSM_SINGLE_BLOCK (6), which assumes the image contains a single uniform block of text. The tesseract command-line program instead defaults to PSM_AUTO (3), so the CLI and API code that never calls SetPageSegMode() can produce different results on the same image.
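The two defaults diverge because of the command-line program. As we read a comment in src/tesseract.cpp, the tool treats a mode that is still PSM_SINGLE_BLOCK after Init() (and any config files) as unset, and replaces it with its own value: the --psm argument, or PSM_AUTO when none is given. The function below is a hypothetical paraphrase of that rule, not code from the repository.

```cpp
namespace psm_sketch {

constexpr int kPsmAuto = 3;         // PSM_AUTO, the CLI's own default
constexpr int kPsmSingleBlock = 6;  // PSM_SINGLE_BLOCK, the library default

// Hypothetical paraphrase of the CLI rule.
// mode_after_init: mode reported by the API after Init() and config files.
// cli_mode: the --psm value, or PSM_AUTO when the flag is absent.
constexpr int EffectiveCliMode(int mode_after_init, int cli_mode = kPsmAuto) {
  // Still at the library default means "not set by a config file",
  // so the command-line value wins; any other configured mode is kept.
  return mode_after_init == kPsmSingleBlock ? cli_mode : mode_after_init;
}

}  // namespace psm_sketch
```

Under this rule a config file cannot select PSM_SINGLE_BLOCK for the command-line tool, because the tool would replace it; passing --psm 6 is the way to request it.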

Which PSM should I use for single line text images?

Use PSM_SINGLE_LINE (7) for images that contain exactly one line of text, such as a cropped text line or a single-line license plate. This mode skips column and block detection and treats the entire input as one text line.

How do I detect page orientation without performing OCR?

Initialize the API with the osd language (osd.traineddata), set the mode to PSM_OSD_ONLY (0), and call DetectOrientationScript(). It runs only orientation and script detection and returns the detected rotation (0, 90, 180, or 270 degrees), the script name, and a confidence score for each, without extracting any text.

What is the difference between PSM_AUTO and PSM_AUTO_OSD?

PSM_AUTO (3) performs automatic page segmentation without orientation and script detection (OSD), while PSM_AUTO_OSD (1) adds OSD on top of layout analysis. Use PSM_AUTO_OSD for scanned documents that may be rotated by 90, 180, or 270 degrees; it needs osd.traineddata to be installed.
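The mode-selection advice in this article can be condensed into a small lookup, shown here as a sketch. The Content categories and SuggestPsm() are hypothetical, and the mapping is a starting point drawn from the mode descriptions above rather than a rule from Tesseract; measuring accuracy on your own images remains the reliable way to choose.

```cpp
namespace psm_sketch {

// Hypothetical description of what the caller knows about the input image.
enum class Content {
  FullPage,
  PossiblyRotatedPage,
  SingleLine,
  SingleWord,
  SingleChar,
  ScatteredText
};

// Hypothetical helper mapping the guidance above to PageSegMode numbers.
constexpr int SuggestPsm(Content content) {
  switch (content) {
    case Content::FullPage:            return 3;   // PSM_AUTO
    case Content::PossiblyRotatedPage: return 1;   // PSM_AUTO_OSD (needs osd data)
    case Content::SingleLine:          return 7;   // PSM_SINGLE_LINE
    case Content::SingleWord:          return 8;   // PSM_SINGLE_WORD
    case Content::SingleChar:          return 10;  // PSM_SINGLE_CHAR
    case Content::ScatteredText:       return 11;  // PSM_SPARSE_TEXT
  }
  return 6;  // PSM_SINGLE_BLOCK, the library default, for unexpected values
}

}  // namespace psm_sketch
```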
