Page Segmentation Modes (PSM) in Tesseract OCR: The Complete Guide
Tesseract OCR provides 14 distinct Page Segmentation Modes (PSM) defined in include/tesseract/publictypes.h that control how the engine analyzes document layout, ranging from full automatic page detection to single character recognition.
The tesseract-ocr/tesseract repository uses these modes to determine which layout analysis algorithms execute before character recognition. Selecting the correct PSM is critical for accuracy when processing documents with specific structures like single lines, sparse text, or circular labels.
What Are Page Segmentation Modes?
Page Segmentation Modes tell Tesseract how to divide an input image into text blocks before performing OCR. The PageSegMode enum in include/tesseract/publictypes.h (lines 158-177) defines these behaviors as integer constants that activate different combinations of column detection, line finding, and orientation analysis.
Each mode acts as a switch that enables or disables specific sub-algorithms through helper predicates like PSM_OSD_ENABLED, PSM_ORIENTATION_ENABLED, and PSM_BLOCK_FIND_ENABLED defined later in the same header (lines 186-206).
The 14 Tesseract PSM Values Explained
The following modes control layout analysis from broad automatic detection to specialized single-element recognition:
- PSM_OSD_ONLY (0) – Perform only orientation and script detection without OCR.
- PSM_AUTO_OSD (1) – Automatic page segmentation with orientation and script detection enabled.
- PSM_AUTO_ONLY (2) – Automatic page segmentation without OSD and without OCR (layout analysis only).
- PSM_AUTO (3) – Fully automatic page segmentation without orientation/script detection.
- PSM_SINGLE_COLUMN (4) – Assume a single column of text with variable font sizes.
- PSM_SINGLE_BLOCK_VERT_TEXT (5) – Assume a single uniform block of vertically-oriented text.
- PSM_SINGLE_BLOCK (6) – Assume a single uniform block of text (the library default; the tesseract command-line tool defaults to PSM_AUTO instead).
- PSM_SINGLE_LINE (7) – Treat the image as a single text line.
- PSM_SINGLE_WORD (8) – Treat the image as a single word.
- PSM_CIRCLE_WORD (9) – Treat the image as a single word inside a circle.
- PSM_SINGLE_CHAR (10) – Treat the image as a single character.
- PSM_SPARSE_TEXT (11) – Find as much text as possible, in no particular order, without orientation/script detection.
- PSM_SPARSE_TEXT_OSD (12) – Find sparse text with orientation and script detection enabled.
- PSM_RAW_LINE (13) – Treat the image as a single text line, bypassing Tesseract-specific layout hacks.
- PSM_COUNT (14) – Internal constant representing the number of enum entries.
How PSM Controls Internal Layout Algorithms
According to the source code in include/tesseract/publictypes.h, Tesseract uses inline helper functions to check which capabilities a specific mode requires. Components like src/textord/strokewidth.cpp query these predicates to decide whether to compute stroke width or skip layout analysis entirely.
- PSM_OSD_ENABLED returns true for modes that require orientation and script detection.
- PSM_ORIENTATION_ENABLED returns true for modes in which orientation (rotation) analysis runs.
- PSM_BLOCK_FIND_ENABLED returns true for modes in which the page is divided into text blocks.
When you call SetPageSegMode() (implemented in src/api/baseapi.cpp), the API stores your selection in the tessedit_pageseg_mode parameter, and subsequent recognition steps query these predicates to decide which analysis modules to execute.
Setting Page Segmentation Modes in Code
You can configure PSM through both the C++ and C APIs.
C++ API Example
To set a specific mode using the C++ interface:
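```cpp
#include <cstdio>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main() {
  tesseract::TessBaseAPI api;
  if (api.Init(nullptr, "eng") != 0) return 1;     // load English data; 0 means success
  api.SetPageSegMode(tesseract::PSM_SINGLE_LINE);  // choose single line mode

  Pix* image = pixRead("line.png");
  if (image == nullptr) return 1;                   // file missing or unreadable
  api.SetImage(image);

  char* text = api.GetUTF8Text();                  // caller owns the returned buffer
  if (text != nullptr) std::printf("%s", text);

  delete[] text;                                    // GetUTF8Text allocates with new[]
  api.End();
  pixDestroy(&image);
  return 0;
}
```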
C API Example
The C wrapper provides equivalent functionality:
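```c
#include <stdio.h>
#include <leptonica/allheaders.h>
#include <tesseract/capi.h>
int main(void) {
  TessBaseAPI* api = TessBaseAPICreate();
  if (TessBaseAPIInit3(api, NULL, "eng") != 0) return 1;  /* 0 means success */
  TessBaseAPISetPageSegMode(api, PSM_AUTO);              /* automatic segmentation */
  PIX* image = pixRead("page.png");  /* load with Leptonica, then hand the Pix to Tesseract */
  if (image == NULL) return 1;
  TessBaseAPISetImage2(api, image);
  char* out = TessBaseAPIGetUTF8Text(api);
  if (out != NULL) printf("%s", out);
  TessDeleteText(out);     /* free the string allocated by Tesseract */
  TessBaseAPIDelete(api);  /* destroy the handle */
  pixDestroy(&image);
}
```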
Running Orientation and Script Detection Only
To extract page orientation without performing full OCR:
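```cpp
// Assumes api was initialized with OSD data (osd.traineddata) and given an image via SetImage()
api.SetPageSegMode(tesseract::PSM_OSD_ONLY);
int orient_deg = 0;
float orient_conf = 0.0f, script_conf = 0.0f;
const char* script_name = nullptr;
if (api.DetectOrientationScript(&orient_deg, &orient_conf, &script_name, &script_conf))
  printf("rotation %d deg (conf %.2f), script %s (conf %.2f)\n",
         orient_deg, orient_conf, script_name, script_conf);
```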
Summary
- Page Segmentation Modes in tesseract-ocr/tesseract are defined in include/tesseract/publictypes.h as the PageSegMode enum.
- 14 operational modes exist (0-13), plus PSM_COUNT for internal bounds checking.
- The library default is PSM_SINGLE_BLOCK (6), which assumes one uniform text block; the tesseract command-line tool defaults to PSM_AUTO (3).
- Helper predicates like PSM_OSD_ENABLED control which sub-algorithms execute during layout analysis.
- The API methods SetPageSegMode() (C++) and TessBaseAPISetPageSegMode() (C) configure the mode before image processing.
Frequently Asked Questions
What is the default Page Segmentation Mode in Tesseract?
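The library default, stored in the tessedit_pageseg_mode parameter, is PSM_SINGLE_BLOCK (6), which assumes the image contains a single uniform block of text. The tesseract command-line tool overrides this and defaults to PSM_AUTO (3), which runs full automatic page segmentation and is usually the safer starting point for pages with columns or mixed layouts.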
Which PSM should I use for single line text images?
Use PSM_SINGLE_LINE (7) when processing images containing exactly one line of text, such as license plates or street signs. This mode skips multi-column analysis and treats the entire input as a continuous text line.
How do I detect page orientation without performing OCR?
Set the mode to PSM_OSD_ONLY (0), supply the image, and call DetectOrientationScript(). This performs only orientation and script detection and reports the detected rotation in degrees and the script name, each with a confidence score, without extracting text content. OSD relies on the osd.traineddata file being installed.
What is the difference between PSM_AUTO and PSM_AUTO_OSD?
PSM_AUTO (3) performs automatic page segmentation without orientation detection, while PSM_AUTO_OSD (1) includes both layout analysis and orientation/script detection. Use PSM_AUTO_OSD when processing scanned documents that may be rotated 90, 180, or 270 degrees.