How to Handle Vertical Text and Complex Layouts in Tesseract OCR
Tesseract OCR handles vertical text and complex layouts by internally rotating vertical blocks 90° counter-clockwise during layout analysis, detecting text orientation via gradient projections in textlineprojection.cpp, and using vector-based algorithms in tabfind.cpp to segment tables and columns.
Tesseract OCR, the open-source optical character recognition engine maintained in the tesseract-ocr/tesseract repository, supports scripts written vertically (such as Japanese, Chinese, and Mongolian) and performs page analysis for complex document structures. Using these capabilities well requires knowledge of the PageSegMode enum, the internal rotation mechanisms, and the layout analysis pipeline implemented across the src/textord/ module.
Detecting Vertical Text Orientation
Tesseract identifies vertical text blocks through a combination of gradient analysis and explicit block typing. In include/tesseract/publictypes.h, the PolyBlockType enum defines PT_VERTICAL_TEXT to mark blobs belonging to vertically-oriented blocks. The layout engine evaluates orientation using horizontal and vertical gradient projections in src/textord/textlineprojection.cpp and src/textord/textlineprojection.h.
When processing a page, the projection code computes both horizontal and vertical gradients. A negative gradient score indicates a vertical line, so the same code path handles rotated text without separate logic branches. Vertical detection runs automatically when the textord_tabfind_vertical_text variable is enabled (default: true), and you can still force vertical handling explicitly for specific use cases.
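As a rough illustration of how projection profiles can separate horizontal from vertical text lines, the sketch below compares the variance of row sums against column sums of a small binary image. It is a simplified stand-in, not Tesseract's gradient-projection code; the Bitmap alias and the functions ProfileVariance and LooksVertical are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical binary image: bitmap[y][x] is 1 for ink, 0 for background.
// All rows are assumed to have the same width.
using Bitmap = std::vector<std::vector<int>>;

// Variance of a projection profile (per-row or per-column ink counts).
double ProfileVariance(const std::vector<int> &profile) {
  if (profile.empty()) return 0.0;
  double mean = 0.0;
  for (int v : profile) mean += v;
  mean /= profile.size();
  double var = 0.0;
  for (int v : profile) var += (v - mean) * (v - mean);
  return var / profile.size();
}

// Returns true when column sums vary more than row sums, which suggests
// that ink is organized into vertical lines separated by vertical gaps.
bool LooksVertical(const Bitmap &bitmap) {
  assert(!bitmap.empty());
  const std::size_t height = bitmap.size();
  const std::size_t width = bitmap[0].size();
  std::vector<int> row_sums(height, 0), col_sums(width, 0);
  for (std::size_t y = 0; y < height; ++y) {
    for (std::size_t x = 0; x < width; ++x) {
      row_sums[y] += bitmap[y][x];
      col_sums[x] += bitmap[y][x];
    }
  }
  return ProfileVariance(col_sums) > ProfileVariance(row_sums);
}
```

Text set in columns leaves empty vertical gutters, so its column profile alternates between high and low counts while its row profile stays comparatively flat. A production detector would work on gradients and per-block statistics rather than whole-image sums.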
Forcing Vertical Block Processing with PageSegMode
For documents containing purely vertical text, Tesseract exposes PSM_SINGLE_BLOCK_VERT_TEXT, a value of the PageSegMode enum in include/tesseract/publictypes.h, which you select through the TessBaseAPI::SetPageSegMode method declared in include/tesseract/baseapi.h. When this mode is active, the engine performs specific rotational transformations implemented in src/textord/textord.cpp (lines 200–217).
The process works as follows:
- Each TO_BLOCK is wrapped in a POLY_BLOCK of type PT_VERTICAL_TEXT
- The block is rotated 90° counter-clockwise using rotate(anticlockwise90)
- Standard layout analysis (make_rows, BaselineDetect, make_words) executes on the rotated image
- Re-rotation fields (set_re_rotation, set_classify_rotation) restore original geometry for classification
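#include <cstdio>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
int main() {
  tesseract::TessBaseAPI api;
  // Init returns non-zero on failure (for example, missing jpn traineddata)
  if (api.Init(nullptr, "jpn")) return 1;
  // Force single vertical block processing
  api.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK_VERT_TEXT);
  Pix *pix = pixRead("vertical_page.png");
  if (pix == nullptr) {  // Image could not be read
    api.End();
    return 1;
  }
  api.SetImage(pix);
  char *out = api.GetUTF8Text();
  if (out != nullptr) {
    printf("%s\n", out);
    delete[] out;  // The caller owns the buffer returned by GetUTF8Text
  }
  api.End();
  pixDestroy(&pix);
  return 0;
}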
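The rotate and re-rotate steps listed above can be illustrated with a small, self-contained sketch. It assumes a rotation is stored as a (cos, sin) pair and that the y axis points up; the Rotation and Point structs, the Rotate function, and the two constants are hypothetical stand-ins, not Tesseract's own types.

```cpp
#include <cassert>

// Hypothetical rotation expressed as (cos theta, sin theta).
struct Rotation {
  int cos_t;
  int sin_t;
};

struct Point {
  int x;
  int y;
};

// Rotate p about the origin: x' = x*cos - y*sin, y' = x*sin + y*cos.
Point Rotate(Point p, Rotation r) {
  return {p.x * r.cos_t - p.y * r.sin_t, p.x * r.sin_t + p.y * r.cos_t};
}

// 90 degrees counter-clockwise and its inverse (90 degrees clockwise).
constexpr Rotation kAnticlockwise90{0, 1};
constexpr Rotation kClockwise90{0, -1};

// Usage: a point survives the rotate / re-rotate round trip unchanged.
void CheckRoundTrip(Point p) {
  Point rotated = Rotate(p, kAnticlockwise90);
  Point restored = Rotate(rotated, kClockwise90);
  assert(restored.x == p.x && restored.y == p.y);
}
```

Because the inverse of a 90° rotation is exact in integer arithmetic, layout analysis can run in the rotated frame and the results can be mapped back to page coordinates without rounding error.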
Analyzing Complex Layouts and Tables
Tesseract handles complex page structures—such as tables, multi-column documents, and mixed-orientation pages—through the TabFind class in src/textord/tabfind.cpp and TabVector management in src/textord/tabvector.cpp. The algorithm discovers near-vertical tab-stop vectors, merges similar vectors using TabVector::MergeSimilarTabVectors, and uses these to split pages into rows and columns.
The system automatically adapts to both horizontal and vertical tables because vector orientation derives from the actual data rather than preconceived layout assumptions. The textord_tabvector_vertical_gap_fraction parameter controls how aggressively vertical gaps are interpreted as table separators.
// Continues with the api object from the earlier example
// Sparse-text mode: find as much text as possible in no particular order
api.SetPageSegMode(tesseract::PSM_SPARSE_TEXT);
api.SetVariable("textord_tabvector_vertical_gap_fraction", "0.5");
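To make the tab-stop idea concrete, here is a minimal sketch that groups blob left edges sharing nearly the same x coordinate into candidate column boundaries. It is a deliberate simplification and not the TabFind implementation; Box, FindTabStops, tolerance, and min_support are hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bounding box of one text blob.
struct Box {
  int left;
  int top;
  int right;
  int bottom;
};

// Returns the mean x of each group of left edges in which at least
// min_support blobs lie within `tolerance` pixels of the group's first edge.
std::vector<int> FindTabStops(const std::vector<Box> &boxes, int tolerance,
                              std::size_t min_support) {
  assert(tolerance >= 0);  // Guarantees each group takes at least one edge.
  std::vector<int> lefts;
  for (const Box &b : boxes) lefts.push_back(b.left);
  std::sort(lefts.begin(), lefts.end());

  std::vector<int> stops;
  std::size_t start = 0;
  while (start < lefts.size()) {
    std::size_t end = start;
    long sum = 0;
    // Extend the group while edges stay close to the group's first edge.
    while (end < lefts.size() && lefts[end] - lefts[start] <= tolerance) {
      sum += lefts[end];
      ++end;
    }
    if (end - start >= min_support) {
      stops.push_back(static_cast<int>(sum / static_cast<long>(end - start)));
    }
    start = end;
  }
  return stops;
}
```

Requiring several aligned blobs before accepting a stop filters out coincidental alignments, which is the same intuition behind merging similar tab vectors: a column edge is evidence that repeats down the page.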
For mixed-orientation documents (e.g., horizontal body text with vertical captions), the pipeline in src/textord/textord.cpp extracts connected components via find_components and filter_blobs, then groups them into TO_BLOCK structures. Each block maintains its own rotation flag, allowing simultaneous processing of differently-oriented regions without manual intervention.
Handling Vertical Underlines and Baselines
Vertical scripts require specialized underline detection implemented in src/textord/underlin.cpp. The vertical_cunderline_projection function projects underline outlines vertically to establish baselines for vertical writing systems, ensuring proper character alignment during the textline formation phase.
Generating Synthetic Training Data
When training custom models for vertical scripts, src/training/text2image.cpp supports vertical text rendering through the render.set_vertical_text(true) method. Running text2image with --writing_mode vertical produces training images rotated 90° with corresponding .box files containing correctly transformed coordinates.
The src/training/pango/boxchar.cpp module includes MostlyVertical logic to analyze line orientation during ground-truth generation, inserting appropriate line breaks and spaces for vertical text flow.
// Simplified from text2image.cpp
bool vertical = (FLAGS_writing_mode == "vertical");
render.set_vertical_text(vertical);
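The "mostly vertical" check mentioned above can be approximated by comparing how far successive character boxes move along each axis. The sketch below is an assumption-laden simplification of that idea, not the code in boxchar.cpp; CharBox, MostlyVerticalSketch, and ExampleUsage are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical character box: position of one glyph on the page.
struct CharBox {
  int x;
  int y;
};

// Returns true when successive boxes move more along y than along x.
// Steps are squared before summing so opposite directions do not cancel.
bool MostlyVerticalSketch(const std::vector<CharBox> &boxes) {
  std::int64_t total_dx = 0;
  std::int64_t total_dy = 0;
  for (std::size_t i = 1; i < boxes.size(); ++i) {
    const std::int64_t dx = boxes[i].x - boxes[i - 1].x;
    const std::int64_t dy = boxes[i].y - boxes[i - 1].y;
    total_dx += dx * dx;
    total_dy += dy * dy;
  }
  return total_dy > total_dx;
}

// Usage: a column of glyphs reads as vertical, a row does not.
void ExampleUsage() {
  assert(MostlyVerticalSketch({{100, 10}, {100, 40}, {101, 70}}));
  assert(!MostlyVerticalSketch({{10, 100}, {40, 100}, {70, 101}}));
}
```

Knowing the dominant direction lets ground-truth generation decide which axis a large jump must occur on before it counts as a line break.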
Key Configuration Variables
Tesseract exposes several variables to tune vertical text and layout handling:
- textord_tabfind_vertical_text: Boolean (default true) enabling automatic vertical text detection
- textord_tabvector_vertical_gap_fraction: Float controlling table column detection sensitivity
- PageSegMode options: PSM_AUTO for automatic detection, PSM_SINGLE_BLOCK_VERT_TEXT for forced vertical processing
Summary
- Tesseract detects vertical text using gradient projections in textlineprojection.cpp and marks blocks with PT_VERTICAL_TEXT in publictypes.h
- Force vertical processing by setting PSM_SINGLE_BLOCK_VERT_TEXT, which triggers 90° rotation logic in textord.cpp lines 200–217
- Complex layouts are parsed using TabFind and TabVector classes that detect column and table structures via vertical gap analysis
- Training data for vertical scripts is generated using text2image.cpp with the set_vertical_text flag
- Mixed-orientation pages are supported through per-block rotation flags set during the TO_BLOCK creation phase
Frequently Asked Questions
How do I enable automatic vertical text detection in Tesseract OCR?
Automatic vertical text detection is enabled by default via the textord_tabfind_vertical_text variable. When using PSM_AUTO, the engine evaluates vertical gradients in src/textord/textlineprojection.cpp and automatically rotates blocks that meet the vertical threshold. No additional API calls are required unless you need to force specific behavior.
What is the difference between PSM_AUTO and PSM_SINGLE_BLOCK_VERT_TEXT?
PSM_AUTO analyzes the entire page and detects orientation per-block using gradient analysis, while PSM_SINGLE_BLOCK_VERT_TEXT treats the entire input as a single vertical block, forcing a 90° counter-clockwise rotation in src/textord/textord.cpp before processing. Use the latter for pure vertical documents like traditional Japanese manuscripts.
How does Tesseract handle tables with both horizontal and vertical text?
Tesseract uses the TabFind class in src/textord/tabfind.cpp to detect near-vertical tab vectors that define column boundaries. Each TO_BLOCK maintains independent rotation state, allowing the engine to process horizontal and vertical regions within the same table. Adjust textord_tabvector_vertical_gap_fraction to tune detection sensitivity for complex grid layouts.
Can I train Tesseract on custom vertical fonts?
Yes. Use src/training/text2image.cpp with the --writing_mode vertical flag, which sets render.set_vertical_text(true). This generates rotated training images and corresponding box files via src/training/pango/boxchar.cpp, which detects "mostly vertical" lines to ensure proper coordinate mapping during ground-truth generation.