How to Handle Vertical Text and Complex Layouts in Tesseract OCR

Tesseract OCR handles vertical text and complex layouts by internally rotating vertical blocks 90° counter-clockwise during layout analysis, detecting text orientation via gradient projections in textlineprojection.cpp, and using vector-based algorithms in tabfind.cpp to segment tables and columns.

Tesseract OCR, the open-source optical character recognition engine developed in the tesseract-ocr/tesseract repository, supports vertically written scripts (such as Japanese, Chinese, and Mongolian) and performs page analysis for complex document structures. Using these capabilities well requires familiarity with the PageSegMode enum, the internal rotation mechanism, and the layout analysis pipeline implemented in the src/textord/ module.

Detecting Vertical Text Orientation

Tesseract identifies vertical text blocks through a combination of gradient analysis and explicit block typing. In include/tesseract/publictypes.h, the PolyBlockType enum defines PT_VERTICAL_TEXT to mark blobs belonging to vertically-oriented blocks. The layout engine evaluates orientation using horizontal and vertical gradient projections in src/textord/textlineprojection.cpp and src/textord/textlineprojection.h.

When processing a page, the projection code computes both horizontal and vertical gradients. A negative gradient score indicates a vertical line, so the same algorithmic path handles rotated text without separate logic branches. Vertical detection runs automatically when the textord_tabfind_vertical_text variable is enabled (default: true), and you can force vertical handling explicitly for specific use cases.
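The sign convention described above can be illustrated with a small, self-contained sketch. This is not Tesseract's implementation: the Bitmap type and the OrientationScore and ProfileGradientEnergy functions are hypothetical names, and the score is a simplified stand-in that compares how sharply the row and column ink profiles change. It assumes a roughly square binary image.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Binary page image (hypothetical type): true = ink pixel, indexed [row][col].
using Bitmap = std::vector<std::vector<bool>>;

// Sum of squared differences between neighbouring profile entries.
// A profile that alternates between text lines and gaps scores high.
static double ProfileGradientEnergy(const std::vector<int> &profile) {
  double energy = 0.0;
  for (std::size_t i = 1; i < profile.size(); ++i) {
    const double d = static_cast<double>(profile[i]) - profile[i - 1];
    energy += d * d;
  }
  return energy;
}

// Hypothetical score: positive suggests horizontal text lines, negative
// suggests vertical text lines, zero means undecided.
double OrientationScore(const Bitmap &img) {
  if (img.empty() || img[0].empty()) return 0.0;
  std::vector<int> row_sums(img.size(), 0);
  std::vector<int> col_sums(img[0].size(), 0);
  for (std::size_t y = 0; y < img.size(); ++y) {
    // Guard against ragged rows so col_sums is never indexed out of range.
    const std::size_t width = std::min(img[y].size(), col_sums.size());
    for (std::size_t x = 0; x < width; ++x) {
      if (img[y][x]) {
        ++row_sums[y];
        ++col_sums[x];
      }
    }
  }
  return ProfileGradientEnergy(row_sums) - ProfileGradientEnergy(col_sums);
}
```

Calling OrientationScore on an image of horizontal stripes yields a positive value, and the transposed image yields a negative one, which mirrors the idea that a single signed score can route both orientations through the same code path.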

Forcing Vertical Block Processing with PageSegMode

For documents containing purely vertical text, Tesseract exposes PSM_SINGLE_BLOCK_VERT_TEXT, which you select through the TessBaseAPI::SetPageSegMode method declared in include/tesseract/baseapi.h. When this mode is active, the engine performs the rotational transformations in src/textord/textord.cpp (lines 200–217).

The process works as follows:

  1. Each TO_BLOCK is wrapped in a POLY_BLOCK of type PT_VERTICAL_TEXT
  2. The block is rotated 90° counter-clockwise using rotate(anticlockwise90)
  3. Standard layout analysis (make_rows, BaselineDetect, make_words) executes on the rotated image
  4. Re-rotation fields (set_re_rotation, set_classify_rotation) restore original geometry for classification
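
A minimal program that applies this mode:

#include <cstdio>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main() {
  tesseract::TessBaseAPI api;
  // Init returns non-zero on failure (for example, missing jpn.traineddata).
  if (api.Init(nullptr, "jpn")) return 1;

  // Force single vertical block processing
  api.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK_VERT_TEXT);

  Pix *pix = pixRead("vertical_page.png");
  if (pix == nullptr) {  // pixRead returns nullptr if the file cannot be read
    api.End();
    return 1;
  }
  api.SetImage(pix);

  char *out = api.GetUTF8Text();
  if (out != nullptr) {
    printf("%s\n", out);
    delete[] out;  // GetUTF8Text allocates the buffer with new[]
  }
  api.End();
  pixDestroy(&pix);
  return 0;
}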
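The rotate and re-rotate steps listed above can be pictured with a small geometric sketch. It is an illustration only, not code from textord.cpp: Point, Box, Rotate, RotateBox, ToLayoutSpace, and ToImageSpace are hypothetical names. It assumes a y-up coordinate system and represents a rotation as a (cos, sin) pair, so (0, 1) is 90° counter-clockwise and (0, -1) is the inverse.

```cpp
#include <algorithm>

// Hypothetical geometry types for illustration.
struct Point { int x; int y; };
struct Box { int left; int bottom; int right; int top; };

// Rotate a point by the unit vector (cos_t, sin_t), as a 2-D rotation
// matrix would. (0, 1) is 90 degrees counter-clockwise in a y-up system.
Point Rotate(Point p, int cos_t, int sin_t) {
  return {p.x * cos_t - p.y * sin_t, p.y * cos_t + p.x * sin_t};
}

// Rotate a box by rotating two opposite corners and re-normalising so
// that left <= right and bottom <= top.
Box RotateBox(const Box &b, int cos_t, int sin_t) {
  const Point a = Rotate({b.left, b.bottom}, cos_t, sin_t);
  const Point c = Rotate({b.right, b.top}, cos_t, sin_t);
  return {std::min(a.x, c.x), std::min(a.y, c.y),
          std::max(a.x, c.x), std::max(a.y, c.y)};
}

// Forward step: a tall vertical column becomes a wide horizontal line.
Box ToLayoutSpace(const Box &b) { return RotateBox(b, 0, 1); }

// Re-rotation step: map layout-space geometry back to the input image.
Box ToImageSpace(const Box &b) { return RotateBox(b, 0, -1); }
```

A column box 10 units wide and 100 units tall becomes 100 wide and 10 tall after ToLayoutSpace, so ordinary horizontal row-finding can run on it. ToImageSpace then returns the original coordinates.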

Analyzing Complex Layouts and Tables

Tesseract handles complex page structures—such as tables, multi-column documents, and mixed-orientation pages—through the TabFind class in src/textord/tabfind.cpp and TabVector management in src/textord/tabvector.cpp. The algorithm discovers near-vertical tab-stop vectors, merges similar vectors using TabVector::MergeSimilarTabVectors, and uses these to split pages into rows and columns.

The system adapts to both horizontal and vertical tables because vector orientation derives from the actual data rather than from preconceived layout assumptions. The textord_tabvector_vertical_gap_fraction parameter limits how large a vertical gap, measured as a fraction of mean blob width, is tolerated when tab vectors are built for vertical text.
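To make the tab-stop idea concrete, here is a simplified sketch that groups blob left edges into candidate tab stops. It is not the TabFind algorithm. TabStop, FindLeftTabStops, tolerance, and min_support are hypothetical names, and the sketch handles only perfectly vertical stops, whereas the real code fits near-vertical vectors and merges similar ones.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A candidate tab stop (hypothetical type): an x position and the number
// of blobs whose left edge lines up with it.
struct TabStop { int x; int support; };

// Group left edges that lie within `tolerance` pixels of the first edge in
// the group, and keep groups backed by at least `min_support` blobs.
std::vector<TabStop> FindLeftTabStops(std::vector<int> left_edges,
                                      int tolerance, int min_support) {
  std::sort(left_edges.begin(), left_edges.end());
  std::vector<TabStop> stops;
  std::size_t i = 0;
  while (i < left_edges.size()) {
    std::size_t j = i;
    long sum = 0;
    while (j < left_edges.size() &&
           left_edges[j] - left_edges[i] <= tolerance) {
      sum += left_edges[j];
      ++j;
    }
    const int count = static_cast<int>(j - i);
    if (count >= min_support) {
      // Report the mean x of the aligned edges as the stop position.
      stops.push_back({static_cast<int>(sum / count), count});
    }
    i = j;  // Continue with the first edge outside this group.
  }
  return stops;
}
```

With edges clustered near x = 100 and x = 400 plus a stray edge at x = 250, a min_support of 3 keeps the two column starts and discards the stray edge, which is the sense in which the column structure "derives from the data".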

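// Find as much text as possible without assuming a fixed page structure
api.SetPageSegMode(tesseract::PSM_SPARSE_TEXT);
// Tune the vertical-gap tolerance used when building tab vectors
api.SetVariable("textord_tabvector_vertical_gap_fraction", "0.5");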

For mixed-orientation documents (e.g., horizontal body text with vertical captions), the pipeline in src/textord/textord.cpp extracts connected components via find_components and filter_blobs, then groups them into TO_BLOCK structures. Each block maintains its own rotation flag, allowing simultaneous processing of differently-oriented regions without manual intervention.

Handling Vertical Underlines and Baselines

Vertical scripts require specialized underline detection implemented in src/textord/underlin.cpp. The vertical_cunderline_projection function projects underline outlines vertically to establish baselines for vertical writing systems, ensuring proper character alignment during the textline formation phase.

Generating Synthetic Training Data

When training custom models for vertical scripts, src/training/text2image.cpp supports vertical text rendering through the render.set_vertical_text(true) method. Running text2image with --writing_mode vertical produces training images rotated 90° with corresponding .box files containing correctly transformed coordinates.

The src/training/pango/boxchar.cpp module includes MostlyVertical logic to analyze line orientation during ground-truth generation, inserting appropriate line breaks and spaces for vertical text flow.

// Simplified from the writing-mode handling in text2image.cpp
bool vertical = (FLAGS_writing_mode == "vertical");
render.set_vertical_text(vertical);
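The "mostly vertical" check mentioned for boxchar.cpp can be approximated as follows. This is a sketch under assumptions, not the code from that file: GlyphBox and LooksMostlyVertical are hypothetical names, and the heuristic simply compares how far consecutive glyph centres move along each axis.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical glyph box: bottom-left corner plus size, in pixels.
struct GlyphBox { int x; int y; int width; int height; };

// Returns true when consecutive glyph centres move more along the y axis
// than along the x axis, i.e. the line reads top-to-bottom or bottom-to-top.
bool LooksMostlyVertical(const std::vector<GlyphBox> &boxes) {
  long total_dx = 0;
  long total_dy = 0;
  for (std::size_t i = 1; i < boxes.size(); ++i) {
    const int prev_cx = boxes[i - 1].x + boxes[i - 1].width / 2;
    const int prev_cy = boxes[i - 1].y + boxes[i - 1].height / 2;
    const int cx = boxes[i].x + boxes[i].width / 2;
    const int cy = boxes[i].y + boxes[i].height / 2;
    total_dx += std::abs(cx - prev_cx);
    total_dy += std::abs(cy - prev_cy);
  }
  return total_dy > total_dx;
}
```

A generator could use such a test to decide whether a gap between boxes should be treated as a space within a vertical line or as a line break. Fewer than two boxes gives no direction information, so the function returns false in that case.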

Key Configuration Variables

Tesseract exposes several variables to tune vertical text and layout handling:

  • textord_tabfind_vertical_text: Boolean (default true) enabling automatic vertical text detection
  • textord_tabvector_vertical_gap_fraction: Float giving the maximum vertical gap, as a fraction of mean blob width, allowed in vertical text
  • PageSegMode options: PSM_AUTO for automatic detection, PSM_SINGLE_BLOCK_VERT_TEXT for forced vertical processing

Summary

  • Tesseract detects vertical text using gradient projections in textlineprojection.cpp and marks blocks with PT_VERTICAL_TEXT in publictypes.h
  • Force vertical processing by setting PSM_SINGLE_BLOCK_VERT_TEXT, which triggers 90° rotation logic in textord.cpp lines 200–217
  • Complex layouts are parsed using TabFind and TabVector classes that detect column and table structures via vertical gap analysis
  • Training data for vertical scripts is generated using text2image.cpp with the set_vertical_text flag
  • Mixed-orientation pages are supported through per-block rotation flags set during the TO_BLOCK creation phase

Frequently Asked Questions

How do I enable automatic vertical text detection in Tesseract OCR?

Automatic vertical text detection is enabled by default via the textord_tabfind_vertical_text variable. When using PSM_AUTO, the engine evaluates vertical gradients in src/textord/textlineprojection.cpp and rotates blocks that meet the vertical threshold. No additional API calls are required unless you need to force specific behavior.

What is the difference between PSM_AUTO and PSM_SINGLE_BLOCK_VERT_TEXT?

PSM_AUTO analyzes the entire page and detects orientation per-block using gradient analysis, while PSM_SINGLE_BLOCK_VERT_TEXT treats the entire input as a single vertical block, forcing a 90° counter-clockwise rotation in src/textord/textord.cpp before processing. Use the latter for pure vertical documents like traditional Japanese manuscripts.

How does Tesseract handle tables with both horizontal and vertical text?

Tesseract uses the TabFind class in src/textord/tabfind.cpp to detect near-vertical tab vectors that define column boundaries. Each TO_BLOCK maintains independent rotation state, allowing the engine to process horizontal and vertical regions within the same table. Adjust textord_tabvector_vertical_gap_fraction to tune detection sensitivity for complex grid layouts.

Can I train Tesseract on custom vertical fonts?

Yes. Use src/training/text2image.cpp with the --writing_mode vertical flag, which sets render.set_vertical_text(true). This generates rotated training images and corresponding box files via src/training/pango/boxchar.cpp, which detects "mostly vertical" lines to ensure proper coordinate mapping during ground-truth generation.
