PolyBlock Types in Tesseract: A Complete Guide to Layout Analysis

PolyBlock Types are an enumeration that classifies detected page regions—such as text columns, headings, images, tables, and equations—allowing Tesseract to perform layout analysis and apply region-specific processing during OCR.

The tesseract-ocr/tesseract engine represents each contiguous region of a document as a POLY_BLOCK with a specific PolyBlockType. This type system, defined in include/tesseract/publictypes.h, drives the layout analysis pipeline by categorizing content so that text extraction, image isolation, or table detection can be handled appropriately.

Understanding the PolyBlockType Enumeration

At the core of Tesseract’s layout analysis is the PolyBlockType enum declared in include/tesseract/publictypes.h. This enumeration assigns a logical kind to every block detected during page segmentation.

// include/tesseract/publictypes.h
enum PolyBlockType {
  PT_UNKNOWN,          // not yet classified
  PT_FLOWING_TEXT,     // normal column‑wise text
  PT_HEADING_TEXT,     // text spanning multiple columns
  PT_PULLOUT_TEXT,     // cross‑column pull‑out text
  PT_EQUATION,         // block belonging to an equation region
  PT_INLINE_EQUATION,  // inline equation inside text
  PT_TABLE,            // table region
  PT_VERTICAL_TEXT,    // vertically‑oriented text lines
  PT_CAPTION_TEXT,     // text belonging to an image
  PT_FLOWING_IMAGE,    // image inside a column
  PT_HEADING_IMAGE,    // image spanning columns
  PT_PULLOUT_IMAGE,    // pull‑out image
  PT_HORZ_LINE,        // horizontal line
  PT_VERT_LINE,        // vertical line
  PT_NOISE,            // stray marks outside any column
  PT_COUNT
};
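
Because the enumerators are plain integers, log output such as "type 6" is hard to read. The sketch below adds a name table for logging. PolyBlockTypeName is a hypothetical helper, not part of Tesseract, and the enum is mirrored locally only so the snippet compiles on its own; real code would include tesseract/publictypes.h instead.

```cpp
#include <string>

// Local mirror of the enum above so this sketch compiles without Tesseract;
// real code would include <tesseract/publictypes.h> instead.
enum PolyBlockType {
  PT_UNKNOWN, PT_FLOWING_TEXT, PT_HEADING_TEXT, PT_PULLOUT_TEXT, PT_EQUATION,
  PT_INLINE_EQUATION, PT_TABLE, PT_VERTICAL_TEXT, PT_CAPTION_TEXT,
  PT_FLOWING_IMAGE, PT_HEADING_IMAGE, PT_PULLOUT_IMAGE, PT_HORZ_LINE,
  PT_VERT_LINE, PT_NOISE, PT_COUNT
};

// Hypothetical helper (not a Tesseract function): readable name for logging.
std::string PolyBlockTypeName(PolyBlockType type) {
  static const char* const kNames[] = {
      "UNKNOWN", "FLOWING_TEXT", "HEADING_TEXT", "PULLOUT_TEXT", "EQUATION",
      "INLINE_EQUATION", "TABLE", "VERTICAL_TEXT", "CAPTION_TEXT",
      "FLOWING_IMAGE", "HEADING_IMAGE", "PULLOUT_IMAGE", "HORZ_LINE",
      "VERT_LINE", "NOISE"};
  // Fails to compile if the table and the enum drift apart.
  static_assert(sizeof(kNames) / sizeof(kNames[0]) == PT_COUNT,
                "kNames must have one entry per PolyBlockType");
  const int index = static_cast<int>(type);
  if (index < 0 || index >= PT_COUNT) return "INVALID";
  return kNames[index];
}
```

A call such as PolyBlockTypeName(blk) can then replace the raw integer in a log line.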

Rather than comparing raw integers, use the inline predicates also defined in publictypes.h to test block categories:

  • PTIsTextType() – Returns true for flowing text, headings, pull-outs, tables, vertical text, captions, and inline equations. PT_EQUATION itself is not included.
  • PTIsImageType() – Identifies flowing, heading, and pull-out images.
  • PTIsLineType() – Matches horizontal and vertical lines.
  • PTIsPulloutType() – Detects pull-out text or images spanning multiple columns.
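
The grouping performed by these predicates can be sketched in a standalone form. The predicate bodies below are local copies written as they appear in recent versions of publictypes.h, which is an assumption to verify against your installed header; Category is a hypothetical helper added for illustration.

```cpp
#include <string>

// Local mirror so the sketch compiles without Tesseract installed.
enum PolyBlockType {
  PT_UNKNOWN, PT_FLOWING_TEXT, PT_HEADING_TEXT, PT_PULLOUT_TEXT, PT_EQUATION,
  PT_INLINE_EQUATION, PT_TABLE, PT_VERTICAL_TEXT, PT_CAPTION_TEXT,
  PT_FLOWING_IMAGE, PT_HEADING_IMAGE, PT_PULLOUT_IMAGE, PT_HORZ_LINE,
  PT_VERT_LINE, PT_NOISE, PT_COUNT
};

// Local copies of the header predicates (assumed to match recent versions).
inline bool PTIsLineType(PolyBlockType t) {
  return t == PT_HORZ_LINE || t == PT_VERT_LINE;
}
inline bool PTIsImageType(PolyBlockType t) {
  return t == PT_FLOWING_IMAGE || t == PT_HEADING_IMAGE ||
         t == PT_PULLOUT_IMAGE;
}
inline bool PTIsTextType(PolyBlockType t) {
  return t == PT_FLOWING_TEXT || t == PT_HEADING_TEXT ||
         t == PT_PULLOUT_TEXT || t == PT_TABLE || t == PT_VERTICAL_TEXT ||
         t == PT_CAPTION_TEXT || t == PT_INLINE_EQUATION;
}

// Hypothetical helper: coarse category label for a block type.
std::string Category(PolyBlockType t) {
  if (PTIsTextType(t)) return "text";
  if (PTIsImageType(t)) return "image";
  if (PTIsLineType(t)) return "line";
  return "other";  // PT_UNKNOWN, PT_EQUATION, PT_NOISE
}
```

Note that under these definitions a table counts as text, so code that wants to treat tables separately has to test for PT_TABLE before calling PTIsTextType().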

How Tesseract Determines Block Types During Layout Analysis

During page segmentation, the engine groups connected components into ColPartition objects (see src/textord/colpartition.h and src/textord/colpartition.cpp). Each partition's geometric properties, such as column spanning, height, width, and neighbor relationships, are used to determine its content category.
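
To make the idea of column spanning concrete, here is a deliberately simplified sketch. It is not Tesseract's code: Span, SpanKind, ClassifySpan, and the tolerance parameter are all hypothetical, and the real ColPartition logic weighs many more signals. It assumes the columns are sorted left to right.

```cpp
#include <vector>

// Hypothetical types for illustration; ColPartition is far richer than this.
struct Span { int left; int right; };
enum class SpanKind { kNone, kFlowing, kHeading, kPullout };

// Classifies a horizontal extent by how it relates to the column layout.
SpanKind ClassifySpan(const Span& part, const std::vector<Span>& columns,
                      int tolerance) {
  int first = -1;
  int last = -1;
  for (int i = 0; i < static_cast<int>(columns.size()); ++i) {
    const bool overlaps =
        part.left < columns[i].right && part.right > columns[i].left;
    if (!overlaps) continue;
    if (first < 0) first = i;
    last = i;
  }
  if (first < 0) return SpanKind::kNone;         // outside every column
  if (first == last) return SpanKind::kFlowing;  // stays inside one column
  // Touches several columns: a heading covers them edge to edge,
  // anything else is treated as a cross-column pull-out.
  const bool covers_left = part.left <= columns[first].left + tolerance;
  const bool covers_right = part.right >= columns[last].right - tolerance;
  return (covers_left && covers_right) ? SpanKind::kHeading
                                       : SpanKind::kPullout;
}
```

With two columns at x = 0..100 and x = 120..220, an extent of 5..95 is flowing, 0..220 is a heading, and 60..160 is a pull-out.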

After partitioning, related partitions are assembled into blocks, each described by a POLY_BLOCK (defined in src/ccstruct/polyblk.h and implemented in src/ccstruct/polyblk.cpp). This object stores the block’s bounding polygon and the assigned PolyBlockType. The type assignment drives downstream processing: for example, recognition runs only on blocks whose type satisfies PTIsTextType(), such as PT_FLOWING_TEXT, PT_HEADING_TEXT, and PT_TABLE, while image blocks can be extracted for separate handling.

For visual debugging, POLY_BLOCK::ColorForPolyBlockType() in polyblk.cpp maps each enum value to a specific ScrollView color. In recent versions of the source, flowing text is drawn in blue, headings in cyan, tables in magenta, and the other categories in their own distinct colors; check the kPBColors table in your checkout for the exact mapping.

Accessing PolyBlock Types via the C++ API

The C++ API exposes block types through PageIterator::BlockType(), implemented in src/ccmain/pageiterator.cpp. The following example demonstrates how to enumerate blocks and handle each type appropriately:

#include <cstdio>
#include <memory>

#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>
#include <tesseract/publictypes.h>
#include <tesseract/resultiterator.h>

int main() {
  tesseract::TessBaseAPI api;
  if (api.Init(nullptr, "eng") != 0) return 1;  // Init() returns 0 on success
  api.SetPageSegMode(tesseract::PSM_AUTO);       // enable full layout analysis

  // SetImage() takes a Pix* (or raw pixel data), not a file name.
  Pix* image = pixRead("sample.png");
  if (!image) return 1;
  api.SetImage(image);

  if (api.Recognize(nullptr) != 0) {             // run layout + OCR
    pixDestroy(&image);
    return 1;
  }

  // GetIterator() returns a ResultIterator, which extends PageIterator with
  // GetUTF8Text(). AnalyseLayout() alone yields a PageIterator without text.
  std::unique_ptr<tesseract::ResultIterator> it(api.GetIterator());
  if (it) {
    do {
      // Retrieve the block type for the current block.
      tesseract::PolyBlockType blk = it->BlockType();

      // Test PT_TABLE first: PTIsTextType() also matches PT_TABLE.
      if (blk == tesseract::PT_TABLE) {
        printf("TABLE block detected\n");
      } else if (tesseract::PTIsTextType(blk)) {
        // Text block: extract the text; the caller owns the buffer.
        char* txt = it->GetUTF8Text(tesseract::RIL_BLOCK);
        printf("TEXT (%d): %s\n", static_cast<int>(blk), txt ? txt : "");
        delete[] txt;
      } else if (tesseract::PTIsImageType(blk)) {
        // Image block: you could save the image region.
        printf("IMAGE block (type %d)\n", static_cast<int>(blk));
      } else if (tesseract::PTIsLineType(blk)) {
        printf("LINE block (type %d)\n", static_cast<int>(blk));
      } else {
        printf("OTHER block (type %d)\n", static_cast<int>(blk));
      }
    } while (it->Next(tesseract::RIL_BLOCK));    // advance to next block
  }

  it.reset();                                    // release iterator before End()
  api.End();                                     // clean up
  pixDestroy(&image);
  return 0;
}

This pattern leverages the predicate helpers to avoid verbose switch statements when filtering for specific layout elements.

Using PolyBlock Types with the C API

The C API provides equivalent functionality through TessPageIteratorBlockType(), declared in include/tesseract/capi.h. This wrapper returns the same enumeration values for use in C-based applications:

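#include <stdio.h>

#include <leptonica/allheaders.h>
#include <tesseract/capi.h>

int main(void) {
  TessBaseAPI* api = TessBaseAPICreate();
  if (TessBaseAPIInit3(api, NULL, "eng") != 0) {
    TessBaseAPIDelete(api);
    return 1;
  }
  TessBaseAPISetPageSegMode(api, PSM_AUTO);   /* enable full layout analysis */

  /* Leptonica's reader is pixRead(); keep the pointer so it can be freed. */
  PIX* image = pixRead("sample.png");
  if (!image) {
    TessBaseAPIDelete(api);
    return 1;
  }
  TessBaseAPISetImage2(api, image);
  TessBaseAPIRecognize(api, NULL);

  /* GetIterator() returns a result iterator; view it as a page iterator
     for the layout queries. Both handles refer to the same object. */
  TessResultIterator* ri = TessBaseAPIGetIterator(api);
  if (ri) {
    TessPageIterator* pi = TessResultIteratorGetPageIterator(ri);
    do {
      TessPolyBlockType blk = TessPageIteratorBlockType(pi);
      if (blk == PT_FLOWING_TEXT || blk == PT_HEADING_TEXT) {
        char* txt = TessResultIteratorGetUTF8Text(ri, RIL_BLOCK);
        printf("TEXT (%d): %s\n", (int)blk, txt ? txt : "");
        TessDeleteText(txt);
      } else if (blk == PT_FLOWING_IMAGE) {
        printf("IMAGE block (%d)\n", (int)blk);
      }
    } while (TessResultIteratorNext(ri, RIL_BLOCK));
    TessResultIteratorDelete(ri);             /* frees the shared object once */
  }

  TessBaseAPIEnd(api);
  TessBaseAPIDelete(api);
  pixDestroy(&image);
  return 0;
}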

Both APIs provide access to the same underlying layout data structures, allowing you to build custom pipelines that process tables, ignore noise regions, or extract images based on their classified types.
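
A custom pipeline of that kind might start by sorting blocks into buckets. The sketch below is one possible shape, not an API that Tesseract provides: BlockInfo, RoutedBlocks, and RouteBlocks are hypothetical names, and the enum is mirrored locally so the snippet compiles on its own. In real code the records would be filled from the iterator's BlockType() and BoundingBox() calls.

```cpp
#include <vector>

// Local mirror so the sketch compiles without Tesseract installed.
enum PolyBlockType {
  PT_UNKNOWN, PT_FLOWING_TEXT, PT_HEADING_TEXT, PT_PULLOUT_TEXT, PT_EQUATION,
  PT_INLINE_EQUATION, PT_TABLE, PT_VERTICAL_TEXT, PT_CAPTION_TEXT,
  PT_FLOWING_IMAGE, PT_HEADING_IMAGE, PT_PULLOUT_IMAGE, PT_HORZ_LINE,
  PT_VERT_LINE, PT_NOISE, PT_COUNT
};

// Hypothetical record: one block's type plus its bounding box in pixels.
struct BlockInfo {
  PolyBlockType type;
  int left, top, right, bottom;
};

// Hypothetical result: blocks grouped by what the pipeline does with them.
struct RoutedBlocks {
  std::vector<BlockInfo> text;
  std::vector<BlockInfo> tables;
  std::vector<BlockInfo> images;
};

RoutedBlocks RouteBlocks(const std::vector<BlockInfo>& blocks) {
  RoutedBlocks out;
  for (const BlockInfo& b : blocks) {
    switch (b.type) {
      case PT_TABLE:
        out.tables.push_back(b);
        break;
      case PT_FLOWING_IMAGE:
      case PT_HEADING_IMAGE:
      case PT_PULLOUT_IMAGE:
        out.images.push_back(b);
        break;
      case PT_FLOWING_TEXT:
      case PT_HEADING_TEXT:
      case PT_PULLOUT_TEXT:
      case PT_VERTICAL_TEXT:
      case PT_CAPTION_TEXT:
      case PT_INLINE_EQUATION:
        out.text.push_back(b);
        break;
      default:
        break;  // noise, lines, unknown, and equation regions are dropped
    }
  }
  return out;
}
```

A switch is used here rather than the predicates so that tables land in their own bucket instead of being absorbed into the text group.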

Visualizing Block Types for Debugging

Tesseract’s ScrollView-based debug displays color-code blocks according to their PolyBlockType. The mapping occurs in POLY_BLOCK::ColorForPolyBlockType() in polyblk.cpp, which returns a ScrollView::Color for each enum value. The abridged excerpt below reflects recent versions of the source; consult your checkout for the full table:

// src/ccstruct/polyblk.cpp: colour mapping (abridged)
ScrollView::Color POLY_BLOCK::ColorForPolyBlockType(PolyBlockType type) {
  // Kept in sync with the order of PolyBlockType.
  const ScrollView::Color kPBColors[PT_COUNT] = {
    ScrollView::WHITE,      // PT_UNKNOWN
    ScrollView::BLUE,       // PT_FLOWING_TEXT
    ScrollView::CYAN,       // PT_HEADING_TEXT
    // … (one entry per remaining enum value)
  };
  // Guard against out-of-range values before indexing.
  if (type >= 0 && type < PT_COUNT) return kPBColors[type];
  return ScrollView::WHITE;
}

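Enable visual debugging from the command line. One option, assuming a build with graphics support (not compiled with GRAPHICS_DISABLED) and a reachable ScrollView viewer, is the textord_tabfind_show_blocks parameter. The debug_file parameter, by contrast, only redirects textual debug messages to a file and does not produce an image:

tesseract sample.png out -c textord_tabfind_show_blocks=1

The viewer draws each region in the color assigned to its type (in recent source versions, flowing text in blue, headings in cyan, tables in magenta, and noise in grey), allowing you to verify that Tesseract’s layout analysis correctly identified document regions.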

Summary

  • PolyBlockType (defined in include/tesseract/publictypes.h) categorizes every detected region during Tesseract’s layout analysis, including text, images, tables, equations, and lines.
  • The ColPartition class determines block types based on geometry and content, storing results in POLY_BLOCK objects.
  • Use predicate helpers like PTIsTextType() and PTIsImageType() to filter blocks efficiently without manual enum comparisons.
  • Access block types programmatically via PageIterator::BlockType() (C++) or TessPageIteratorBlockType() (C API).
  • Visual debugging maps each type to a specific color via POLY_BLOCK::ColorForPolyBlockType(), aiding in layout verification.

Frequently Asked Questions

What is the difference between PT_FLOWING_TEXT and PT_HEADING_TEXT?

PT_FLOWING_TEXT represents standard text constrained to a single column, while PT_HEADING_TEXT indicates text that spans multiple columns, such as section headers or titles. The ColPartition logic in colpartition.cpp distinguishes these based on horizontal span relative to detected column boundaries.

How do I programmatically check if a block contains text or images?

Use the inline predicates defined in publictypes.h. Call PTIsTextType(blk) to match the text variants (flowing, heading, pull-out, table, vertical, caption, and inline equation), or PTIsImageType(blk) to identify image regions. These functions return boolean values without requiring you to enumerate every enum constant manually. Because PTIsTextType() also returns true for PT_TABLE, test for tables first if you want to handle them separately.

Can I customize how Tesseract assigns PolyBlock Types?

The type assignment is hardcoded in the ColPartition and POLY_BLOCK logic within the Tesseract source. While you cannot override the classifier through the public API without modifying the source, you can post-process the iterator results and reclassify blocks based on custom heuristics after AnalyseLayout() or Recognize() completes.
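
As an illustration of such post-processing, the sketch below promotes short, nearly full-width flowing-text blocks to headings. Everything in it is an assumption made for the example: BlockInfo and Reclassify are hypothetical names, the 80% width rule and the height limit are arbitrary, and the enum is mirrored locally so the snippet compiles on its own.

```cpp
// Local mirror so the sketch compiles without Tesseract installed.
enum PolyBlockType {
  PT_UNKNOWN, PT_FLOWING_TEXT, PT_HEADING_TEXT, PT_PULLOUT_TEXT, PT_EQUATION,
  PT_INLINE_EQUATION, PT_TABLE, PT_VERTICAL_TEXT, PT_CAPTION_TEXT,
  PT_FLOWING_IMAGE, PT_HEADING_IMAGE, PT_PULLOUT_IMAGE, PT_HORZ_LINE,
  PT_VERT_LINE, PT_NOISE, PT_COUNT
};

// Hypothetical record: block type plus bounding box (top < bottom, in pixels).
struct BlockInfo {
  PolyBlockType type;
  int left, top, right, bottom;
};

// Hypothetical rule: a flowing-text block that covers at least 80% of the
// page width and is no taller than max_heading_height becomes a heading.
// All other blocks keep the type Tesseract assigned.
PolyBlockType Reclassify(const BlockInfo& b, int page_width,
                         int max_heading_height) {
  const int width = b.right - b.left;
  const int height = b.bottom - b.top;
  const bool wide = width * 10 >= page_width * 8;  // integer form of >= 80%
  if (b.type == PT_FLOWING_TEXT && wide && height <= max_heading_height) {
    return PT_HEADING_TEXT;
  }
  return b.type;
}
```

The function leaves the iterator's data untouched and only returns a new label, so the original classification stays available for comparison.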

Which block types does Tesseract actually run OCR on?

In recent versions of the source, recognition is limited to blocks whose type passes PTIsTextType(): PT_FLOWING_TEXT, PT_HEADING_TEXT, PT_PULLOUT_TEXT, PT_TABLE, PT_VERTICAL_TEXT, PT_CAPTION_TEXT, and PT_INLINE_EQUATION. Image blocks (PT_FLOWING_IMAGE, etc.) and line blocks (PT_HORZ_LINE, PT_VERT_LINE) are excluded from text recognition and can be handled separately for image extraction or line detection tasks.
