How to Use Orientation and Script Detection (OSD) in Tesseract OCR

Orientation and Script Detection (OSD) is a lightweight preprocessing stage in the Tesseract OCR engine that determines the rotation angle (0°, 90°, 180°, or 270°) and dominant writing system (e.g., Latin, Cyrillic, Arabic) of text in an image without performing full character recognition.

In the tesseract-ocr/tesseract codebase, OSD runs before the main OCR pipeline. Because it skips character recognition, it is fast and well suited to auto-rotating scanned documents or routing multi-language content. Unlike full text recognition, OSD needs only the osd.traineddata model: it analyzes blob statistics and page geometry and operates independently of the neural network recognizer.

How OSD Works in Tesseract

The OSD pipeline relies on three core components defined in include/tesseract/osdetect.h: OrientationDetector, which evaluates the rotation of each connected component; ScriptDetector, which scores possible writing systems for the chosen orientation; and OSResults, which aggregates scores across four possible orientations.

The algorithm accumulates evidence in the OSResults structure, storing orientation scores in orientations[4] and per-script scores in scripts_na[4][kMaxNumberOfScripts]. After processing all blobs, OSResults::update_best_orientation() selects the rotation with the highest confidence, while OSResults::get_best_script() identifies the dominant script for that orientation. According to the source in src/api/baseapi.cpp, this logic executes within the DetectOS pipeline, which is invoked automatically when specific page segmentation modes are enabled.
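
The accumulate-then-select idea can be sketched in a few lines. This is a simplified illustration, not Tesseract's code: ToyOSResults, kToyNumScripts, AddBlobEvidence, BestOrientation, and BestScript are hypothetical names, and the real detectors compute their per-blob scores in a more involved way.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Illustrative script count; NOT Tesseract's kMaxNumberOfScripts.
constexpr std::size_t kToyNumScripts = 3;

// Hypothetical, simplified stand-in for OSResults: one running score per
// orientation and one per (orientation, script) pair.
struct ToyOSResults {
  std::array<float, 4> orientations{};  // evidence for orientation ids 0..3
  std::array<std::array<float, kToyNumScripts>, 4> scripts{};

  // Add the evidence contributed by a single blob.
  void AddBlobEvidence(std::size_t orientation_id, std::size_t script_id,
                       float score) {
    assert(orientation_id < 4);
    assert(script_id < kToyNumScripts);
    orientations[orientation_id] += score;
    scripts[orientation_id][script_id] += score;
  }

  // Id (0..3) of the orientation with the highest accumulated score.
  std::size_t BestOrientation() const {
    std::size_t best = 0;
    for (std::size_t i = 1; i < 4; ++i) {
      if (orientations[i] > orientations[best]) best = i;
    }
    return best;
  }

  // Id of the highest-scoring script for the given orientation.
  std::size_t BestScript(std::size_t orientation_id) const {
    assert(orientation_id < 4);
    std::size_t best = 0;
    for (std::size_t s = 1; s < kToyNumScripts; ++s) {
      if (scripts[orientation_id][s] > scripts[orientation_id][best]) best = s;
    }
    return best;
  }
};
```

The script lookup is keyed by orientation, which mirrors the two-step selection described above: pick the orientation first, then the best script for that orientation.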

Page Segmentation Modes for OSD

OSD is controlled through Page Segmentation Modes (PSM), defined in include/tesseract/publictypes.h:

enum PageSegMode {
  PSM_OSD_ONLY        = 0,   // Orientation & script detection only
  PSM_AUTO_OSD        = 1,   // Automatic layout + OSD
  // … other modes …
  PSM_SPARSE_TEXT_OSD = 12,  // Sparse text + OSD
  // … other modes …
};

The engine decides whether to run OSD via the PSM_OSD_ENABLED inline helper (also in publictypes.h), which returns true when pageseg_mode <= PSM_AUTO_OSD or when pageseg_mode is PSM_SPARSE_TEXT_OSD:

inline bool PSM_OSD_ENABLED(int pageseg_mode) {
  return pageseg_mode <= PSM_AUTO_OSD || pageseg_mode == PSM_SPARSE_TEXT_OSD;
}

Use PSM_OSD_ONLY (value 0) when you need only rotation and script data without text recognition. Use PSM_AUTO_OSD (value 1) to combine automatic page layout analysis with OSD before performing OCR.
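
To see which modes satisfy that condition, here is a standalone sketch that mirrors the predicate without the Tesseract headers. The Toy-prefixed names are hypothetical, and the numeric values for the modes not listed above (3, 6, 11, 12) are assumptions based on the enum order in publictypes.h, so check them against your installed version.

```cpp
#include <cassert>

// Local mirror of a few PageSegMode values so the snippet builds on its own.
// In real code, use tesseract::PageSegMode from publictypes.h instead.
enum ToyPageSegMode {
  TOY_PSM_OSD_ONLY = 0,
  TOY_PSM_AUTO_OSD = 1,
  TOY_PSM_AUTO = 3,              // assumed value
  TOY_PSM_SINGLE_BLOCK = 6,      // assumed value
  TOY_PSM_SPARSE_TEXT = 11,      // assumed value
  TOY_PSM_SPARSE_TEXT_OSD = 12   // assumed value
};

// Same condition as the PSM_OSD_ENABLED helper quoted above.
inline bool ToyOsdEnabled(int pageseg_mode) {
  assert(pageseg_mode >= 0);  // mode values are non-negative enum constants
  return pageseg_mode <= TOY_PSM_AUTO_OSD ||
         pageseg_mode == TOY_PSM_SPARSE_TEXT_OSD;
}
```

With these values, ToyOsdEnabled is true for modes 0, 1, and 12 and false for every other mode, including plain sparse text (11).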

Detecting Orientation and Script via the C++ API

The high-level entry point is TessBaseAPI::DetectOrientationScript, implemented in src/api/baseapi.cpp. This method populates an OSResults instance, translates the orientation index (0-3) to degrees, and looks up the human-readable script name via the UNICHARSET:

bool TessBaseAPI::DetectOrientationScript(int *orient_deg,
                                          float *orient_conf,
                                          const char **script_name,
                                          float *script_conf);

For convenience, TessBaseAPI::GetOsdText formats these results as a plain-text report. Here is a complete example:

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <iostream>

int main() {
  tesseract::TessBaseAPI api;
  
  // Initialize with the default datapath; OSD also needs osd.traineddata there
  if (api.Init(nullptr, "eng")) {
    std::cerr << "Could not initialize tesseract.\n";
    return 1;
  }

  // Request OSD only mode
  api.SetPageSegMode(tesseract::PSM_OSD_ONLY);

  Pix *image = pixRead("sample.png");
  if (image == nullptr) {
    std::cerr << "Could not read sample.png.\n";
    api.End();
    return 1;
  }
  api.SetImage(image);

  // Retrieve formatted OSD report
  char *osd = api.GetOsdText(0);
  if (osd) {
    std::cout << osd << "\n";
    delete[] osd;
  } else {
    std::cerr << "OSD detection failed.\n";
  }

  pixDestroy(&image);
  api.End();
  return 0;
}

Alternatively, access raw values directly:

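int orient_deg = 0;
float orient_conf = 0.0f, script_conf = 0.0f;
const char *script_name = nullptr;

// orient_deg is the detected orientation of the text, not the correction to apply
if (api.DetectOrientationScript(&orient_deg, &orient_conf,
                                &script_name, &script_conf)) {
  std::cout << "Orientation: " << orient_deg << " degrees (confidence "
            << orient_conf << "), script: " << script_name
            << " (confidence " << script_conf << ")\n";
}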
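
The degrees returned by DetectOrientationScript describe how the text is currently oriented, so a caller that wants to fix the page still has to derive the correction. A minimal sketch, assuming the correction matches the "Rotate" field of the GetOsdText report, which in recent Tesseract versions appears to be (360 - orientation) % 360; ClockwiseRotationToUpright is a hypothetical helper name:

```cpp
#include <cassert>

// Converts a detected orientation (0, 90, 180, or 270 degrees) into the
// clockwise rotation that brings the page upright.
// Assumption: this matches the "Rotate" value in the OSD text report.
inline int ClockwiseRotationToUpright(int orient_deg) {
  assert(orient_deg == 0 || orient_deg == 90 || orient_deg == 180 ||
         orient_deg == 270);
  return (360 - orient_deg) % 360;
}
```

It is worth verifying the direction on one known-rotated sample before applying it to a whole batch.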

Using the C API

The C API wrappers in include/tesseract/capi.h expose equivalent functionality through TessBaseAPISetPageSegMode and TessBaseAPIGetOsdText:

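#include <leptonica/allheaders.h>
#include <tesseract/capi.h>
#include <stdio.h>

int main(void) {
    TessBaseAPI *api = TessBaseAPICreate();
    if (TessBaseAPIInit3(api, NULL, "eng")) {
        fprintf(stderr, "Init failed\n");
        TessBaseAPIDelete(api);
        return 1;
    }

    // Switch to OSD only mode
    TessBaseAPISetPageSegMode(api, PSM_OSD_ONLY);

    // Load the image with Leptonica and hand it to Tesseract
    struct Pix *image = pixRead("sample.png");
    if (image == NULL) {
        fprintf(stderr, "Could not read sample.png\n");
        TessBaseAPIEnd(api);
        TessBaseAPIDelete(api);
        return 1;
    }
    TessBaseAPISetImage2(api, image);

    // Retrieve the formatted OSD report; free it with TessDeleteText
    char *osd = TessBaseAPIGetOsdText(api, 0);
    if (osd) {
        printf("%s\n", osd);
        TessDeleteText(osd);
    } else {
        fprintf(stderr, "OSD detection failed\n");
    }

    pixDestroy(&image);
    TessBaseAPIEnd(api);
    TessBaseAPIDelete(api);
    return 0;
}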

For direct value access without string formatting, use TessBaseAPIDetectOrientationScript, which mirrors the C++ method signature.

Command-Line OSD Usage

Invoke OSD directly from the command line using the --psm flag, which maps to the PageSegMode enum values:

# OSD only (orientation & script detection without OCR)
tesseract sample.png stdout --psm 0

# Automatic page segmentation with OSD
tesseract sample.png stdout --psm 1

The output for --psm 0 resembles:

Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 12.34
Script: Latin
Script confidence: 13.56

"Orientation in degrees" reports the detected orientation of the text (0, 90, 180, or 270), while "Rotate" gives the clockwise rotation in degrees needed to make the text upright.
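
If you capture this report from the CLI or from GetOsdText, it can be turned back into typed values. The following is a sketch only: OsdReport and ParseOsdReport are hypothetical names, and the parser assumes the "Key: value" layout shown in the sample output above.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical container for the fields of the OSD text report.
struct OsdReport {
  int page_number = -1;
  int orientation_deg = -1;
  int rotate = -1;
  float orientation_conf = 0.0f;
  std::string script;
  float script_conf = 0.0f;
};

// Parses "Key: value" lines and ignores unknown keys.
// Returns true only if all six expected fields were read successfully.
inline bool ParseOsdReport(const std::string &text, OsdReport *out) {
  assert(out != nullptr);
  *out = OsdReport();
  unsigned seen = 0;  // one bit per field that parsed cleanly
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    const std::string::size_type colon = line.find(": ");
    if (colon == std::string::npos) continue;
    const std::string key = line.substr(0, colon);
    std::istringstream value(line.substr(colon + 2));
    value.imbue(std::locale::classic());  // '.' as the decimal separator
    if (key == "Page number") {
      if (value >> out->page_number) seen |= 1u << 0;
    } else if (key == "Orientation in degrees") {
      if (value >> out->orientation_deg) seen |= 1u << 1;
    } else if (key == "Rotate") {
      if (value >> out->rotate) seen |= 1u << 2;
    } else if (key == "Orientation confidence") {
      if (value >> out->orientation_conf) seen |= 1u << 3;
    } else if (key == "Script") {
      if (value >> out->script) seen |= 1u << 4;
    } else if (key == "Script confidence") {
      if (value >> out->script_conf) seen |= 1u << 5;
    }
  }
  return seen == 0x3Fu;
}
```

When you already link against the library, calling DetectOrientationScript directly avoids the round trip through text; parsing is mainly useful when only the command-line tool is available.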

Key Implementation Files

Understanding the OSD architecture requires referencing these specific files in the tesseract-ocr/tesseract repository:

  • include/tesseract/publictypes.h — Defines PageSegMode enum, PSM_OSD_ONLY, PSM_AUTO_OSD, and PSM_OSD_ENABLED logic.
  • include/tesseract/osdetect.h — Declares OrientationDetector, ScriptDetector, and OSResults classes that perform the statistical analysis.
  • src/api/baseapi.cpp — Implements DetectOS, DetectOrientationScript, and GetOsdText, connecting the low-level detectors to the public API (approximately lines 1540–1580 and 1640–1680).
  • include/tesseract/capi.h — Exposes C-compatible functions including TessBaseAPISetPageSegMode and TessBaseAPIDetectOrientationScript.
  • unittest/osd_test.cc — Contains regression tests verifying OSD accuracy across multiple scripts and rotation angles.

Note that OSD operates on the legacy OCR engine (as implemented in baseapi.cpp around line 1560), not the LSTM neural network.

Summary

  • OSD determines text rotation (0°/90°/180°/270°) and writing system without performing character recognition.
  • Enable via PSM_OSD_ONLY (mode 0) for detection only, or PSM_AUTO_OSD (mode 1) to combine with layout analysis.
  • The C++ API uses SetPageSegMode followed by GetOsdText or DetectOrientationScript.
  • The C API provides TessBaseAPISetPageSegMode and TessBaseAPIGetOsdText in capi.h.
  • OSD requires only osd.traineddata, not full language models, and runs on the legacy engine.

Frequently Asked Questions

What is the difference between PSM_OSD_ONLY and PSM_AUTO_OSD?

PSM_OSD_ONLY (value 0) runs exclusively the orientation and script detection pipeline, returning only rotation and script metadata without recognizing text. PSM_AUTO_OSD (value 1) performs automatic page layout analysis and OSD, then continues with full OCR on the detected text regions. Both modes trigger the DetectOS pipeline in baseapi.cpp.

Does OSD require a specific trained data file?

Yes. You initialize Tesseract with a language (e.g., "eng" or "osd"), but the statistical models for script identification and rotation detection live in osd.traineddata. This is a separate file from the language models, so it must be present in your tessdata directory (for example, the one TESSDATA_PREFIX points to) even when you initialize with another language.

Which Tesseract engine does OSD use?

OSD runs on the legacy OCR engine as implemented in src/api/baseapi.cpp, not the LSTM neural network. The detection relies on blob-based statistical analysis rather than deep learning recognition, making it lightweight but dependent on the legacy engine's page layout analysis.

How do I interpret the confidence scores from OSD?

The orientation confidence (orient_conf) and script confidence (script_conf) values represent relative statistical certainty derived from the scoring accumulators in OSResults. Higher values indicate stronger evidence, but these scores are not probabilities (0–1); they represent internal confidence metrics used to select the best candidate from the four possible orientations and available scripts.
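
Because the scores are relative, a practical pattern is to gate automatic rotation on a threshold tuned against your own documents. A minimal sketch of that policy follows; OsdDecision, DecideRotation, and the min_conf parameter are hypothetical, and no particular threshold value is implied by the Tesseract source.

```cpp
#include <cassert>

// Hypothetical outcomes of an auto-rotation policy.
enum class OsdDecision { kKeepAsIs, kRotate, kNeedsReview };

// Only trusts the detected orientation when its confidence reaches min_conf,
// a caller-supplied threshold that has to be tuned empirically.
inline OsdDecision DecideRotation(int orient_deg, float orient_conf,
                                  float min_conf) {
  assert(min_conf >= 0.0f);
  if (orient_conf < min_conf) return OsdDecision::kNeedsReview;
  return orient_deg == 0 ? OsdDecision::kKeepAsIs : OsdDecision::kRotate;
}
```

Sending low-confidence pages to review instead of rotating them avoids making a sparse or noisy scan worse.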
