How to Configure Graph Optimization Levels in ONNX Runtime SessionOptions

To control how aggressively ONNX Runtime rewrites a model's graph, set the graph_optimization_level attribute on SessionOptions in Python, or call SetGraphOptimizationLevel() in C++, before you create the session. Levels range from ORT_DISABLE_ALL (0) to ORT_ENABLE_ALL (99).

Before executing a model, ONNX Runtime applies a series of graph transformations, such as constant folding, node fusion, and layout optimizations. The graph optimization level you set in SessionOptions determines how aggressively the runtime rewrites the computation graph during model loading. The file paths cited below refer to the microsoft/onnxruntime repository.

Understanding Graph Optimization Levels

The runtime defines five distinct optimization levels in include/onnxruntime/core/session/onnxruntime_c_api.h (lines 448–454). Each level enables increasingly aggressive graph passes:

  • ORT_DISABLE_ALL (0): No optimizations are applied; the graph executes exactly as described in the ONNX model.
  • ORT_ENABLE_BASIC (1): Semantics-preserving rewrites such as constant folding, redundant node elimination, and simple node fusions (e.g., Conv+Add, Conv+Mul, Conv+BatchNorm).
  • ORT_ENABLE_EXTENDED (2): More complex fusions (e.g., GEMM or Conv activation fusion, GELU, layer normalization, and attention fusion), applied to nodes assigned to supporting execution providers such as CPU and CUDA.
  • ORT_ENABLE_LAYOUT (3): Layout transformations that change the data layout of applicable nodes (e.g., NCHW ↔ NHWC) to suit the assigned execution provider.
  • ORT_ENABLE_ALL (99): Every available optimization, including the layout optimizations.

When you instantiate a session, the runtime applies the selected transformations during model loading based on the level specified in your SessionOptions configuration.
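To make the ordering concrete, here is a minimal sketch that mirrors the numeric values listed above in a plain Python enum. LevelValue and included_levels are hypothetical names used only for illustration; they are not part of the ONNX Runtime API. The sketch assumes each level also includes the passes of the levels below it, as described above.

```python
from enum import IntEnum
from typing import List


class LevelValue(IntEnum):
    """Hypothetical stand-in that mirrors the numeric values listed above."""
    ORT_DISABLE_ALL = 0
    ORT_ENABLE_BASIC = 1
    ORT_ENABLE_EXTENDED = 2
    ORT_ENABLE_LAYOUT = 3
    ORT_ENABLE_ALL = 99


def included_levels(selected: int) -> List[str]:
    """Names of the non-zero levels at or below `selected`, lowest first."""
    return [lv.name for lv in LevelValue if 0 < lv <= selected]
```

Note that the values are not contiguous: ORT_ENABLE_ALL is 99, so comparing levels numerically works, but iterating with range() would not.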

Configuring Session Options in Python

In the Python API, the SessionOptions class exposes a mutable attribute graph_optimization_level. It controls the same setting as the C API function OrtApi::SetSessionGraphOptimizationLevel.

import onnxruntime as ort

# Create a SessionOptions instance
opts = ort.SessionOptions()

# Disable all optimizations (useful for debugging)
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

# Pass the options when creating the session
session = ort.InferenceSession("model.onnx", sess_options=opts)

This pattern appears in the test suite at onnxruntime/test/python/transformers/test_data/gpt2_pytorch1.5_opset11/generate_tiny_gpt2_model.py (line 460), demonstrating how to configure options before model instantiation.
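If your code has to run against more than one ONNX Runtime version, a defensive variant of this pattern may help. The sketch below assumes that some installed versions might not expose every enum member named in this article (ORT_ENABLE_LAYOUT in particular) and falls back to the next preferred level. pick_available and options_with_fallback are hypothetical helper names, not part of the library.

```python
from typing import Iterable, Sequence


def pick_available(preferred: Sequence[str], available: Iterable[str]) -> str:
    """Return the first name in `preferred` that also appears in `available`."""
    present = set(available)
    for name in preferred:
        if name in present:
            return name
    raise ValueError(f"none of {list(preferred)} is available")


def options_with_fallback(preferred: Sequence[str]):
    """Build SessionOptions using the first preferred level this build exposes."""
    # Imported lazily so pick_available() stays usable without onnxruntime.
    import onnxruntime as ort

    # The enum is a pybind11 enum, which exposes its members by name
    # through the __members__ mapping.
    members = ort.GraphOptimizationLevel.__members__
    opts = ort.SessionOptions()
    opts.graph_optimization_level = members[pick_available(preferred, members)]
    return opts
```

For example, options_with_fallback(["ORT_ENABLE_LAYOUT", "ORT_ENABLE_EXTENDED"]) would use the layout level where it exists and the extended level otherwise.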

Configuring Session Options in C++

The C++ API wraps the C enumeration through Ort::SessionOptions::SetGraphOptimizationLevel, defined in include/onnxruntime/core/session/onnxruntime_cxx_api.h (lines 81–84). This method forwards your selection to OrtApi::SetSessionGraphOptimizationLevel.

#include <onnxruntime_cxx_api.h>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "example");
    Ort::SessionOptions opts;

    // Enable extended optimizations
    opts.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);

    // Load the model with the configured options.
    // On Windows the path parameter is a wide string, e.g. L"model.onnx".
    Ort::Session session(env, "model.onnx", opts);
    // ... run inference ...
}

The sample at samples/cxx/main.cc (lines 42–44) provides a complete end-to-end example of setting the optimization level before session construction.

Configuring Session Options in C#

The .NET binding exposes the configuration through the GraphOptimizationLevel property on the SessionOptions class, forwarding to the native implementation via SessionOptions.shared.cs (line 907).

using Microsoft.ML.OnnxRuntime;

var options = new SessionOptions();
options.GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_LAYOUT;

using var session = new InferenceSession("model.onnx", options);

Selecting the Right Optimization Level

Choose your configuration based on specific operational requirements:

  • Debugging: Use ORT_DISABLE_ALL (0) to execute the raw ONNX graph and isolate whether inference issues stem from graph transformations.
  • Maximum Performance: Use ORT_ENABLE_ALL (99) to leverage all available optimizations, though this increases model load time.
  • Hardware-Specific Tuning: Use ORT_ENABLE_LAYOUT (3) when you want layout transformations (e.g., NCHW to NHWC for an execution provider that prefers a specific layout) without enabling everything under ORT_ENABLE_ALL.

Summary

  • The GraphOptimizationLevel enum in onnxruntime_c_api.h defines five levels from 0 (no optimization) to 99 (maximum optimization).
  • Configure the level via SessionOptions before creating the session; changes cannot be applied to an existing session.
  • In Python, set SessionOptions.graph_optimization_level; in C++, call SetGraphOptimizationLevel(); in C#, set the GraphOptimizationLevel property.
  • Higher optimization levels typically reduce inference latency but increase model loading time, while ORT_DISABLE_ALL preserves the original graph structure for debugging.

Frequently Asked Questions

What is the default graph optimization level in ONNX Runtime?

The default is ORT_ENABLE_ALL (level 99): unless you set a level, ONNX Runtime enables all available optimizations. Which transformations actually run still depends on the build configuration and on the execution providers the graph's nodes are assigned to.

Can I change the optimization level after creating the InferenceSession?

No. Graph optimizations are applied during session construction when the model is first loaded. You must configure the optimization level in the SessionOptions object before passing it to the InferenceSession or Ort::Session constructor.

How do graph optimization levels affect model loading versus inference time?

Higher optimization levels (particularly ORT_ENABLE_EXTENDED and ORT_ENABLE_ALL) increase model loading time because the runtime must analyze and rewrite the graph. However, they typically reduce inference latency by fusing operations and eliminating redundant computations. For latency-sensitive applications with long-running sessions, the trade-off favors higher optimization levels.
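One way to check this trade-off for your own model is to time session creation and inference separately at each level. The following is a rough sketch under stated assumptions: timed and load_session are hypothetical helper names, the CPU execution provider is selected only to keep the example portable, and a single measurement like this is noisy, so repeat it before drawing conclusions.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def load_session(model_path: str, level_name: str):
    """Create a session at the named level, e.g. "ORT_ENABLE_ALL"."""
    # Imported lazily so timed() stays usable without onnxruntime.
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.graph_optimization_level = getattr(ort.GraphOptimizationLevel, level_name)
    return ort.InferenceSession(
        model_path, sess_options=opts, providers=["CPUExecutionProvider"]
    )
```

With these helpers, timed(load_session, "model.onnx", "ORT_ENABLE_ALL") returns the session together with its load time, and timed(session.run, None, feeds) measures one inference call, where feeds is your own dictionary of input names to arrays.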

Which optimization level should I use when debugging inference accuracy issues?

Use ORT_DISABLE_ALL (0). This executes the model exactly as defined in the original ONNX file, eliminating transformations as a source of numerical discrepancies. If the issue persists at level 0, it likely stems from the model itself or the execution provider; if it disappears, a specific graph transformation is responsible.
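The comparison described above can be sketched as follows: run the same inputs at two levels and look at the largest element-wise difference. run_at_level and max_abs_diff are hypothetical helper names, and the CPU execution provider is an assumption made to keep the example portable. What counts as an acceptable difference depends on your model and data types.

```python
def _flatten(x):
    """Yield scalars from nested lists, tuples, or array-like objects."""
    if hasattr(x, "tolist"):  # e.g. NumPy arrays returned by session.run
        x = x.tolist()
    if isinstance(x, (list, tuple)):
        for item in x:
            yield from _flatten(item)
    else:
        yield x


def max_abs_diff(a, b) -> float:
    """Largest absolute element-wise difference between two outputs."""
    fa, fb = list(_flatten(a)), list(_flatten(b))
    if len(fa) != len(fb):
        raise ValueError("outputs have different numbers of elements")
    return max((abs(x - y) for x, y in zip(fa, fb)), default=0.0)


def run_at_level(model_path: str, feeds: dict, level_name: str):
    """Run the model once at the named level and return all outputs."""
    # Imported lazily so max_abs_diff() stays usable without onnxruntime.
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.graph_optimization_level = getattr(ort.GraphOptimizationLevel, level_name)
    session = ort.InferenceSession(
        model_path, sess_options=opts, providers=["CPUExecutionProvider"]
    )
    return session.run(None, feeds)  # None requests every model output
```

Calling max_abs_diff(run_at_level(path, feeds, "ORT_DISABLE_ALL"), run_at_level(path, feeds, "ORT_ENABLE_ALL")) then gives a single number to compare against your tolerance.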
