# How to Configure Graph Optimization Levels in ONNX Runtime SessionOptions

> Learn how to configure graph optimization levels in ONNX Runtime SessionOptions. Control transformation aggressiveness from 0 to 99 in Python or C++ for faster inference.

- Repository: [Microsoft/onnxruntime](https://github.com/microsoft/onnxruntime)
- Tags: how-to-guide
- Published: 2026-04-24

---

**Set the `graph_optimization_level` attribute in Python or call `SetGraphOptimizationLevel()` in C++ on your `SessionOptions` object before creating the InferenceSession, choosing from levels 0 (DisableAll) to 99 (EnableAll) to control transformation aggressiveness.**

ONNX Runtime applies a series of graph transformations—such as constant folding, node fusion, and layout optimizations—before executing a model. In the `microsoft/onnxruntime` repository, you configure these transformations by setting the graph optimization level in `SessionOptions`, which determines how aggressively the runtime rewrites the computation graph during model loading.

## Understanding Graph Optimization Levels

The runtime defines five distinct optimization levels in [`include/onnxruntime/core/session/onnxruntime_c_api.h`](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/session/onnxruntime_c_api.h) (lines 448–454). Each level enables increasingly aggressive graph passes:

- **ORT_DISABLE_ALL (0)**: No optimizations are applied; the graph executes exactly as described in the ONNX model.
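- **ORT_ENABLE_BASIC (1)**: Semantics-preserving rewrites such as constant folding, redundant node elimination, and simple node fusions (e.g., Conv-Add, Conv-Mul, Conv-BatchNorm).
- **ORT_ENABLE_EXTENDED (2)**: More complex fusions (e.g., GELU, layer normalization, and attention fusion), applied after graph partitioning to nodes assigned to supported execution providers such as CPU or CUDA.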
- **ORT_ENABLE_LAYOUT (3)**: Layout transformations (e.g., NCHW ↔ NHWC) to match the preferred data layout of the execution provider.
- **ORT_ENABLE_ALL (99)**: All available optimizations including provider-specific kernel fusions (e.g., CUDA-specific optimizations).

When you instantiate a session, the runtime applies the selected transformations during model loading based on the level specified in your `SessionOptions` configuration.
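
The numeric values are not contiguous (the scale jumps from 3 to 99), which matters when a level arrives as a raw integer or string from a configuration file. The sketch below is purely illustrative: `OptimizationLevel` and `parse_level` are hypothetical names that are not part of ONNX Runtime, and the values are copied from the list above.

```python
from enum import IntEnum


class OptimizationLevel(IntEnum):
    """Hypothetical mirror of the levels listed above (not an ONNX Runtime type)."""
    ORT_DISABLE_ALL = 0
    ORT_ENABLE_BASIC = 1
    ORT_ENABLE_EXTENDED = 2
    ORT_ENABLE_LAYOUT = 3
    ORT_ENABLE_ALL = 99


def parse_level(value):
    """Accept an int (e.g. 2) or a name (e.g. "ORT_ENABLE_EXTENDED")."""
    if isinstance(value, str) and not value.strip().lstrip("-").isdigit():
        try:
            return OptimizationLevel[value.strip().upper()]
        except KeyError:
            raise ValueError(f"unknown optimization level name: {value!r}") from None
    # Values such as 4 or 50 are not defined levels, so reject them
    # instead of silently mapping them to a neighbouring level.
    try:
        return OptimizationLevel(int(value))
    except ValueError:
        raise ValueError(f"unknown optimization level value: {value!r}") from None


print(parse_level("ort_enable_basic").name)  # ORT_ENABLE_BASIC
print(int(parse_level("99")))                # 99
```

Rejecting undefined integers up front keeps a typo in a config file from being interpreted as a different level.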

## Configuring Session Options in Python

In the Python API, the `SessionOptions` class exposes a read/write attribute `graph_optimization_level`. It accepts a member of `onnxruntime.GraphOptimizationLevel` and sets the same option that `OrtApi::SetSessionGraphOptimizationLevel` controls in the C API.

```python
import onnxruntime as ort

# Create a SessionOptions instance
opts = ort.SessionOptions()

# Set to disable all optimizations (useful for debugging)
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

# Pass options when creating the session
session = ort.InferenceSession("model.onnx", sess_options=opts)
```

This pattern appears in the test suite at [`onnxruntime/test/python/transformers/test_data/gpt2_pytorch1.5_opset11/generate_tiny_gpt2_model.py`](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/python/transformers/test_data/gpt2_pytorch1.5_opset11/generate_tiny_gpt2_model.py) (line 460), demonstrating how to configure options before model instantiation.

## Configuring Session Options in C++

The C++ API wraps the C enumeration through `Ort::SessionOptions::SetGraphOptimizationLevel`, defined in [`include/onnxruntime/core/session/onnxruntime_cxx_api.h`](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/session/onnxruntime_cxx_api.h) (lines 81–84). This method forwards your selection to `OrtApi::SetSessionGraphOptimizationLevel`.

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "example");
    Ort::SessionOptions opts;

    // Enable extended optimizations
    opts.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);

    // Load the model with the configured options.
    // ORT_TSTR yields a wide string on Windows, where model paths are wchar_t.
    Ort::Session session(env, ORT_TSTR("model.onnx"), opts);
    // ... run inference ...
    return 0;
}
```

The sample at `samples/cxx/main.cc` (lines 42–44) provides a complete end-to-end example of setting the optimization level before session construction.

## Configuring Session Options in C#

The .NET binding exposes the configuration through the `GraphOptimizationLevel` property on the `SessionOptions` class, which forwards to the native implementation. The property is defined in [`SessionOptions.shared.cs`](https://github.com/microsoft/onnxruntime/blob/main/csharp/src/Microsoft.ML.OnnxRuntime/SessionOptions.shared.cs) (line 907).

```csharp
using Microsoft.ML.OnnxRuntime;

var options = new SessionOptions();
options.GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_LAYOUT;

using var session = new InferenceSession("model.onnx", options);
```

## Selecting the Right Optimization Level

Choose your configuration based on specific operational requirements:

- **Debugging**: Use **ORT_DISABLE_ALL** (0) to execute the raw ONNX graph and isolate whether inference issues stem from graph transformations.
- **Maximum Performance**: Use **ORT_ENABLE_ALL** (99) to leverage all available optimizations, though this increases model load time.
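- **Hardware-Specific Tuning**: Use **ORT_ENABLE_LAYOUT** (3) when you want layout transformations (NHWC vs NCHW) applied for the execution provider you target. Check that provider's documentation to confirm which layout it prefers and whether layout optimizations apply to it.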

## Summary

- The `GraphOptimizationLevel` enum in [`onnxruntime_c_api.h`](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/session/onnxruntime_c_api.h) defines five levels from 0 (no optimization) to 99 (maximum optimization).
- Configure the level via `SessionOptions` before creating the session; changes cannot be applied to an existing session.
- In Python, set `SessionOptions.graph_optimization_level`; in C++, call `SetGraphOptimizationLevel()`; in C#, set the `GraphOptimizationLevel` property.
- Higher optimization levels typically reduce inference latency at the cost of longer model loading, while `ORT_DISABLE_ALL` preserves the original graph structure for debugging.

## Frequently Asked Questions

### What is the default graph optimization level in ONNX Runtime?

Standard builds default to **ORT_ENABLE_ALL** (level 99), so every available optimization is applied unless you select a lower level. If your deployment depends on a particular level, set it explicitly in `SessionOptions` rather than relying on the default, since custom or minimal builds may behave differently.

### Can I change the optimization level after creating the InferenceSession?

No. Graph optimizations are applied during session construction when the model is first loaded. You must configure the optimization level in the `SessionOptions` object before passing it to the `InferenceSession` or `Ort::Session` constructor.

### How do graph optimization levels affect model loading versus inference time?

Higher optimization levels (particularly **ORT_ENABLE_EXTENDED** and **ORT_ENABLE_ALL**) increase model loading time because the runtime must analyze and rewrite the graph. However, they typically reduce inference latency by fusing operations and eliminating redundant computations. For latency-sensitive applications with long-running sessions, the trade-off favors higher optimization levels.

### Which optimization level should I use when debugging inference accuracy issues?

Use **ORT_DISABLE_ALL** (0). This executes the model exactly as defined in the original ONNX file, eliminating transformations as a source of numerical discrepancies. If the issue persists at level 0, it likely stems from the model itself or the execution provider; if it disappears, a specific graph transformation is responsible.