How ONNX Runtime Handles Dynamic Shapes During Inference: Architecture and API Guide

April 24, 2026 microsoft/onnxruntime ↗

ONNX Runtime resolves dynamic (symbolic) dimensions at runtime through a three-layer system involving symbolic shape representation, user-configurable session option overrides, and provider-specific shape propagation, enabling inference on models with variable batch sizes and sequence lengths.

ONNX Runtime (ORT) treats dynamic dimensions as first-class entities that persist from model loading through inference. This capability allows the microsoft/onnxruntime engine to execute models exported with -1 or named symbolic dimensions (e.g., "batch" or "seq_len") without static recompilation. The implementation spans from the C++ TensorShape class in core/framework/tensor_type_and_shape.cc to execution-provider-specific optimizations in TensorRT and WebNN backends.

Symbolic Shape Representation in TensorShape

When parsing an ONNX model, ORT encounters dimensions marked with -1 or explicit symbolic names. The runtime stores these as symbolic dimensions within the TensorShape class, maintaining both the placeholder value and optional human-readable identifiers.

In core/framework/tensor_type_and_shape.cc (lines 106-119), the implementation captures:

Placeholder value: -1, reported for any dimension whose size is unknown at load time
Symbolic names: User-friendly identifiers like "batch" or "seq_len" stored in dim_params

This dual representation allows the shape inference engine to propagate symbolic information through the graph even when concrete sizes remain unknown. Operators such as Reshape and MatMul read these symbolic dimensions to compute output shapes without requiring static allocation during model load.
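
To make the dual representation concrete, here is a minimal Python sketch. It is a toy model and not ORT's C++ code: the SymbolicShape class and the matmul_output_shape helper are hypothetical names introduced only for illustration, and the sketch assumes a simple [..., M, K] x [K, N] MatMul.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SymbolicShape:
    """Toy model: -1 marks an unknown size; dim_params holds its name ('' if unnamed)."""
    dims: List[int]
    dim_params: List[str]

    def is_dynamic(self) -> bool:
        # A shape is dynamic while at least one dimension is still unknown
        return any(d == -1 for d in self.dims)


def matmul_output_shape(a: SymbolicShape, b: SymbolicShape) -> SymbolicShape:
    """Output shape of a [..., M, K] x [K, N] MatMul, carrying symbolic names along."""
    k_a, k_b = a.dims[-1], b.dims[0]
    # Only compare the inner dimensions when both are concrete
    if k_a != -1 and k_b != -1 and k_a != k_b:
        raise ValueError(f"inner dimensions differ: {k_a} vs {k_b}")
    return SymbolicShape(
        dims=a.dims[:-1] + [b.dims[1]],
        dim_params=a.dim_params[:-1] + [b.dim_params[1]],
    )


x = SymbolicShape(dims=[-1, -1, 256], dim_params=["batch", "seq_len", ""])
w = SymbolicShape(dims=[256, 64], dim_params=["", ""])
y = matmul_output_shape(x, w)
print(y.dims, y.dim_params)  # [-1, -1, 64] ['batch', 'seq_len', '']
```

The output keeps the "batch" and "seq_len" names even though neither size is known, which is the property the shape inference pass relies on.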

Runtime Override via SessionOptions

Before session initialization, users bind concrete values to symbolic dimensions using the SessionOptions API. This mechanism bridges the gap between symbolic model definitions and the static requirements of certain hardware backends.

The key methods are:

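AddFreeDimensionOverride: Binds a concrete size to every free dimension tagged with a given denotation, the optional semantic label (for example "DATA_BATCH") that the ONNX dimension-denotation convention lets an exporter attach to a dimension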
AddFreeDimensionOverrideByName: Binds a concrete size to a named symbolic dimension (e.g., "batch")

These overrides are stored internally in OrtSessionOptions::free_dimension_overrides, implemented in core/session/abi_session_options.cc (lines 30-42). The C API exposes them as the AddFreeDimensionOverride and AddFreeDimensionOverrideByName entries of the OrtApi function table, registered in core/session/onnxruntime_c_api.cc at line 4337.
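
The effect of a by-name override can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not ORT's implementation: apply_overrides_by_name is a hypothetical helper, and the parallel dims/dim_params lists are a simplified stand-in for the shape data described above.

```python
from typing import Dict, List


def apply_overrides_by_name(
    dims: List[int],
    dim_params: List[str],
    overrides: Dict[str, int],
) -> List[int]:
    """Return a copy of dims with each named free dimension pinned to its override."""
    resolved = []
    for size, name in zip(dims, dim_params):
        if size == -1 and name in overrides:
            resolved.append(overrides[name])  # free dimension with a matching name
        else:
            resolved.append(size)  # already static, or still free
    return resolved


shape = apply_overrides_by_name(
    dims=[-1, -1, 256],
    dim_params=["batch", "seq_len", ""],
    overrides={"batch": 2},
)
print(shape)  # [2, -1, 256]
```

Only "batch" is pinned here; "seq_len" stays free and would be resolved later from the input tensor, on providers that allow it.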

Shape Propagation and Provider Handling

During graph initialization, ORT executes symbolic shape inference using the same engine employed for static shapes. The process resolves dimensions through the following flow:

Graph partition: The inference engine identifies operators with dynamic inputs
Override application: When a dimension has a registered override, TensorShape::IsDynamic() returns false and the concrete size propagates through downstream nodes
Memory allocation: Execution providers allocate buffers once actual dimensions become known, either from overrides or runtime input tensors
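
The last step of this flow, deferring allocation until every dimension is concrete, can be illustrated with a small Python sketch. Both helpers (resolve_from_input and buffer_nbytes) are hypothetical names for this example; real execution providers manage memory through their own allocators.

```python
from math import prod
from typing import List, Optional


def resolve_from_input(declared: List[int], actual: List[int]) -> List[int]:
    """Fill remaining -1 entries from the shape of the tensor passed at run time."""
    if len(declared) != len(actual):
        raise ValueError("rank mismatch")
    return [a if d == -1 else d for d, a in zip(declared, actual)]


def buffer_nbytes(dims: List[int], itemsize: int = 4) -> Optional[int]:
    """Bytes needed for a tensor, or None while any dimension is still -1."""
    if any(d == -1 for d in dims):
        return None  # size unknown: allocation has to wait
    return prod(dims) * itemsize


declared = [2, -1, 256]             # "batch" overridden to 2, "seq_len" still free
print(buffer_nbytes(declared))      # None
resolved = resolve_from_input(declared, [2, 5, 256])
print(resolved, buffer_nbytes(resolved))  # [2, 5, 256] 10240
```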

Execution providers handle dynamic shapes differently based on backend capabilities:

CPU: Full dynamic shape support using the generic shape inference engine
CUDA/TensorRT: Creates dynamic input profiles on-the-fly when overrides are missing, falling back to optimization profiles for unknown ranges (see core/providers/tensorrt/tensorrt_execution_provider.cc, lines 3202-3220)
WebNN: Does not support dynamic shapes; requires explicit overrides via sessionOptions.freeDimensionOverrides to avoid runtime errors (see core/providers/webnn/builders/helper.cc, lines 87-90)
OpenVINO: Materializes symbolic shapes into static dimensions using overrides before compilation (core/providers/openvino/backend_manager.cc)

Setting Dynamic Shape Overrides: Code Examples

Python API

Use add_free_dimension_override_by_name to bind symbolic names before creating the InferenceSession:

import onnxruntime as ort
import numpy as np

# Configure session options with concrete dimension values
options = ort.SessionOptions()
options.add_free_dimension_override_by_name("batch", 2)
options.add_free_dimension_override_by_name("seq_len", 5)

# Load model with symbolic dimensions [-1, -1, 256]
sess = ort.InferenceSession("model_with_dynamic.onnx", sess_options=options)

# Create input matching the overridden shapes
input_data = np.random.randn(2, 5, 256).astype(np.float32)
outputs = sess.run(None, {"input": input_data})
print("Output shape:", outputs[0].shape)

The Python wrapper calls the C-API implementation at python/onnxruntime.capi.cc (line 215), forwarding to AddFreeDimensionOverrideByName in abi_session_options.cc.

C++ API

For low-level integration, call the C API function table through the C++ wrapper header:

#include "onnxruntime_cxx_api.h"  // C++ wrappers; also pulls in onnxruntime_c_api.h
#include <iostream>
#include <vector>

int main() {
  Ort::Env env{ORT_LOGGING_LEVEL_WARNING, "test"};
  Ort::SessionOptions opts;

  // Bind symbolic "seq_len" to concrete size 8 via the OrtApi function table
  Ort::ThrowOnError(
      Ort::GetApi().AddFreeDimensionOverrideByName(opts, "seq_len", 8));

  // Load model with shape [-1, seq_len, 128]; ORT_TSTR handles wide paths on Windows
  Ort::Session session{env, ORT_TSTR("model_dynamic.onnx"), opts};

  // Prepare concrete tensor (batch=1, seq_len=8, features=128)
  std::vector<int64_t> dims = {1, 8, 128};
  std::vector<float> data(1 * 8 * 128, 1.0f);

  Ort::MemoryInfo mem_info = Ort::MemoryInfo::CreateCpu(
      OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
      mem_info, data.data(), data.size(), dims.data(), dims.size());

  // Input and output names must match those declared in the model
  const char* input_names[] = {"input"};
  const char* output_names[] = {"output"};
  auto outputs = session.Run(Ort::RunOptions{nullptr},
      input_names, &input_tensor, 1, output_names, 1);

  // Report the rank of the first output's resolved shape
  auto out_shape = outputs[0].GetTensorTypeAndShapeInfo().GetShape();
  std::cout << "Output rank: " << out_shape.size() << "\n";
  return 0;
}

Inspecting Symbolic Dimensions

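Retrieve symbolic dimension names at runtime to verify model capabilities. Inspect a session created without overrides, because an overridden dimension may be reported as its concrete size instead of its name:

plain_sess = ort.InferenceSession("model_with_dynamic.onnx")
info = plain_sess.get_inputs()[0]
print("Input shape:", info.shape)  # named dims appear as strings, e.g. ['batch', 'seq_len', 256]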

This calls the implementation in tensor_type_and_shape.cc (lines 106-119), which populates symbolic names from the dim_params stored in the ONNX type information.

Summary

Symbolic representation: ONNX Runtime stores dynamic dimensions using -1 placeholders and optional names in TensorShape, parsed from model metadata in tensor_type_and_shape.cc.
User overrides: The AddFreeDimensionOverrideByName API in abi_session_options.cc allows binding concrete values to symbolic names before session creation, stored in free_dimension_overrides.
Provider flexibility: CPU and TensorRT providers handle truly dynamic shapes through lazy allocation and optimization profiles, while WebNN requires static overrides to prevent runtime failures.
API consistency: Both Python and C++ interfaces ultimately call the C-API functions registered in onnxruntime_c_api.cc, ensuring uniform behavior across language bindings.

Frequently Asked Questions

What is the difference between `AddFreeDimensionOverride` and `AddFreeDimensionOverrideByName`?

AddFreeDimensionOverride matches dimensions by denotation, an optional semantic tag such as "DATA_BATCH" that an exporter can attach to a dimension, while AddFreeDimensionOverrideByName matches by symbolic name (human-readable strings like "batch" or "seq_len" embedded by the model exporter). Use the latter when your ONNX model contains named dimensions; use the former when its dimensions are tagged with denotations instead. Neither call targets a dimension by axis position.

Can I run inference without providing dimension overrides?

Yes, provided you use an execution provider that supports dynamic shapes, such as the CPU or TensorRT providers. These allocate memory lazily once the concrete input dimensions arrive at inference time. However, providers like WebNN require overrides because they compile static graphs and cannot handle runtime dimension variability. Without overrides on incompatible providers, ONNX Runtime raises an ORT_INVALID_ARGUMENT error during session creation.

Which execution providers support fully dynamic shapes?

The CPU execution provider offers full dynamic shape support, handling arbitrary dimension changes between inference calls. TensorRT and CUDA support dynamic shapes, but TensorRT may need optimization profiles covering the expected ranges to perform well when dimensions vary. OpenVINO materializes symbolic shapes into static dimensions before compilation, so it relies on overrides to pin them. WebNN does not support dynamic shapes and requires overrides (for example via AddFreeDimensionOverrideByName) to create a static execution plan.

How does ONNX Runtime handle shape mismatches at runtime?

ONNX Runtime validates input tensor shapes against the computed graph dimensions at each inference call. If an input shape contradicts a previously established override (e.g., providing a batch size of 8 when "batch" was overridden to 4), the runtime raises ORT_INVALID_ARGUMENT. For providers supporting dynamic shapes, providing different concrete dimensions across calls triggers re-inference of shapes on-the-fly, though this may incur overhead as providers reallocate buffers or rebuild optimization profiles.
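
The validation rule described here can be sketched as follows. This is a simplified model, not ORT's code: check_input_shape is a hypothetical helper, and ValueError stands in for the ORT_INVALID_ARGUMENT status the runtime reports.

```python
from typing import List


def check_input_shape(expected: List[int], actual: List[int]) -> None:
    """Raise ValueError when actual contradicts a fixed entry in expected (-1 matches anything)."""
    if len(expected) != len(actual):
        raise ValueError(f"rank mismatch: expected {len(expected)}, got {len(actual)}")
    for axis, (want, got) in enumerate(zip(expected, actual)):
        if want != -1 and want != got:
            raise ValueError(f"axis {axis}: expected {want}, got {got}")


# "batch" was overridden to 4; "seq_len" is still free
expected = [4, -1, 256]
check_input_shape(expected, [4, 7, 256])   # accepted: the free axis matches any size

try:
    check_input_shape(expected, [8, 7, 256])
except ValueError as err:
    message = str(err)
    print(message)  # axis 0: expected 4, got 8
```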
