How ONNX Runtime Handles Dynamic Shapes During Inference: Architecture and API Guide
ONNX Runtime resolves dynamic (symbolic) dimensions at runtime through a three-layer system involving symbolic shape representation, user-configurable session option overrides, and provider-specific shape propagation, enabling inference on models with variable batch sizes and sequence lengths.
ONNX Runtime (ORT) treats dynamic dimensions as first-class entities that persist from model loading through inference. This capability allows the microsoft/onnxruntime engine to execute models exported with -1 or named symbolic dimensions (e.g., "batch" or "seq_len") without static recompilation. The implementation spans from the C++ TensorShape class in core/framework/tensor_type_and_shape.cc to execution-provider-specific optimizations in TensorRT and WebNN backends.
Symbolic Shape Representation in TensorShape
When parsing an ONNX model, ORT encounters dimensions marked with -1 or explicit symbolic names. The runtime stores these as symbolic dimensions within the TensorShape class, maintaining both the placeholder value and optional human-readable identifiers.
In core/framework/tensor_type_and_shape.cc (lines 106-119), the implementation captures:
- Dimension value: the raw -1 placeholder that marks a dimension whose size is unknown
- Symbolic names: user-friendly identifiers like "batch" or "seq_len" stored in dim_params
This dual representation allows the shape inference engine to propagate symbolic information through the graph even when concrete sizes remain unknown. Operators such as Reshape and MatMul read these symbolic dimensions to compute output shapes without requiring static allocation during model load.
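To make the propagation idea concrete, here is a minimal pure-Python sketch of how a MatMul output shape can be derived while some dimensions are still symbolic. It is an illustration only, not ORT's shape inference code: the `Dim` alias and the `matmul_output_shape` helper are hypothetical names, and the sketch only covers a `[..., M, K] x [K, N]` product.

```python
from typing import List, Union

# Assumption for this sketch: an int is a concrete size, a str is a symbolic name.
Dim = Union[int, str]


def matmul_output_shape(a: List[Dim], b: List[Dim]) -> List[Dim]:
    """Infer the output shape of a: [..., M, K] multiplied by b: [K, N]."""
    if len(a) < 2 or len(b) != 2:
        raise ValueError("sketch supports a: [..., M, K] and b: [K, N] only")
    k_a, k_b = a[-1], b[0]
    # Two concrete inner sizes must agree. A symbolic size is assumed to be
    # compatible here and would be checked once real inputs arrive.
    if isinstance(k_a, int) and isinstance(k_b, int) and k_a != k_b:
        raise ValueError(f"inner dimensions differ: {k_a} vs {k_b}")
    # Leading dimensions (symbolic or not) pass through unchanged.
    return a[:-1] + [b[1]]


hidden = matmul_output_shape(["batch", "seq_len", 256], [256, 64])
print(hidden)  # ['batch', 'seq_len', 64]
```

The symbolic names survive into the output shape, which is what lets downstream nodes keep reasoning about "batch" and "seq_len" before any concrete size is known.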
Runtime Override via SessionOptions
Before session initialization, users bind concrete values to symbolic dimensions using the SessionOptions API. This mechanism bridges the gap between symbolic model definitions and the static requirements of certain hardware backends.
The key methods are:
- AddFreeDimensionOverride: Binds a concrete size to every free dimension that carries a given dimension denotation, a semantic tag such as "DATA_BATCH"
- AddFreeDimensionOverrideByName: Binds a concrete size to a named symbolic dimension (e.g., "batch")
These overrides are stored internally in OrtSessionOptions::free_dimension_overrides, implemented in core/session/abi_session_options.cc (lines 30-42). The C API exposes them as the AddFreeDimensionOverride and AddFreeDimensionOverrideByName entries of the OrtApi function table, registered in core/session/onnxruntime_c_api.cc at line 4337.
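The two override routes can be pictured with a small pure-Python model. This is a sketch under stated assumptions, not the contents of abi_session_options.cc: `SymbolicDim` and `apply_overrides` are hypothetical names, "DATA_BATCH" is a denotation string from the ONNX dimension-denotation convention, and giving name overrides precedence over denotation overrides is a choice made for this example only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SymbolicDim:
    """Hypothetical stand-in for one dimension of a model input."""
    value: int = -1                    # -1 means "size unknown"
    name: Optional[str] = None         # symbolic name, e.g. "batch"
    denotation: Optional[str] = None   # semantic tag, e.g. "DATA_BATCH"


def apply_overrides(shape: List[SymbolicDim],
                    by_name: Dict[str, int],
                    by_denotation: Dict[str, int]) -> List[int]:
    """Replace unknown sizes with overrides; leave -1 where none applies."""
    resolved = []
    for dim in shape:
        if dim.value >= 0:
            resolved.append(dim.value)                      # already static
        elif dim.name in by_name:
            resolved.append(by_name[dim.name])              # override by name
        elif dim.denotation in by_denotation:
            resolved.append(by_denotation[dim.denotation])  # override by denotation
        else:
            resolved.append(-1)                             # still dynamic
    return resolved


shape = [
    SymbolicDim(name="batch", denotation="DATA_BATCH"),
    SymbolicDim(name="seq_len"),
    SymbolicDim(value=256),
]
print(apply_overrides(shape, {"seq_len": 5}, {"DATA_BATCH": 2}))  # [2, 5, 256]
```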
Shape Propagation and Provider Handling
During graph initialization, ORT executes symbolic shape inference using the same engine employed for static shapes. The process resolves dimensions through the following flow:
- Graph partition: The inference engine identifies operators with dynamic inputs
- Override application: When a dimension has a registered override, TensorShape::IsDynamic() returns false and the concrete size propagates through downstream nodes
- Memory allocation: Execution providers allocate buffers once actual dimensions become known, either from overrides or runtime input tensors
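The flow above can be sketched in a few lines of pure Python. The helpers `resolve_shape` and `buffer_bytes` are hypothetical names used only for illustration, and the 4-byte item size assumes float32 tensors; real providers manage allocation in their own way.

```python
from math import prod
from typing import Dict, List, Optional, Sequence, Union

Dim = Union[int, str]  # int = concrete size, str = symbolic name


def resolve_shape(declared: Sequence[Dim],
                  overrides: Dict[str, int],
                  runtime_shape: Optional[Sequence[int]] = None) -> List[int]:
    """Resolve each dimension from the model, then overrides, then the input."""
    if runtime_shape is not None and len(runtime_shape) != len(declared):
        raise ValueError("rank mismatch between declared and runtime shape")
    resolved = []
    for axis, dim in enumerate(declared):
        if isinstance(dim, int):
            resolved.append(dim)                   # static in the model
        elif dim in overrides:
            resolved.append(overrides[dim])        # fixed before session creation
        elif runtime_shape is not None:
            resolved.append(runtime_shape[axis])   # known only at inference time
        else:
            raise ValueError(f"dimension {dim!r} is still unknown")
    return resolved


def buffer_bytes(shape: Sequence[int], itemsize: int = 4) -> int:
    """Bytes needed for a dense tensor; itemsize=4 assumes float32."""
    return prod(shape) * itemsize


declared = ["batch", "seq_len", 256]
concrete = resolve_shape(declared, {"batch": 2}, runtime_shape=(2, 5, 256))
print(concrete, buffer_bytes(concrete))  # [2, 5, 256] 10240
```

Without a runtime shape, the same call fails for "seq_len", which mirrors why allocation has to wait until every dimension is known.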
Execution providers handle dynamic shapes differently based on backend capabilities:
- CPU: Full dynamic shape support using the generic shape inference engine
- CUDA/TensorRT: Creates dynamic input profiles on-the-fly when overrides are missing, falling back to optimization profiles for unknown ranges (see core/providers/tensorrt/tensorrt_execution_provider.cc, lines 3202-3220)
- WebNN: Does not support dynamic shapes; requires explicit overrides via sessionOptions.freeDimensionOverrides to avoid runtime errors (see core/providers/webnn/builders/helper.cc, lines 87-90)
- OpenVINO: Materializes symbolic shapes into static dimensions using overrides before compilation (see core/providers/openvino/backend_manager.cc)
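As a rough aid for deciding when overrides are needed, the provider behavior listed above can be encoded in a small lookup. This is a sketch, not an ORT API: the provider labels, the `SUPPORTS_DYNAMIC` table, and `missing_overrides` are hypothetical, the table is a coarse reading of the list above, and unknown providers are treated conservatively as static-only.

```python
from typing import Dict, List, Sequence, Union

Dim = Union[int, str]  # int = concrete size, str = symbolic name

# Assumption: a coarse summary of the behavior described in this section.
SUPPORTS_DYNAMIC: Dict[str, bool] = {
    "CPU": True,
    "TensorRT": True,
    "OpenVINO": False,  # shapes are made static before compilation
    "WebNN": False,     # no dynamic shape support
}


def missing_overrides(provider: str,
                      declared: Sequence[Dim],
                      overrides: Dict[str, int]) -> List[str]:
    """Return symbolic names that still need an override for this provider."""
    if SUPPORTS_DYNAMIC.get(provider, False):
        return []  # dynamic-capable providers can wait for runtime inputs
    return [d for d in declared if isinstance(d, str) and d not in overrides]


declared = ["batch", "seq_len", 256]
print(missing_overrides("CPU", declared, {}))              # []
print(missing_overrides("WebNN", declared, {"batch": 1}))  # ['seq_len']
```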
Setting Dynamic Shape Overrides: Code Examples
Python API
Use add_free_dimension_override_by_name to bind symbolic names before creating the InferenceSession:
import onnxruntime as ort
import numpy as np
# Configure session options with concrete dimension values
options = ort.SessionOptions()
options.add_free_dimension_override_by_name("batch", 2)
options.add_free_dimension_override_by_name("seq_len", 5)
# Load a model whose input is declared as ["batch", "seq_len", 256]
sess = ort.InferenceSession("model_with_dynamic.onnx", sess_options=options)
# Create input matching the overridden shapes
input_data = np.random.randn(2, 5, 256).astype(np.float32)
outputs = sess.run(None, {"input": input_data})
print("Output shape:", outputs[0].shape)
The Python wrapper calls the C-API implementation at python/onnxruntime.capi.cc (line 215), forwarding to AddFreeDimensionOverrideByName in abi_session_options.cc.
C++ API
For low-level integration, configure Ort::SessionOptions and register the override through the OrtApi function table:
#include "onnxruntime_cxx_api.h"  // C++ wrappers (Ort::Env, Ort::Session, ...)
#include <iostream>
#include <vector>
int main() {
  Ort::Env env{ORT_LOGGING_LEVEL_WARNING, "test"};
  Ort::SessionOptions opts;
  // Bind symbolic "seq_len" to concrete size 8
  Ort::ThrowOnError(
      Ort::GetApi().AddFreeDimensionOverrideByName(opts, "seq_len", 8));
  // Load model with shape [-1, seq_len, 128]; ORT_TSTR keeps the path portable
  Ort::Session session{env, ORT_TSTR("model_dynamic.onnx"), opts};
  // Prepare concrete tensor (batch=1, seq_len=8, features=128)
  std::vector<int64_t> dims = {1, 8, 128};
  std::vector<float> data(1 * 8 * 128, 1.0f);
  Ort::MemoryInfo mem_info = Ort::MemoryInfo::CreateCpu(
      OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
      mem_info, data.data(), data.size(), dims.data(), dims.size());
  // Input and output names are assumed to match the model
  const char* input_names[] = {"input"};
  const char* output_names[] = {"output"};
  auto outputs = session.Run(Ort::RunOptions{nullptr},
                             input_names, &input_tensor, 1, output_names, 1);
  // Report the rank of the first output
  auto shape = outputs[0].GetTensorTypeAndShapeInfo().GetShape();
  std::cout << "Output rank: " << shape.size() << "\n";
  return 0;
}
Inspecting Symbolic Dimensions
Retrieve symbolic dimension names at runtime to verify model capabilities:
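inp = sess.get_inputs()[0]
# NodeArg.shape mixes ints (static sizes) with strings (named symbolic dimensions)
print("Declared shape:", inp.shape)  # e.g. ['batch', 'seq_len', 256] for a session created without overrides
print("Symbolic dimensions:", [d for d in inp.shape if isinstance(d, str)])
In the C and C++ APIs, the equivalent query is GetSymbolicDimensions, implemented in tensor_type_and_shape.cc (lines 106-119), which populates symbolic names from the dim_params stored in the ONNX type information.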
Summary
- Symbolic representation: ONNX Runtime stores dynamic dimensions using -1 placeholders and optional names in TensorShape, parsed from model metadata in tensor_type_and_shape.cc.
- User overrides: The AddFreeDimensionOverrideByName API in abi_session_options.cc allows binding concrete values to symbolic names before session creation, stored in free_dimension_overrides.
- Provider flexibility: CPU and TensorRT providers handle truly dynamic shapes through lazy allocation and optimization profiles, while WebNN requires static overrides to prevent runtime failures.
- API consistency: Both Python and C++ interfaces ultimately call the C-API functions registered in onnxruntime_c_api.cc, ensuring uniform behavior across language bindings.
Frequently Asked Questions
What is the difference between AddFreeDimensionOverride and AddFreeDimensionOverrideByName?
AddFreeDimensionOverride targets dimension denotations: semantic tags such as "DATA_BATCH" that an exporter can attach to a dimension to describe its role. AddFreeDimensionOverrideByName targets symbolic names: the human-readable strings like "batch" or "seq_len" that the exporter stores as the dimension's dim_param. Use the latter when your ONNX model contains named dimensions; use the former when dimensions are tagged with a denotation and you want to fix every dimension that shares that tag.
Can I run inference without providing dimension overrides?
Yes, provided you use an execution provider that supports dynamic shapes, such as the CPU or TensorRT providers. These allocate memory lazily once the concrete input dimensions arrive at inference time. However, providers like WebNN require overrides because they compile static graphs and cannot handle runtime dimension variability. Without overrides on incompatible providers, ONNX Runtime raises an ORT_INVALID_ARGUMENT error during session creation.
Which execution providers support fully dynamic shapes?
The CPU execution provider offers full dynamic shape support, handling arbitrary dimension changes between inference calls. TensorRT and CUDA support dynamic shapes but may require optimization profiles for performance optimization when dimensions vary. OpenVINO materializes symbolic shapes before compilation, requiring overrides for truly dynamic behavior. WebNN does not support dynamic shapes and mandates the use of AddFreeDimensionOverrideByName to create a static execution plan.
How does ONNX Runtime handle shape mismatches at runtime?
ONNX Runtime validates input tensor shapes against the computed graph dimensions at each inference call. If an input shape contradicts a previously established override (e.g., providing a batch size of 8 when "batch" was overridden to 4), the runtime raises ORT_INVALID_ARGUMENT. For providers supporting dynamic shapes, providing different concrete dimensions across calls triggers re-inference of shapes on-the-fly, though this may incur overhead as providers reallocate buffers or rebuild optimization profiles.
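The override-mismatch check described above can be sketched as follows. This is a pure-Python illustration rather than ORT's validation code: `validate_input` is a hypothetical helper, and a plain `ValueError` stands in for the ORT_INVALID_ARGUMENT status.

```python
from typing import Dict, Sequence, Union

Dim = Union[int, str]  # int = concrete size, str = symbolic name


def validate_input(declared: Sequence[Dim],
                   overrides: Dict[str, int],
                   actual: Sequence[int]) -> None:
    """Raise ValueError if the input contradicts a static size or an override."""
    if len(actual) != len(declared):
        raise ValueError(f"expected rank {len(declared)}, got {len(actual)}")
    for axis, (dim, got) in enumerate(zip(declared, actual)):
        # A dimension is pinned if it is static or has a registered override.
        expected = dim if isinstance(dim, int) else overrides.get(dim)
        if expected is not None and expected != got:
            raise ValueError(f"axis {axis}: expected {expected}, got {got}")


declared = ["batch", "seq_len", 256]
overrides = {"batch": 4}

validate_input(declared, overrides, (4, 7, 256))  # accepted: seq_len stays free
try:
    validate_input(declared, overrides, (8, 7, 256))
except ValueError as err:
    print("rejected:", err)  # rejected: axis 0: expected 4, got 8
```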