How Memory Planning Works in ONNX Runtime’s OptimizerExecutionFrame
OptimizerExecutionFrame inherits from IExecutionFrame to reuse the standard memory planning infrastructure, automatically generating and caching MemoryPatternGroups during the first execution of an optimization sub-graph to eliminate allocation overhead in subsequent passes.
In the microsoft/onnxruntime repository, graph optimizers such as constant folding and node fusion execute sub-graphs using a specialized frame that participates in the same memory planning lifecycle as standard inference. As a result, the optimizer frame benefits from pre-allocated buffers and pattern caching without needing a separate allocation strategy.
Architecture of OptimizerExecutionFrame
The optimizer frame does not implement a dedicated memory planner. Instead, it inherits from IExecutionFrame and delegates all allocation decisions to the existing execution framework that powers the SequentialExecutor driving the optimization pass.
Inheritance from IExecutionFrame
OptimizerExecutionFrame extends IExecutionFrame located in onnxruntime/core/framework/execution_frame.h. This inheritance grants the optimizer access to GetOrCreateNodeOutputMLValue and AllocateAsPerAllocationPlan, methods that consult the cached allocation plan produced by the planner. The frame stores no independent planner instance; the planner_ resides in the underlying ExecutionFrame used by the sequential executor.
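The inheritance argument can be illustrated with a minimal, self-contained C++ sketch that does not use ONNX Runtime. ToyFrameBase, ToyOptimizerFrame, ToyValue, and GetOrCreateOutput are hypothetical stand-ins for IExecutionFrame, OptimizerExecutionFrame, OrtValue, and GetOrCreateNodeOutputMLValue. The sketch assumes the base class owns both the value slots and a per-slot allocation plan, so a derived frame that adds only its own state still gets plan-driven output allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an OrtValue: an owned byte buffer.
struct ToyValue {
  std::vector<unsigned char> bytes;
  bool allocated = false;
};

// Hypothetical base frame: owns the value slots plus a per-slot allocation
// plan, and exposes the shared output-creation entry point.
class ToyFrameBase {
 public:
  explicit ToyFrameBase(std::vector<std::size_t> planned_sizes)
      : plan_(std::move(planned_sizes)), values_(plan_.size()) {}
  virtual ~ToyFrameBase() = default;

  // Returns slot `idx`, allocating it according to the plan on first request.
  ToyValue& GetOrCreateOutput(std::size_t idx) {
    ToyValue& slot = values_.at(idx);
    if (!slot.allocated) {
      slot.bytes.assign(plan_.at(idx), 0);
      slot.allocated = true;
      ++allocations_;
    }
    return slot;
  }

  int allocations() const { return allocations_; }

 private:
  std::vector<std::size_t> plan_;  // planned byte size for each slot
  std::vector<ToyValue> values_;   // one slot per value index
  int allocations_ = 0;            // how many slots have been allocated
};

// Hypothetical optimizer frame: adds only optimizer-specific state and
// inherits the allocation logic unchanged.
class ToyOptimizerFrame final : public ToyFrameBase {
 public:
  ToyOptimizerFrame(std::string pass_name, std::vector<std::size_t> planned_sizes)
      : ToyFrameBase(std::move(planned_sizes)), pass_name_(std::move(pass_name)) {}

  const std::string& pass_name() const { return pass_name_; }

 private:
  std::string pass_name_;
};
```

For a frame built with planned sizes of 16 and 8 bytes, the first request for slot 0 allocates 16 bytes, and a repeated request returns the same slot without allocating again.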
Initialization with Empty Feeds
When an optimizer constructs its execution frame, it calls IExecutionFrame::Init with empty input feeds because optimizers operate on static initializer data rather than runtime inputs. The constructor in onnxruntime/core/optimizer/optimizer_execution_frame.cc (lines 73-80) sets up the index mappings and immediately delegates initialization:
```cpp
OptimizerExecutionFrame::OptimizerExecutionFrame(const Info& info,
                                                 const std::vector<int>& fetch_mlvalue_idxs,
                                                 const std::vector<OrtValue>& fetches)
    : IExecutionFrame(info.GetMLValueNameIdxMap(),
                      info.GetNodeIndexInfo(),
                      fetch_mlvalue_idxs),
      info_(info) {
  Init(gsl::span<const int>(),       // no feed indices
       gsl::span<const OrtValue>(),  // no feed values
       info.GetInitializers(),       // constant inputs come from initializers
       info.GetSparseInitializerLookupFunc(),
       fetches);
}
```
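The effect of passing empty feeds can be shown with a toy frame. This is a sketch under stated assumptions, not ONNX Runtime code: ToyFrame, ToyTensor, and Slot are hypothetical, and std::vector stands in for gsl::span. With no feeds, only the initializer slots are populated; the remaining slot stays empty until a kernel produces it.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an OrtValue: a flat float tensor.
using ToyTensor = std::vector<float>;

// Hypothetical frame with an Init step modeled loosely on IExecutionFrame::Init:
// it copies feeds and initializers into the value slots at their indices.
class ToyFrame {
 public:
  explicit ToyFrame(std::size_t num_values) : slots_(num_values) {}

  void Init(const std::vector<int>& feed_idxs,
            const std::vector<ToyTensor>& feeds,
            const std::unordered_map<int, ToyTensor>& initializers) {
    assert(feed_idxs.size() == feeds.size());
    for (std::size_t i = 0; i < feeds.size(); ++i) {
      slots_.at(static_cast<std::size_t>(feed_idxs[i])) = feeds[i];
    }
    for (const auto& [idx, tensor] : initializers) {
      slots_.at(static_cast<std::size_t>(idx)) = tensor;
    }
  }

  // An empty optional means the slot has not been produced yet.
  const std::optional<ToyTensor>& Slot(int idx) const {
    return slots_.at(static_cast<std::size_t>(idx));
  }

 private:
  std::vector<std::optional<ToyTensor>> slots_;
};
```

Calling Init with two empty containers and an initializer map, as the optimizer frame does, leaves every non-initializer slot empty for the kernels to fill.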
Memory Pattern Lifecycle
The memory planning workflow follows a generate-then-cache pattern. The first execution of a sub-graph records allocation requirements, builds a reusable pattern, and stores it in SessionState for future optimizer passes.
First-Run Pattern Generation
When no cached pattern exists for the current input shapes, ExecutionFrame::GeneratePatterns creates an OrtValuePatternPlanner instance to trace the graph. This planner executes once, records every tensor allocation, and constructs a MemoryPatternGroup. The implementation in onnxruntime/core/framework/execution_frame.cc (lines 935-940) delegates to the planner:
```cpp
Status ExecutionFrame::GeneratePatterns(MemoryPatternGroup& out) {
  return planner_->GeneratePatterns(out);
}
```
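What a pattern planner does during that single traced run can be sketched as follows. The names ToyPatternPlanner, TraceAllocation, and TraceFree are illustrative, and the 64-byte alignment and first-fit placement policy are assumptions; ONNX Runtime's own planner may place blocks differently. The planner records each value's size and lifetime and turns them into fixed offsets plus a peak buffer size, which is roughly the information a memory pattern needs to capture.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <unordered_map>

struct ToyBlock {
  std::size_t offset = 0;
  std::size_t size = 0;
};

class ToyPatternPlanner {
 public:
  static constexpr std::size_t kAlignment = 64;  // assumed alignment

  // Records that value `idx` needs `bytes` while it is live, placing it in the
  // lowest gap between currently live blocks that is large enough (first-fit).
  void TraceAllocation(int idx, std::size_t bytes) {
    const std::size_t size = AlignUp(std::max<std::size_t>(bytes, 1));
    std::size_t candidate = 0;
    for (const auto& [offset, live_idx] : live_by_offset_) {
      if (offset - candidate >= size) break;  // the gap before this block fits
      candidate = offset + blocks_.at(live_idx).size;
    }
    blocks_[idx] = ToyBlock{candidate, size};
    live_by_offset_.emplace(candidate, idx);
    peak_ = std::max(peak_, candidate + size);
  }

  // Marks value `idx` as dead so later allocations may reuse its range.
  void TraceFree(int idx) { live_by_offset_.erase(blocks_.at(idx).offset); }

  const ToyBlock& Block(int idx) const { return blocks_.at(idx); }
  std::size_t PeakBytes() const { return peak_; }

 private:
  static std::size_t AlignUp(std::size_t n) {
    return (n + kAlignment - 1) / kAlignment * kAlignment;
  }

  std::unordered_map<int, ToyBlock> blocks_;   // final block of every traced value
  std::map<std::size_t, int> live_by_offset_;  // live blocks sorted by offset
  std::size_t peak_ = 0;                       // total buffer size the pattern needs
};
```

Tracing a 100-byte value, then a 64-byte value, freeing the first, and tracing a 50-byte value places the last one back at offset 0, so the peak stays at 192 bytes instead of growing.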
SessionState Caching Strategy
After generation, SessionState caches the MemoryPatternGroup using a hash of input tensor shapes produced by CalculateMemoryPatternsKey. Subsequent executions—including repeated optimizer passes on identical sub-graph shapes—retrieve the pattern via SessionState::GetMemoryPatternGroup. The fetch logic in onnxruntime/core/framework/execution_frame.cc (lines 986-999) retrieves the cached group before allocation:
```cpp
// Look up a cached MemoryPatternGroup for the current feed shapes
mem_patterns_ = session_state.GetMemoryPatternGroup(feeds,
                                                    feed_mlvalue_idxs,
                                                    inferred_shapes_);
```
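A shape-keyed cache with generate-on-miss behavior can be sketched like this. ShapesKey uses a boost-style hash_combine and is an assumption, not the formula CalculateMemoryPatternsKey uses; ToyPattern and ToyPatternCache are hypothetical. A production cache would also need to guard against hash collisions, for example by storing and comparing the shapes themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

using Shape = std::vector<std::int64_t>;

// Hypothetical key: mixes the rank and every dimension of every input shape.
std::size_t ShapesKey(const std::vector<Shape>& shapes) {
  std::size_t seed = shapes.size();
  auto mix = [&seed](std::size_t h) {
    seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2);
  };
  for (const Shape& s : shapes) {
    mix(std::hash<std::size_t>{}(s.size()));
    for (std::int64_t d : s) mix(std::hash<std::int64_t>{}(d));
  }
  return seed;
}

// Hypothetical pattern: offset per value plus total bytes.
struct ToyPattern {
  std::vector<std::size_t> offsets;
  std::size_t total = 0;
};

class ToyPatternCache {
 public:
  // Returns the cached pattern for these shapes, generating it once on a miss.
  const ToyPattern& GetOrGenerate(const std::vector<Shape>& shapes,
                                  const std::function<ToyPattern()>& generate) {
    const std::size_t key = ShapesKey(shapes);
    auto it = cache_.find(key);
    if (it == cache_.end()) {
      ++generations_;
      it = cache_.emplace(key, generate()).first;
    }
    return it->second;
  }

  int generations() const { return generations_; }

 private:
  std::unordered_map<std::size_t, ToyPattern> cache_;
  int generations_ = 0;  // how many times `generate` actually ran
};
```

Repeating a lookup with identical shapes returns the same cached object without calling the generator again; changing a single dimension, such as the batch size, triggers one new generation.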
Buffer Reuse and Pre-allocation
When a cached pattern is available, ExecutionFrame allocates one large contiguous buffer per device using Allocator::Alloc (or AllocOnStream for stream-aware allocators) and stores it in the buffers_ member. Individual tensor allocations during kernel execution call MemoryPattern::GetBlock to obtain offsets within this pre-allocated chunk, avoiding individual malloc calls and improving cache locality.
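The single-buffer-plus-offsets scheme can be illustrated as below. ToyDevicePattern::GetBlock is only loosely modeled on MemoryPattern::GetBlock, and ToyDeviceBuffer is hypothetical: it makes one allocation sized to the pattern's peak and computes each tensor's data pointer as base plus offset. A request that the pattern does not cover, or that exceeds its planned block, returns nullptr, standing in for a fallback to the regular allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>

struct ToyBlock {
  std::size_t offset;
  std::size_t size;
};

// Hypothetical pattern for one device: value index -> block, plus total size.
struct ToyDevicePattern {
  std::unordered_map<int, ToyBlock> blocks;
  std::size_t peak_bytes = 0;

  const ToyBlock* GetBlock(int idx) const {
    auto it = blocks.find(idx);
    return it == blocks.end() ? nullptr : &it->second;
  }
};

// One up-front allocation per device; tensors become views at fixed offsets.
class ToyDeviceBuffer {
 public:
  explicit ToyDeviceBuffer(const ToyDevicePattern& pattern)
      : pattern_(pattern),
        storage_(std::make_unique<std::byte[]>(pattern.peak_bytes)) {}

  // Returns a pointer inside the shared buffer, or nullptr if the value is
  // not covered by the pattern (the caller would then use the allocator).
  void* TensorData(int idx, std::size_t requested_bytes) const {
    const ToyBlock* block = pattern_.GetBlock(idx);
    if (block == nullptr || requested_bytes > block->size) return nullptr;
    return storage_.get() + block->offset;
  }

  const std::byte* base() const { return storage_.get(); }

 private:
  ToyDevicePattern pattern_;            // copied so the buffer owns its layout
  std::unique_ptr<std::byte[]> storage_;  // the single contiguous allocation
};
```

With blocks at offsets 0 and 128 and a 192-byte peak, the two tensors share one allocation and sit 128 bytes apart, which is what keeps them close together in memory.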
Tensor Allocation During Optimization
When a kernel running inside the optimizer frame requests an output tensor, the frame invokes GetOrCreateNodeOutputMLValue. This method checks the allocation plan stored in SessionState. If a valid memory pattern exists, the allocation is satisfied from the pre-allocated buffer group rather than by requesting new memory from the device allocator. Consequently, after the first warm-up run, constant folding and fusion passes avoid per-tensor allocations: each output tensor is placed at an offset inside a buffer that already exists.
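The warm-up effect can be made concrete by counting allocator calls. In this sketch, CountingAllocator, RecordedPattern, and RunOnce are all hypothetical: the first run allocates each output individually and records its offset, while a run that is given the recorded pattern makes a single bulk allocation. For brevity it places outputs back to back without reusing freed ranges and ignores the shape key.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical allocator that only counts how many requests it serves.
struct CountingAllocator {
  int calls = 0;
  std::vector<unsigned char> Alloc(std::size_t bytes) {
    ++calls;
    return std::vector<unsigned char>(bytes);
  }
};

struct RecordedPattern {
  std::vector<std::size_t> offsets;  // where each output starts in the shared buffer
  std::size_t total = 0;             // bytes needed for the shared buffer
};

// Runs a toy "sub-graph" whose outputs need `output_sizes` bytes and returns
// the pattern describing where each output lives.
// - No cached pattern: one allocator call per output; offsets are recorded.
// - Cached pattern: one allocator call for the whole buffer; offsets are reused.
RecordedPattern RunOnce(const std::vector<std::size_t>& output_sizes,
                        const std::optional<RecordedPattern>& cached,
                        CountingAllocator& alloc) {
  if (cached.has_value()) {
    std::vector<unsigned char> arena = alloc.Alloc(cached->total);
    // Each output would be a view at arena.data() + offset; nothing else is allocated.
    (void)arena;
    return *cached;
  }
  RecordedPattern recorded;
  for (std::size_t bytes : output_sizes) {
    std::vector<unsigned char> tensor = alloc.Alloc(bytes);  // individual allocation
    recorded.offsets.push_back(recorded.total);
    recorded.total += tensor.size();
  }
  return recorded;
}
```

For three outputs of 256, 64, and 32 bytes, the first run makes three allocator calls and records offsets 0, 256, and 320; a second run with that pattern adds only one more call.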
Practical Implementation Example
The following schematic outlines how to construct an optimizer frame and execute a sub-graph with automatic memory planning:
```cpp
// Schematic sketch: `nodes`, `graph`, `cpu_ep`, `logger`, `feeds`, and `fetches`
// are assumed to be defined by the surrounding optimizer code.

// 1. Build frame info for constant folding
onnxruntime::OptimizerExecutionFrame::Info info(nodes,
                                                graph.InitializerTensors(),
                                                graph.ModelPath(),
                                                *cpu_ep,
                                                [](const std::string&) { return false; },  // no sparse initializers
                                                logger);
std::vector<int> fetch_idxs = {/* indices of outputs to fetch */};

// 2. Create the optimizer frame
onnxruntime::OptimizerExecutionFrame opt_frame(info, fetch_idxs);

// 3. Execute (generates the memory pattern on the first run).
//    The executor call is simplified for illustration.
onnxruntime::SequentialExecutor executor;
executor.Run(opt_frame, feeds, fetches);  // allocates and caches patterns

// 4. Subsequent executions reuse the cached buffers
executor.Run(opt_frame, feeds, fetches);  // no per-tensor allocations
```
Key Source Files
| File | Purpose |
|---|---|
| onnxruntime/core/optimizer/optimizer_execution_frame.h | Declares OptimizerExecutionFrame inheriting from IExecutionFrame. |
| onnxruntime/core/optimizer/optimizer_execution_frame.cc | Implements constructor and kernel lookup for optimization passes. |
| onnxruntime/core/framework/execution_frame.h | Defines IExecutionFrame, allocation helpers, and the GeneratePatterns interface. |
| onnxruntime/core/framework/execution_frame.cc | Implements pattern generation, cached pattern retrieval, and buffer management. |
| onnxruntime/core/framework/ort_value_pattern_planner.h | Defines OrtValuePatternPlanner that records allocations to build MemoryPatternGroup. |
| onnxruntime/core/framework/session_state.h | Caches MemoryPatternGroup keyed by input shape hashes and exposes GetMemoryPatternGroup. |
| onnxruntime/core/framework/memory_pattern.h | Defines MemoryPattern and MemoryPatternGroup structures for offset-based allocation. |
Summary
- OptimizerExecutionFrame reuses the standard IExecutionFrame infrastructure rather than implementing a custom planner, keeping memory planning consistent between optimization and inference.
- MemoryPatternGroup objects are generated by OrtValuePatternPlanner during the first execution of a sub-graph and cached in SessionState, indexed by input shape hashes.
- Pre-allocated buffers are managed per device, with individual tensors receiving offsets via MemoryPattern::GetBlock, which eliminates per-tensor allocation overhead in subsequent runs.
- The allocation path through GetOrCreateNodeOutputMLValue automatically selects the cached pattern when available, so constant folding and fusion passes get the same performance benefits as standard execution.
Frequently Asked Questions
Does OptimizerExecutionFrame implement its own memory planner?
No. According to the microsoft/onnxruntime source code, OptimizerExecutionFrame inherits from IExecutionFrame and relies on the planner_ instance owned by the underlying ExecutionFrame. It does not maintain separate planning logic; instead, it consumes the same allocation plans and memory patterns used by standard inference execution.
How are memory patterns cached between optimizer runs?
SessionState stores MemoryPatternGroup objects in an internal map keyed by the hash returned from CalculateMemoryPatternsKey. When an optimizer re-executes a sub-graph with identical input shapes, GetMemoryPatternGroup retrieves the existing pattern, allowing the ExecutionFrame to reuse the pre-allocated buffer offsets without regenerating the plan.
What triggers the generation of a new MemoryPatternGroup?
A new pattern is generated automatically when the ExecutionFrame finds no cached entry for the current input shape configuration. During the first SequentialExecutor::Run invocation, GeneratePatterns uses OrtValuePatternPlanner to record the allocation requirements of that execution and constructs the MemoryPatternGroup for future reuse.
Why does the optimizer frame use empty feeds during initialization?
Optimizers such as constant folding operate on static initializers and constant values already present in the graph rather than on runtime input tensors. The OptimizerExecutionFrame constructor passes empty spans to IExecutionFrame::Init because the sub-graph's inputs come from info.GetInitializers(), not from external feeds; Init places those constant tensors in the frame's value slots so kernels can read them as inputs.