# How Memory Planning Works in ONNX Runtime’s OptimizerExecutionFrame

> Discover how ONNX Runtime's OptimizerExecutionFrame enhances performance by automatically planning and caching memory patterns, eliminating allocation overhead for faster subsequent executions.

- Repository: [Microsoft/onnxruntime](https://github.com/microsoft/onnxruntime)
- Tags: internals
- Published: 2026-04-24

---

**OptimizerExecutionFrame inherits from IExecutionFrame to reuse the standard memory planning infrastructure, automatically generating and caching MemoryPatternGroups during the first execution of an optimization sub-graph to eliminate allocation overhead in subsequent passes.**

In the `microsoft/onnxruntime` repository, graph optimizers such as constant folding and node fusion execute sub-graphs using a specialized frame that participates in the same memory planning lifecycle as standard inference. As a result, the optimizer frame benefits from pre-allocated buffers and pattern caching without requiring a separate allocation strategy.

## Architecture of OptimizerExecutionFrame

The optimizer frame does not implement a dedicated memory planner. Instead, it inherits from `IExecutionFrame` and delegates all allocation decisions to the existing execution framework that powers the `SequentialExecutor` driving the optimization pass.

### Inheritance from IExecutionFrame

`OptimizerExecutionFrame` extends `IExecutionFrame`, which is declared in [`onnxruntime/core/framework/execution_frame.h`](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/execution_frame.h). The same header declares `ExecutionFrame`, the frame used for regular inference. Through the base class, the optimizer frame gets `GetOrCreateNodeOutputMLValue`. `AllocateAsPerAllocationPlan`, which consults the cached allocation plan produced by the planner, and the `planner_` member both belong to `ExecutionFrame`. The optimizer frame stores no planner instance of its own.
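
The relationship between a shared frame base class and a specialized frame can be illustrated with a small standalone model. This is a sketch under stated assumptions, not ONNX Runtime code: `ToyFrameBase`, `ToyOptimizerFrame`, and `CreateOutputImpl` are hypothetical names, and tensors are modeled as plain `std::vector<float>`. The base class owns the get-or-create bookkeeping, while the derived class only supplies the allocation step.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy base frame: owns the "get or create" bookkeeping shared by all frames.
class ToyFrameBase {
 public:
  virtual ~ToyFrameBase() = default;

  // Creates the output on first request, then returns the same storage.
  std::vector<float>& GetOrCreateOutput(int value_idx, std::size_t element_count) {
    auto it = values_.find(value_idx);
    if (it == values_.end()) {
      it = values_.emplace(value_idx, CreateOutputImpl(element_count)).first;
    }
    assert(it->second.size() == element_count);  // size must stay stable
    return it->second;
  }

 protected:
  // Allocation hook that each concrete frame supplies.
  virtual std::vector<float> CreateOutputImpl(std::size_t element_count) = 0;

 private:
  std::unordered_map<int, std::vector<float>> values_;
};

// Toy optimizer frame: reuses the base bookkeeping, adds only allocation.
class ToyOptimizerFrame : public ToyFrameBase {
 public:
  int allocations() const { return allocations_; }

 protected:
  std::vector<float> CreateOutputImpl(std::size_t element_count) override {
    ++allocations_;
    return std::vector<float>(element_count, 0.0f);
  }

 private:
  int allocations_ = 0;
};
```

Requesting the same value index twice returns the same storage and triggers only one call to the allocation hook, which is the kind of reuse a derived frame gets from the base class without extra code.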

### Initialization with Empty Feeds

When an optimizer constructs its execution frame, it calls `IExecutionFrame::Init` with empty input feeds because optimizers operate on static initializer data rather than runtime inputs. The constructor in `onnxruntime/core/optimizer/optimizer_execution_frame.cc` (lines 73-80) sets up the index mappings and immediately delegates initialization:

```cpp
OptimizerExecutionFrame::OptimizerExecutionFrame(
    const Info& info,
    const std::vector<int>& fetch_mlvalue_idxs,
    const std::vector<OrtValue>& fetches)
    : IExecutionFrame(info.GetMLValueNameIdxMap(),
                      info.GetNodeIndexInfo(),
                      fetch_mlvalue_idxs),
      info_(info) {
  Init(gsl::span<const int>(),
       gsl::span<const OrtValue>(),
       info.GetInitializers(),
       info.GetSparseInitializerLookupFunc(),
       fetches);
}
```

## Memory Pattern Lifecycle

The memory planning workflow follows a generate-then-cache pattern. The first execution of a sub-graph records allocation requirements, builds a reusable pattern, and stores it in `SessionState` for future optimizer passes.

### First-Run Pattern Generation

When no cached pattern exists for the current input shapes, the frame uses an `OrtValuePatternPlanner` to trace tensor allocations during the run. `ExecutionFrame::GeneratePatterns` then asks that planner to build a `MemoryPatternGroup` from the recorded allocations. The implementation in `onnxruntime/core/framework/execution_frame.cc` (lines 935-940) delegates to the planner:

```cpp
Status ExecutionFrame::GeneratePatterns(MemoryPatternGroup& out) {
  return planner_->GeneratePatterns(out);
}
```
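
To make the trace-then-generate idea concrete, here is a minimal standalone sketch of a planner that records allocation requests and turns them into offsets. It is a simplified model, not the `OrtValuePatternPlanner` implementation: `ToyPatternPlanner`, `ToyPattern`, and `ToyBlock` are hypothetical names, the 64-byte alignment is an assumption, and the sketch ignores tensor lifetimes, so it never reuses the space of a freed block.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical block description: where a value lives inside the shared buffer.
struct ToyBlock {
  std::size_t offset;
  std::size_t size;
};

// Hypothetical pattern: one block per value index plus the total buffer size.
struct ToyPattern {
  std::unordered_map<int, ToyBlock> blocks;
  std::size_t peak_size = 0;
};

class ToyPatternPlanner {
 public:
  explicit ToyPatternPlanner(std::size_t alignment = 64) : alignment_(alignment) {
    // The alignment must be a non-zero power of two for AlignUp to work.
    assert(alignment_ != 0 && (alignment_ & (alignment_ - 1)) == 0);
  }

  // Records one allocation request observed during the first run.
  void TraceAllocation(int value_idx, std::size_t size) {
    const std::size_t offset = AlignUp(pattern_.peak_size);
    pattern_.blocks[value_idx] = ToyBlock{offset, size};
    pattern_.peak_size = offset + size;
  }

  // Returns a copy of everything recorded so far.
  ToyPattern GeneratePattern() const { return pattern_; }

 private:
  std::size_t AlignUp(std::size_t n) const {
    return (n + alignment_ - 1) & ~(alignment_ - 1);
  }

  std::size_t alignment_;
  ToyPattern pattern_;
};
```

Tracing a 100-byte request followed by a 40-byte request with 64-byte alignment places the second block at offset 128 and yields a total of 168 bytes, which is the size a single backing buffer would need.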

### SessionState Caching Strategy

After generation, `SessionState` caches the `MemoryPatternGroup` under a key that `CalculateMemoryPatternsKey` derives by hashing the input tensor shapes. Subsequent executions, including repeated optimizer passes on identical sub-graph shapes, retrieve the pattern via `SessionState::GetMemoryPatternGroup`. The fetch logic in `onnxruntime/core/framework/execution_frame.cc` (lines 986-999) looks up the cached group before allocation:

```cpp
mem_patterns_ = session_state.GetMemoryPatternGroup(feeds,
                                                   feed_mlvalue_idxs,
                                                   inferred_shapes_);
```
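
The shape-keyed lookup can be sketched as a small cache. This is an illustrative model only: `ToyShapeKey`, `ToyPatternCache`, and `ToyPatternGroup` are hypothetical names, and the hash-combining formula is an assumption rather than the one used by `CalculateMemoryPatternsKey`. Because the map is keyed by the hash alone, two different shape sets that collide would share an entry, which is a known trade-off of this kind of design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using Shape = std::vector<std::int64_t>;

// Hypothetical stand-in for a cached pattern group.
struct ToyPatternGroup {
  std::size_t total_bytes;
};

// Hypothetical key function: folds the rank and every dimension of every
// input shape into a single hash value.
inline std::size_t ToyShapeKey(const std::vector<Shape>& input_shapes) {
  std::size_t key = 0;
  auto combine = [&key](std::size_t v) {
    key ^= v + 0x9e3779b9 + (key << 6) + (key >> 2);
  };
  for (const Shape& shape : input_shapes) {
    combine(shape.size());
    for (std::int64_t dim : shape) combine(static_cast<std::size_t>(dim));
  }
  return key;
}

class ToyPatternCache {
 public:
  // Returns the cached group for these shapes, or nullptr on a miss.
  const ToyPatternGroup* Find(const std::vector<Shape>& shapes) const {
    auto it = cache_.find(ToyShapeKey(shapes));
    return it == cache_.end() ? nullptr : &it->second;
  }

  // Stores a group; an existing entry for the same key is kept unchanged.
  void Store(const std::vector<Shape>& shapes, ToyPatternGroup group) {
    cache_.emplace(ToyShapeKey(shapes), std::move(group));
  }

 private:
  std::unordered_map<std::size_t, ToyPatternGroup> cache_;
};
```

A caller would try `Find` first and only run pattern generation followed by `Store` on a miss, so repeated runs with identical shapes skip generation entirely.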

### Buffer Reuse and Pre-allocation

When a cached pattern is available, `ExecutionFrame` allocates one large contiguous buffer per device using `IAllocator::Alloc` (or `AllocOnStream` for stream-aware allocators) and stores it in the `buffers_` member. Tensor allocations during kernel execution then call `MemoryPattern::GetBlock` to look up their offset within this pre-allocated chunk. This replaces one allocator call per tensor with a single allocation per device and keeps the tensors close together in memory.
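
The per-device buffer layout described above can be modeled in a few lines. The sketch below is a toy under stated assumptions: `ToyBuffers`, `ToyPattern`, and `ToyBlock` are hypothetical names, devices are identified by plain strings, and all memory is ordinary host memory. It shows one contiguous allocation per device and offset lookups in place of per-tensor allocations.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

struct ToyBlock {
  std::size_t offset;
  std::size_t size;
};

// Per-device pattern: total bytes plus one block per value index.
struct ToyPattern {
  std::size_t total_bytes;
  std::unordered_map<int, ToyBlock> blocks;
};

class ToyBuffers {
 public:
  // Makes one contiguous allocation per device named in the pattern group.
  explicit ToyBuffers(const std::map<std::string, ToyPattern>& group) : group_(group) {
    for (const auto& entry : group_) {
      buffers_[entry.first].resize(entry.second.total_bytes);
    }
  }

  // Returns a pointer into the device buffer, or nullptr when the pattern
  // has no entry for this device or value index.
  unsigned char* GetBlock(const std::string& device, int value_idx) {
    auto pattern = group_.find(device);
    if (pattern == group_.end()) return nullptr;
    auto block = pattern->second.blocks.find(value_idx);
    if (block == pattern->second.blocks.end()) return nullptr;
    assert(block->second.offset + block->second.size <= pattern->second.total_bytes);
    return buffers_[device].data() + block->second.offset;
  }

 private:
  std::map<std::string, ToyPattern> group_;
  std::map<std::string, std::vector<unsigned char>> buffers_;
};
```

Two blocks on the same device come back as pointers into the same buffer, separated by exactly the difference of their planned offsets.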

## Tensor Allocation During Optimization

When a kernel running inside the optimizer frame requests an output tensor, the frame invokes `GetOrCreateNodeOutputMLValue`. This method checks the **allocation plan** stored in `SessionState`. If a valid memory pattern exists, the allocation is satisfied from the pre-allocated buffer group rather than by requesting new memory from the device allocator. Consequently, after the first warm-up run, constant folding and fusion passes avoid per-tensor allocator calls for the tensors the pattern covers.
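
The choice between the planned path and a fresh allocation can be sketched as follows. This is a hedged illustration, not the logic of `GetOrCreateNodeOutputMLValue`: `ToyFrame`, its members, and the fallback counter are hypothetical, and the fallback simply uses `new[]` where a real frame would call a device allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

class ToyFrame {
 public:
  // planned_offsets maps a value index to its byte offset in the shared buffer.
  ToyFrame(std::unordered_map<int, std::size_t> planned_offsets, std::size_t buffer_bytes)
      : planned_offsets_(std::move(planned_offsets)), buffer_(buffer_bytes) {}

  unsigned char* GetOrCreateOutput(int value_idx, std::size_t bytes) {
    assert(bytes > 0);
    auto it = planned_offsets_.find(value_idx);
    if (it != planned_offsets_.end() && it->second + bytes <= buffer_.size()) {
      // Planned path: hand out a slice of the pre-allocated buffer.
      return buffer_.data() + it->second;
    }
    // Unplanned path: fall back to a separate allocation owned by the frame.
    fallback_allocs_.push_back(std::unique_ptr<unsigned char[]>(new unsigned char[bytes]));
    return fallback_allocs_.back().get();
  }

  // Number of requests that could not be served from the shared buffer.
  std::size_t fallback_count() const { return fallback_allocs_.size(); }

 private:
  std::unordered_map<int, std::size_t> planned_offsets_;
  std::vector<unsigned char> buffer_;
  std::vector<std::unique_ptr<unsigned char[]>> fallback_allocs_;
};
```

Counting fallbacks makes the benefit measurable in this toy: when every requested value has a planned offset, the counter stays at zero and the only allocation is the shared buffer itself.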

## Practical Implementation Example

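The following schematic sketch shows how an optimizer frame is constructed and how a sub-graph is executed with automatic memory planning. It assumes `nodes`, `graph`, `cpu_ep`, and `logger` are already in scope, and the exact executor entry point may differ between ONNX Runtime versions:

```cpp
// 1. Build frame info for constant folding
onnxruntime::OptimizerExecutionFrame::Info info(nodes,
                                              graph.InitializerTensors(),
                                              graph.ModelPath(),
                                              *cpu_ep,
                                              [](const std::string&) { return false; },  // treat no initializer as sparse
                                              logger);
std::vector<int> fetch_idxs = {/* indices of outputs to fetch */};
std::vector<OrtValue> feeds;    // empty: inputs come from initializers
std::vector<OrtValue> fetches;  // receives the fetched outputs

// 2. Create the optimizer frame (same argument order as the constructor shown earlier)
onnxruntime::OptimizerExecutionFrame opt_frame(info, fetch_idxs, fetches);

// 3. Execute (generates memory pattern on first run)
onnxruntime::SequentialExecutor executor;
executor.Run(opt_frame, feeds, fetches);   // Allocates and caches patterns

// 4. Subsequent executions reuse cached buffers
executor.Run(opt_frame, feeds, fetches);   // Reuses the cached pattern
```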

## Key Source Files

| File | Purpose |
|------|---------|
| [`onnxruntime/core/optimizer/optimizer_execution_frame.h`](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/optimizer/optimizer_execution_frame.h) | Declares `OptimizerExecutionFrame` inheriting from `IExecutionFrame`. |
| `onnxruntime/core/optimizer/optimizer_execution_frame.cc` | Implements constructor and kernel lookup for optimization passes. |
| [`onnxruntime/core/framework/execution_frame.h`](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/execution_frame.h) | Defines `IExecutionFrame`, allocation helpers, and the `GeneratePatterns` interface. |
| `onnxruntime/core/framework/execution_frame.cc` | Implements pattern generation, cached pattern retrieval, and buffer management. |
| [`onnxruntime/core/framework/ort_value_pattern_planner.h`](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/ort_value_pattern_planner.h) | Defines `OrtValuePatternPlanner` that records allocations to build `MemoryPatternGroup`. |
| [`onnxruntime/core/framework/session_state.h`](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/session_state.h) | Caches `MemoryPatternGroup` keyed by input shape hashes and exposes `GetMemoryPatternGroup`. |
| [`onnxruntime/core/framework/memory_pattern.h`](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/memory_pattern.h) | Defines `MemoryPattern` and `MemoryPatternGroup` structures for offset-based allocation. |

## Summary

- **OptimizerExecutionFrame** reuses the standard `IExecutionFrame` infrastructure rather than implementing a custom planner, ensuring memory planning consistency between optimization and inference.
- **MemoryPatternGroup** objects are generated by `OrtValuePatternPlanner` during the first execution of a sub-graph and cached in `SessionState` indexed by input shape hashes.
- **Pre-allocated buffers** are managed per device, with individual tensors receiving offsets via `MemoryPattern::GetBlock`, eliminating per-tensor allocation overhead in subsequent runs.
- The allocation path through `GetOrCreateNodeOutputMLValue` automatically selects the cached pattern path when available, providing constant-folded and fused graphs with the same performance benefits as standard execution.

## Frequently Asked Questions

### Does OptimizerExecutionFrame implement its own memory planner?

No. In the `microsoft/onnxruntime` source code, `OptimizerExecutionFrame` inherits from `IExecutionFrame` and declares no planner of its own; the `planner_` member belongs to `ExecutionFrame`. The optimizer frame maintains no separate planning logic and builds on the same frame infrastructure used by standard inference execution.

### How are memory patterns cached between optimizer runs?

`SessionState` stores `MemoryPatternGroup` objects in an internal map keyed by the hash returned from `CalculateMemoryPatternsKey`. When an optimizer re-executes a sub-graph with identical input shapes, `GetMemoryPatternGroup` retrieves the existing pattern, allowing the `ExecutionFrame` to reuse the pre-allocated buffer offsets without regenerating the plan.

### What triggers the generation of a new MemoryPatternGroup?

A new pattern is generated automatically when the `ExecutionFrame` finds no cached entry for the current input shape configuration. During the first `SequentialExecutor::Run` invocation, the `OrtValuePatternPlanner` records the allocation requirements, and `GeneratePatterns` builds the `MemoryPatternGroup` from them for future reuse.

### Why does the optimizer frame use empty feeds during initialization?

Optimizers such as constant folding operate on static initializers and constant values already present in the graph rather than on runtime input tensors. The `OptimizerExecutionFrame` constructor passes empty spans to `IExecutionFrame::Init` because the sub-graph inputs come from `info.GetInitializers()`, not from external feeds, so the frame maps constant tensors to kernel inputs without any caller-supplied data.