How GraphTransformerMgr Applies Optimizations in Stages in ONNX Runtime
The GraphTransformerManager class drives graph‑level optimizations in ONNX Runtime through a three‑stage pipeline: registration by transformer level, multi‑pass execution with fixed‑point convergence, and state inspection to detect graph modifications.
This deep dive explores the GraphTransformerManager class in the Microsoft ONNX Runtime repository, the central orchestrator that applies graph transformations during inference session initialization. Understanding how this component stages optimizations (registration, iterative application, and modification tracking) helps developers debug optimization passes and implement custom transformers.
Stage 1: Transformer Registration by Optimization Level
Before any optimizations run, transformers must be registered with the manager. The GraphTransformerManager::Register method in onnxruntime/core/optimizer/graph_transformer_mgr.cc associates each transformer with a specific TransformerLevel (Level 1 through Level 4).
The registration logic stores transformers in two structures:
- level_to_transformer_map_: A map organizing transformers by their assigned optimization level.
- transformers_info_: A name‑based lookup map for quick transformer retrieval.
// onnxruntime/core/optimizer/graph_transformer_mgr.cc (lines 63-74)
common::Status GraphTransformerManager::Register(std::unique_ptr<GraphTransformer> transformer,
                                                 TransformerLevel level) {
  // ... validation logic ...
  level_to_transformer_map_[level].push_back(std::move(transformer));
  // ... name registration in transformers_info_ ...
  return Status::OK();
}
Execution providers (EPs) and the inference session register their specific transformers during construction. For example, CPU‑specific transformers register for Level 2, while basic layout optimizations might target Level 1.
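To make the two‑structure bookkeeping concrete, here is a minimal, self‑contained sketch. It does not use the ONNX Runtime headers: Level, Transformer, and MiniRegistry are hypothetical stand‑ins, and the duplicate‑name rejection is an assumption about what the elided validation logic might do.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are not the ONNX Runtime types.
enum class Level { Level1 = 1, Level2, Level3, Level4 };

struct Transformer {
  explicit Transformer(std::string n) : name(std::move(n)) {}
  std::string name;
};

class MiniRegistry {
 public:
  // Returns false (rather than a Status) when the name is already registered.
  // Rejecting duplicates is an assumed validation step.
  bool Register(std::unique_ptr<Transformer> t, Level level) {
    const std::string name = t->name;
    if (by_name_.count(name) != 0) return false;
    by_name_[name] = t.get();                  // name-based lookup
    by_level_[level].push_back(std::move(t));  // level-based grouping, owns the object
    return true;
  }

  // Number of transformers registered at a level (0 if the level is unused).
  std::size_t CountAt(Level level) const {
    auto it = by_level_.find(level);
    return it == by_level_.end() ? 0 : it->second.size();
  }

  // Looks a transformer up by name; nullptr when it is unknown.
  const Transformer* Find(const std::string& name) const {
    auto it = by_name_.find(name);
    return it == by_name_.end() ? nullptr : it->second;
  }

 private:
  std::map<Level, std::vector<std::unique_ptr<Transformer>>> by_level_;
  std::unordered_map<std::string, Transformer*> by_name_;
};

// Small demonstration: one successful registration, one rejected duplicate.
inline std::size_t DemoRegistration() {
  MiniRegistry reg;
  bool first = reg.Register(std::make_unique<Transformer>("FusionA"), Level::Level2);
  bool second = reg.Register(std::make_unique<Transformer>("FusionA"), Level::Level1);
  assert(first && !second);
  return reg.CountAt(Level::Level2);
}
```

The level map owns the transformers while the name map only holds non‑owning pointers, so a lookup by name never affects lifetime.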
Stage 2: Multi-Pass Execution and Fixed-Point Convergence
When InferenceSession::Initialize prepares the model, it invokes GraphTransformerManager::ApplyTransformers sequentially for each level. This method implements the core execution loop that applies optimizations in stages.
The Execution Loop
The method signature in graph_transformer_mgr.cc (lines 25‑53) shows the interface:
common::Status GraphTransformerManager::ApplyTransformers(Graph& graph,
                                                          TransformerLevel level,
                                                          const logging::Logger& logger) const
The execution follows this controlled flow:
- Reset State: Clears the internal _is_graph_modified flag before processing begins.
- Level Lookup: Retrieves the vector of transformers for the requested level. If no transformers exist for that level, returns immediately.
- Multi‑Pass Iteration: Performs up to steps_ passes over the transformer list (configurable via constructor or SetSteps, typically set to 5).

Within each pass, the manager:
- Checks Cancellation: Polls IsLoadCancellationFlagSet to allow user‑initiated aborts without corrupting the graph.
- Invokes Transformers: Calls each transformer's Apply method.
- Tracks Modifications: If a transformer sets modified = true, the manager marks graph_changed and updates the global _is_graph_modified.
- Respects Single‑Run Constraints: Skips transformers that declare ShouldOnlyApplyOnce() on subsequent passes (lines 39‑41 in the source).
- Fixed‑Point Termination: After each pass, if no transformer modified the graph (if (!graph_changed) break;), the loop exits early. This fixed‑point behavior prevents unnecessary iterations once the graph stabilizes.
// Conceptual representation of the execution loop
for (int step = 0; step < steps_; ++step) {
  // Checked at the start of each pass; breaking here leaves the whole loop
  if (IsLoadCancellationFlagSet()) break;
  bool graph_changed = false;
  for (const auto& transformer : level_transformers) {
    // Single-run transformers only take part in the first pass
    if (step > 0 && transformer->ShouldOnlyApplyOnce()) continue;
    bool modified = false;
    ORT_RETURN_IF_ERROR(transformer->Apply(graph, modified, logger));
    if (modified) {
      graph_changed = true;
      _is_graph_modified = true;
    }
  }
  if (!graph_changed) break;  // Fixed-point reached
}
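The loop can also be tried outside ONNX Runtime. The following is a toy, runnable sketch of the same fixed‑point idea: the "graph" is just an integer, and ToyGraph, ToyTransformer, RunResult, ApplyUntilStable, and HalveIfEven are all hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// A toy "graph": just an integer. Hypothetical stand-in for a real graph type.
using ToyGraph = int;
// A toy transformer: may mutate the graph and reports whether it did.
using ToyTransformer = std::function<void(ToyGraph&, bool&)>;

struct RunResult {
  int passes_run = 0;           // how many passes were started
  bool graph_modified = false;  // whether any transformer changed the graph
};

// Runs up to `steps` passes and stops early once a pass changes nothing.
RunResult ApplyUntilStable(ToyGraph& graph,
                           const std::vector<ToyTransformer>& transformers,
                           unsigned steps) {
  RunResult result;
  for (unsigned step = 0; step < steps; ++step) {
    ++result.passes_run;
    bool graph_changed = false;
    for (const auto& transformer : transformers) {
      bool modified = false;
      transformer(graph, modified);
      graph_changed = graph_changed || modified;
      result.graph_modified = result.graph_modified || modified;
    }
    if (!graph_changed) break;  // fixed point: nothing left to do
  }
  return result;
}

// Halves a non-zero even value once per call and leaves other values alone.
void HalveIfEven(ToyGraph& g, bool& modified) {
  if (g != 0 && g % 2 == 0) {
    g /= 2;
    modified = true;
  }
}

// Starting from 8 with a generous cap, returns the number of passes needed.
inline int DemoPassCount() {
  ToyGraph g = 8;
  std::vector<ToyTransformer> ts{HalveIfEven};
  RunResult r = ApplyUntilStable(g, ts, 10);
  assert(g == 1);
  return r.passes_run;
}
```

Starting from 8 with a cap of 10, three passes change the value (8, 4, 2, 1) and a fourth pass confirms nothing is left to do, so the loop stops after four passes. With a cap of 2 it stops at the cap with the value still at 2, which is the bounded‑cost side of the trade‑off.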
Stage 3: Detecting Graph Modifications
After ApplyTransformers completes, higher‑level code must know whether the graph structure changed to determine if downstream steps (like kernel selection) need recomputation.
The manager exposes two methods in onnxruntime/core/optimizer/graph_transformer_mgr.cc (lines 55‑61):
- IsGraphModified(): Returns a const bool& reference to _is_graph_modified, indicating whether any transformer altered the graph during the current optimization phase.
- ClearGraphModified(): Resets the flag to false, typically called before applying a new level of transformers.
This state inspection allows the InferenceSession to optimize its initialization workflow, skipping expensive kernel re‑allocation when the graph remains unchanged.
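Because ApplyTransformers is declared const yet updates the flag, the flag has to be writable from a const method. One plausible way to model that is a mutable member, sketched below. FlagTracker and ApplyAll are hypothetical names, and the use of mutable is an assumption made for this sketch rather than a statement about the actual class layout.

```cpp
#include <cassert>
#include <vector>

class FlagTracker {
 public:
  // Stand-in for ApplyTransformers: const, yet it records whether anything changed.
  // Each entry in `outcomes` plays the role of one transformer's `modified` result.
  void ApplyAll(const std::vector<bool>& outcomes) const {
    is_graph_modified_ = false;  // reset before processing begins
    for (bool modified : outcomes) {
      is_graph_modified_ = is_graph_modified_ || modified;
    }
  }

  // Returns a reference, so a caller holding it observes later updates.
  const bool& IsGraphModified() const { return is_graph_modified_; }

  void ClearGraphModified() { is_graph_modified_ = false; }

 private:
  mutable bool is_graph_modified_ = false;  // assumed: writable from const methods
};

// Walks through the set / clear cycle using a held reference.
inline bool DemoFlagCycle() {
  FlagTracker tracker;
  const bool& live = tracker.IsGraphModified();
  tracker.ApplyAll({false, true});
  assert(live);  // the held reference sees the update
  tracker.ClearGraphModified();
  return live;   // false again after the clear
}
```

Returning a reference rather than a copy means a caller that keeps the reference sees the live value, so code that wants a snapshot should copy it into a plain bool.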
Practical Implementation Examples
Registering and Applying a Custom Transformer
This example demonstrates creating a manager, registering a custom fusion transformer for Level 2, and applying it:
#include <memory>
#include "core/optimizer/graph_transformer_mgr.h"
#include "core/optimizer/graph_transformer.h"

class MyFusion : public onnxruntime::GraphTransformer {
 public:
  MyFusion() : GraphTransformer("MyFusion") {}

 private:
  // GraphTransformer::Apply is the public entry point; subclasses implement ApplyImpl.
  onnxruntime::common::Status ApplyImpl(onnxruntime::Graph& graph,
                                        bool& modified,
                                        int graph_level,
                                        const onnxruntime::logging::Logger& logger) const override {
    // ... fusion logic ...
    // Set modified = true only when the fusion actually altered the graph;
    // reporting a change on every call would defeat fixed-point termination.
    return onnxruntime::common::Status::OK();
  }
};

// Initialize manager with a cap of 5 passes
onnxruntime::GraphTransformerManager mgr(/*steps=*/5);
// Register for Level 2 optimizations
auto transformer = std::make_unique<MyFusion>();
ORT_THROW_IF_ERROR(mgr.Register(std::move(transformer), onnxruntime::TransformerLevel::Level2));
// Apply to graph (Graph is not copyable, so hold it by reference)
onnxruntime::Graph& graph = /* ... */;
const onnxruntime::logging::Logger& logger = /* ... */;
ORT_THROW_IF_ERROR(mgr.ApplyTransformers(graph, onnxruntime::TransformerLevel::Level2, logger));
Session Integration Pattern
The following simplified pattern, modeled on how InferenceSession uses the manager, shows optimization levels being applied sequentially:
GraphTransformerManager graph_mgr(/*steps=*/5);
// Register EP-specific transformers (e.g., CPU EP).
// RegisterCpuGraphTransformers is an illustrative placeholder for that registration step.
ort::cpu::RegisterCpuGraphTransformers(graph_mgr);
// Apply each level in order
for (auto level : {TransformerLevel::Level1,
                   TransformerLevel::Level2,
                   TransformerLevel::Level3,
                   TransformerLevel::Level4}) {
  ORT_RETURN_IF_ERROR(graph_mgr.ApplyTransformers(graph_, level, logger_));
  // Check if graph changed before proceeding
  if (graph_mgr.IsGraphModified()) {
    // ... trigger kernel re-selection ...
  }
  graph_mgr.ClearGraphModified();
}
Summary
- Registration Stage: Transformers are indexed by TransformerLevel in level_to_transformer_map_ via GraphTransformerManager::Register, enabling level‑specific optimization strategies.
- Execution Stage: ApplyTransformers runs up to steps_ passes, checking IsLoadCancellationFlagSet for aborts, respecting ShouldOnlyApplyOnce constraints, and breaking early when no modifications occur (fixed‑point convergence).
- State Inspection Stage: The _is_graph_modified flag, accessed via IsGraphModified(), reports whether the graph changed, allowing InferenceSession to conditionally recompute kernel assignments.
- Bounded Cost: The combination of configurable steps_ and fixed‑point termination ensures optimization costs remain predictable even with aggressive transformer chains.
Frequently Asked Questions
How does GraphTransformerMgr handle user cancellation during optimization?
The manager checks IsLoadCancellationFlagSet() at the start of each pass through the transformer list. If the flag is set, the optimization loop breaks immediately, returning control to the caller without completing remaining passes. This prevents wasted computation when a user aborts model loading.
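As a rough illustration of checking a cancellation flag at the start of each pass, the sketch below uses a std::atomic<bool> that another thread (or, here, a transformer) can set. Outcome, LoopReport, and RunWithCancellation are hypothetical names, and the flag is passed in explicitly as a simplification; it is not how ONNX Runtime exposes its cancellation state.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <vector>

enum class Outcome { kCompleted, kCancelled };

struct LoopReport {
  Outcome outcome;
  int passes_started;
};

// Each transformer returns true when it changed the (imaginary) graph.
using PassFn = std::function<bool()>;

// Polls the flag once per pass, before any transformer in that pass runs.
LoopReport RunWithCancellation(const std::vector<PassFn>& transformers,
                               const std::atomic<bool>& cancel_requested,
                               unsigned steps) {
  LoopReport report{Outcome::kCompleted, 0};
  for (unsigned step = 0; step < steps; ++step) {
    if (cancel_requested.load()) {
      report.outcome = Outcome::kCancelled;
      break;  // leave the whole loop; remaining passes are skipped
    }
    ++report.passes_started;
    bool graph_changed = false;
    for (const auto& t : transformers) {
      graph_changed = t() || graph_changed;
    }
    if (!graph_changed) break;  // fixed point reached without cancellation
  }
  return report;
}

// A run with no cancellation and nothing to change finishes after one pass.
inline int DemoUncancelledPasses() {
  std::atomic<bool> cancel{false};
  std::vector<PassFn> ts{[] { return false; }};
  LoopReport r = RunWithCancellation(ts, cancel, 10);
  assert(r.outcome == Outcome::kCompleted);
  return r.passes_started;
}
```

In this sketch, a pass that has already started always runs to completion, so the graph is never left halfway through a pass; the cost is that a cancellation request is only noticed at the next pass boundary.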
What is the purpose of the steps_ parameter in GraphTransformerManager?
The steps_ parameter (set via constructor or SetSteps) defines the maximum number of passes the manager executes over the transformer list for a given level. While the default is typically 5, the fixed‑point detection (if (!graph_changed) break;) usually terminates earlier once no further optimizations apply, bounding the total work while allowing iterative transformations to stabilize.
Why do some transformers only apply once per level?
Transformers that return true from ShouldOnlyApplyOnce() are skipped on subsequent passes within the same ApplyTransformers call. This optimization prevents redundant processing for transformations that deterministically modify the graph in a single pass, improving performance without affecting correctness.
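The skip rule can be sketched with a toy class hierarchy. PassTransformer, OneShot, Repeating, and RunPasses are hypothetical names; only the ShouldOnlyApplyOnce() method name and the "skip on later passes" behaviour come from the description above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy base class, not the ONNX Runtime GraphTransformer.
struct PassTransformer {
  virtual ~PassTransformer() = default;
  virtual bool ShouldOnlyApplyOnce() const { return false; }
  // Returns true when it changed the (imaginary) graph.
  virtual bool Apply() = 0;
  int calls = 0;
};

// Declares itself single-run and always reports a change when it does run.
struct OneShot : PassTransformer {
  bool ShouldOnlyApplyOnce() const override { return true; }
  bool Apply() override {
    ++calls;
    return true;
  }
};

// Reports a change on its first `budget` calls, then goes quiet.
struct Repeating : PassTransformer {
  explicit Repeating(int budget) : remaining(budget) {}
  bool Apply() override {
    ++calls;
    if (remaining > 0) {
      --remaining;
      return true;
    }
    return false;
  }
  int remaining;
};

// Runs up to `steps` passes and returns how many passes were started.
int RunPasses(const std::vector<PassTransformer*>& ts, unsigned steps) {
  int passes = 0;
  for (unsigned step = 0; step < steps; ++step) {
    ++passes;
    bool graph_changed = false;
    for (PassTransformer* t : ts) {
      // Single-run transformers only take part in the first pass.
      if (step > 0 && t->ShouldOnlyApplyOnce()) continue;
      graph_changed = t->Apply() || graph_changed;
    }
    if (!graph_changed) break;
  }
  return passes;
}

// A lone single-run transformer is called exactly once, even with a large cap.
inline int DemoOneShotCalls() {
  OneShot once;
  std::vector<PassTransformer*> ts{&once};
  int passes = RunPasses(ts, 10);
  assert(passes == 2);  // second pass skips it, sees no change, and stops
  return once.calls;
}
```

Note that the single‑run transformer in this sketch always reports a change; without the skip it would keep the loop running until the step cap, which is one practical reason such a flag is useful.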
How does the manager communicate that optimizations changed the graph?
After each transformer invocation, the manager checks the modified output parameter. If any transformer modifies the graph, the internal _is_graph_modified flag is set to true. Callers query this state via IsGraphModified() after ApplyTransformers returns to determine if downstream initialization steps (such as kernel allocation) must be re-executed.