ONNX Runtime Profiling Tools: A Complete Guide to Performance Analysis
ONNX Runtime provides a layered profiling system that captures detailed timing data in Chrome trace format, enabling performance analysis at the session, operator, and execution-provider levels.
The microsoft/onnxruntime repository ships with comprehensive profiling tools designed to identify bottlenecks in machine learning inference. These utilities generate standardized JSON traces compatible with Chrome's performance visualizer, allowing developers to inspect CPU operator latency, GPU kernel execution, and memory copy overhead within a single timeline.
Core Profiler Architecture
The Core Profiler serves as the foundation of ONNX Runtime's performance instrumentation. Located in [onnxruntime/core/common/profiler.h](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/common/profiler.h) and profiler.cc, this thread-safe collector records high-resolution timestamps for session lifecycle events and individual node execution.
When enabled, the profiler initializes a TimePoint profiling_start_time_ via Profiler::Start() and accumulates events in an internal vector. Each operator invokes EndTimeAndRecordEvent with a specific category (e.g., ORT_PROFILING_EVENT_CATEGORY_NODE), and the accumulated events are serialized to Chrome trace JSON upon session destruction or an explicit EndProfiling() call. Because the hot path only appends a timestamped record and defers serialization, per-event overhead stays small. The emitted timestamps and durations are expressed in microseconds, the unit the Chrome trace format uses.
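The collect-then-serialize pattern described above can be sketched in a few lines of Python. This is a toy illustration, not ONNX Runtime's implementation: the TraceCollector class, its method names, and the "toy_op_kernel_time" event name are hypothetical. Only the field names ("cat", "name", "ph", "ts", "dur", "pid", "tid") come from the Chrome trace event format.

```python
import json
import threading
import time


class TraceCollector:
    """Toy thread-safe event collector (hypothetical; not ONNX Runtime's class)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = []
        # Reference point for all timestamps, similar in spirit to a profiling start time.
        self._start = time.perf_counter()

    def record(self, category, name, begin, end):
        # Chrome trace "ts" and "dur" are expressed in microseconds.
        event = {
            "cat": category,
            "name": name,
            "ph": "X",  # "complete" event: carries both a start and a duration
            "pid": 0,
            "tid": threading.get_ident(),
            "ts": int((begin - self._start) * 1e6),
            "dur": int((end - begin) * 1e6),
        }
        with self._lock:  # appending is the only work done on the hot path
            self._events.append(event)

    def to_json(self):
        # Serialization is deferred until the caller asks for the trace.
        with self._lock:
            return json.dumps(self._events)


collector = TraceCollector()
t0 = time.perf_counter()
sum(range(10_000))  # stand-in for an operator's work
t1 = time.perf_counter()
collector.record("Node", "toy_op_kernel_time", t0, t1)
trace = collector.to_json()
```

Writing `trace` to a .json file should produce something chrome://tracing can load, which makes this a convenient way to get a feel for the format before reading a real profile.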
Execution Provider Profilers
Beyond the core timing infrastructure, Execution Provider (EP) Profilers inject hardware-specific events into the same trace stream. These implementations reside in provider-specific subdirectories and automatically activate when profiling is enabled globally.
- CUDA EP: [onnxruntime/core/providers/cuda/cuda_profiler.h](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cuda/cuda_profiler.h) emits NVTX markers for kernel launches and CudaMemcpyHtoD operations, tagged with the "CUDA" category.
- WebGPU EP: [onnxruntime/core/providers/webgpu/webgpu_profiler.h](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/webgpu/webgpu_profiler.h) captures GPU command buffer execution timings.
- Vitis AI EP: [onnxruntime/core/providers/vitisai/vitisai_profiler.h](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/vitisai/vitisai_profiler.h) profiles FPGA-specific acceleration events.
No additional user configuration is required; the runtime automatically instantiates these profilers via EpProfiler::StartProfiling when the corresponding EP is loaded and session-level profiling is active.
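Because EP events land in the same trace as the core events, one quick check is to total the recorded durations per category. The sketch below runs on a handful of made-up events. The event names and the "Session" category are assumptions for illustration, and duration_by_category is a hypothetical helper, not part of ONNX Runtime.

```python
from collections import defaultdict

# Made-up events in Chrome trace form; a real profile carries more fields.
events = [
    {"cat": "Session", "name": "model_run", "ph": "X", "ts": 0, "dur": 900},
    {"cat": "Node", "name": "MatMul_0_kernel_time", "ph": "X", "ts": 10, "dur": 400},
    {"cat": "Node", "name": "Relu_1_kernel_time", "ph": "X", "ts": 420, "dur": 50},
    {"cat": "CUDA", "name": "memcpy_htod", "ph": "X", "ts": 5, "dur": 30},
]


def duration_by_category(events):
    """Sum the 'dur' field (microseconds) of complete events per category."""
    totals = defaultdict(int)
    for event in events:
        if event.get("ph") == "X":  # skip metadata and other event phases
            totals[event.get("cat", "unknown")] += event.get("dur", 0)
    return dict(totals)


totals = duration_by_category(events)
for category, total_us in totals.items():
    print(f"{category}: {total_us} us")
```

Categories overlap in time (a session-level event spans the node events inside it), so these totals are best compared within a category rather than added together.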
Plugin EP Event API
For third-party execution providers loaded through the plugin mechanism, the Plugin EP Event API defined in [onnxruntime/core/session/plugin_ep/ep_event_profiling.h](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/session/plugin_ep/ep_event_profiling.h) exposes opaque OrtProfilingEvent structures. Custom EP authors push timing data using OrtEpApi::ProfilingEventsContainer_AddEvents, merging external performance data into the core profiler's JSON output without recompiling the ONNX Runtime binary.
Python Profiling Utilities
The repository includes a high-level Python tool for transformer model analysis. Located at [onnxruntime/python/tools/transformers/profiler.py](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/profiler.py), this script orchestrates multiple inference runs across varying batch sizes and sequence lengths, producing aggregated statistics alongside the standard Chrome trace.
The accompanying test suite in [onnxruntime/test/python/transformers/test_profiler.py](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/python/transformers/test_profiler.py) demonstrates usage patterns for benchmarking BERT-like architectures.
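The kind of aggregated statistics mentioned above can be approximated by grouping node events by operator type. The following is a minimal sketch, not the logic of profiler.py itself: summarize_by_op is a hypothetical helper, the sample events are invented, and the "op_name" key inside "args" is an assumption about the trace layout that should be checked against a real profile.

```python
from collections import defaultdict


def summarize_by_op(events):
    """Return (op, calls, total_us, mean_us) rows sorted by total time, descending."""
    stats = defaultdict(lambda: {"calls": 0, "total_us": 0})
    for event in events:
        if event.get("cat") != "Node":
            continue
        # Assumption: the operator type is stored under args["op_name"].
        op = event.get("args", {}).get("op_name", "unknown")
        stats[op]["calls"] += 1
        stats[op]["total_us"] += event.get("dur", 0)
    rows = [
        (op, s["calls"], s["total_us"], s["total_us"] / s["calls"])
        for op, s in stats.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Invented sample events standing in for a loaded profile.
sample_events = [
    {"cat": "Node", "dur": 400, "args": {"op_name": "MatMul"}},
    {"cat": "Node", "dur": 300, "args": {"op_name": "MatMul"}},
    {"cat": "Node", "dur": 50, "args": {"op_name": "Relu"}},
    {"cat": "Session", "dur": 900, "args": {}},
]

rows = summarize_by_op(sample_events)
for op, calls, total_us, mean_us in rows:
    print(f"{op:<10} calls={calls} total={total_us}us mean={mean_us:.1f}us")
```

To apply this to a real run, the event list would come from loading the profile file with json.load instead of the inline sample.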
How to Enable Profiling in ONNX Runtime
Profiling requires minimal code changes across supported APIs. The system writes a JSON file whose name begins with the configured prefix, which can be opened in chrome://tracing or a compatible trace viewer.
C++ API
Use SessionOptions::EnableProfiling() to activate collection before session creation:
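#include "onnxruntime/core/session/onnxruntime_cxx_api.h"
#include <array>
#include <cstdint>
#include <vector>
int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "prof_example");
  Ort::SessionOptions sess_opts;
  // Enable profiling with filename prefix "myrun_" (paths are wide strings on Windows)
  sess_opts.EnableProfiling("myrun_");
  Ort::Session session(env, "model.onnx", sess_opts);
  // Assumed model signature: one float input "input" of shape [1, 4], one output "output"
  std::array<int64_t, 2> shape{1, 4};
  std::vector<float> data(4, 0.0f);
  auto mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  std::vector<Ort::Value> input_tensors;
  input_tensors.push_back(Ort::Value::CreateTensor<float>(
      mem, data.data(), data.size(), shape.data(), shape.size()));
  std::vector<const char*> input_names{"input"};
  std::vector<const char*> output_names{"output"};
  // Each Run call adds events to the profile
  auto outputs = session.Run(Ort::RunOptions{nullptr}, input_names.data(),
                             input_tensors.data(), input_names.size(),
                             output_names.data(), output_names.size());
  // The profile is flushed to a JSON file prefixed with "myrun_" on session destruction
}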
The EnableProfiling method wraps the underlying OrtApi::EnableProfiling call defined in [onnxruntime/core/session/ort_apis.h](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/session/ort_apis.h).
Python API
In the Python bindings, set the enable_profiling attribute on SessionOptions:
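import numpy as np
import onnxruntime as ort

sess_opts = ort.SessionOptions()
sess_opts.enable_profiling = True         # Boolean switch
sess_opts.profile_file_prefix = "myrun"   # Optional; a default prefix is used otherwise
session = ort.InferenceSession("model.onnx", sess_opts)

# Assumed model signature: one float32 input named "input" with shape [1, 4]
input_tensor = np.zeros((1, 4), dtype=np.float32)
outputs = session.run(None, {"input": input_tensor})

# Stop profiling and retrieve the generated file path
profile_path = session.end_profiling()
print(f"Profile written to: {profile_path}")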
CUDA EP Specifics
When profiling CUDA-enabled sessions, GPU kernel events appear automatically in the trace output. The CUDA EP profiler captures NVTX ranges for memory transfers and kernel executions, allowing identification of CPU-GPU synchronization overhead without additional instrumentation code.
Summary
- The Core Profiler in onnxruntime/core/common/profiler.h provides thread-safe event collection based on high-resolution timestamps.
- Execution Provider Profilers for CUDA, WebGPU, and Vitis AI inject hardware-specific metrics into the same JSON trace.
- The Plugin EP Event API enables third-party providers to contribute timing data through opaque C structures.
- Python utilities in onnxruntime/python/tools/transformers/profiler.py simplify model-wide benchmarking.
- All profilers output Chrome-compatible JSON viewable in chrome://tracing for visual latency analysis.
Frequently Asked Questions
How do I visualize the profiling output from ONNX Runtime?
Open the generated profile JSON file in Chrome's tracing viewer by navigating to chrome://tracing and loading the file. The timeline displays nested events categorized by execution provider (CPU, CUDA, WebGPU), allowing you to identify operator-level bottlenecks and memory transfer delays.
Can I profile custom execution providers without modifying the core library?
Yes. Custom execution providers loaded via the plugin mechanism can push profiling events using the Plugin EP Event API defined in ep_event_profiling.h. Call OrtEpApi::ProfilingEventsContainer_AddEvents on the opaque container supplied by the runtime to inject your timing data into the standard output format.
What is the performance overhead of enabling profiling in ONNX Runtime?
The profiler introduces minimal overhead during inference because it records only high-resolution timestamps and event metadata. However, the final JSON serialization in Profiler::EndProfiling() may cause a brief pause when the session ends or end_profiling() is called, as it flushes the accumulated event vector to disk.
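Rather than taking "minimal" on faith, the overhead can be measured for a specific model by timing the same workload with and without profiling. The helpers below (median_latency_ms and relative_overhead) are hypothetical names for a generic timing harness. The lambda is a stand-in workload; in practice each callable would wrap a session.run(...) call on a session created with or without profiling enabled.

```python
import statistics
import time


def median_latency_ms(run_once, warmup=3, repeats=20):
    """Median wall-clock latency of run_once() in milliseconds."""
    for _ in range(warmup):  # discard warm-up iterations
        run_once()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def relative_overhead(baseline_ms, profiled_ms):
    """Fractional slowdown of the profiled run relative to the baseline."""
    return (profiled_ms - baseline_ms) / baseline_ms


# Stand-in workload; replace with callables that invoke session.run(...).
baseline_ms = median_latency_ms(lambda: sum(range(20_000)))
print(f"baseline median latency: {baseline_ms:.3f} ms")
```

Using the median rather than the mean keeps a single slow iteration, such as the one that triggers the final flush to disk, from dominating the comparison.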
How do I profile GPU-specific operations like kernel launches?
Enable profiling at the session level using the standard API calls. The CUDA EP Profiler automatically captures NVTX events for kernel launches, memory copies, and synchronization points. These appear in the trace with the "CUDA" category, distinct from CPU "Node" events, requiring no additional code beyond enabling session profiling.