ONNX Runtime Profiling Tools: A Complete Guide to Performance Analysis

April 24, 2026 · microsoft/onnxruntime

ONNX Runtime provides a layered profiling system that captures detailed timing data in Chrome trace format, enabling performance analysis at the session, operator, and execution-provider levels.

The microsoft/onnxruntime repository ships with comprehensive profiling tools designed to identify bottlenecks in machine learning inference. These utilities generate standardized JSON traces compatible with Chrome's performance visualizer, allowing developers to inspect CPU operator latency, GPU kernel execution, and memory copy overhead within a single timeline.

Core Profiler Architecture

The Core Profiler serves as the foundation of ONNX Runtime's performance instrumentation. Located in [onnxruntime/core/common/profiler.h](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/common/profiler.h) and profiler.cc, this thread-safe collector records high-resolution timestamps for session lifecycle events and individual node execution.

When enabled, the profiler records a TimePoint profiling_start_time_ in Profiler::Start() and accumulates events in an internal vector. As each node finishes executing, the runtime calls EndTimeAndRecordEvent with a category (e.g., ORT_PROFILING_EVENT_CATEGORY_NODE), and the collected events are serialized to Chrome trace JSON on session destruction or on an explicit EndProfiling() call. Because only a timestamp and a small amount of metadata are stored per event, runtime overhead stays low. Timestamps and durations in the resulting trace are expressed in microseconds, the unit used by the Chrome trace format.
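To make the trace format concrete, the sketch below builds a few events in the Chrome trace event shape (name, cat, ph, ts, dur) and totals the recorded duration per category. The sample events and the helper name total_by_category are illustrative assumptions, not output captured from ONNX Runtime.

```python
import json
from collections import defaultdict

# Hypothetical sample in Chrome trace event shape.
# "ts" (start) and "dur" (duration) are in microseconds.
SAMPLE_TRACE = json.dumps([
    {"cat": "Session", "name": "model_loading", "ph": "X", "ts": 0, "dur": 1200},
    {"cat": "Node", "name": "conv1_kernel_time", "ph": "X", "ts": 1500, "dur": 300},
    {"cat": "Node", "name": "relu1_kernel_time", "ph": "X", "ts": 1810, "dur": 40},
])


def total_by_category(trace_json: str) -> dict:
    """Sum the 'dur' field of complete ("X") events for each category."""
    totals = defaultdict(int)
    for event in json.loads(trace_json):
        # Only complete events carry a duration.
        if event.get("ph") == "X":
            totals[event.get("cat", "unknown")] += event.get("dur", 0)
    return dict(totals)


totals = total_by_category(SAMPLE_TRACE)
print(totals)
```

A real profile can be passed to the same helper by reading the file contents into a string first.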

Execution Provider Profilers

Beyond the core timing infrastructure, Execution Provider (EP) Profilers inject hardware-specific events into the same trace stream. These implementations reside in provider-specific subdirectories and automatically activate when profiling is enabled globally.

CUDA EP: [onnxruntime/core/providers/cuda/cuda_profiler.h](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cuda/cuda_profiler.h) records GPU kernel launches and host-to-device memory copies collected through CUPTI, tagged with the "CUDA" category.
WebGPU EP: [onnxruntime/core/providers/webgpu/webgpu_profiler.h](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/webgpu/webgpu_profiler.h) captures GPU command buffer execution timings.
Vitis AI EP: [onnxruntime/core/providers/vitisai/vitisai_profiler.h](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/vitisai/vitisai_profiler.h) profiles FPGA-specific acceleration events.

No additional user configuration is required; the runtime automatically instantiates these profilers via EpProfiler::StartProfiling when the corresponding EP is loaded and session-level profiling is active.
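Because CPU and EP events share one trace, a quick way to see where time goes is to group event durations by provider. The sketch below assumes each node event carries an args dictionary with a "provider" entry; that field name and the sample events are assumptions for illustration, so check them against a trace produced by your own build.

```python
from collections import Counter

# Hypothetical node events; the "args.provider" field is an assumption.
SAMPLE_EVENTS = [
    {"cat": "Node", "name": "conv1", "dur": 300,
     "args": {"provider": "CUDAExecutionProvider"}},
    {"cat": "Node", "name": "reshape1", "dur": 20,
     "args": {"provider": "CPUExecutionProvider"}},
    {"cat": "Node", "name": "conv2", "dur": 280,
     "args": {"provider": "CUDAExecutionProvider"}},
    {"cat": "Session", "name": "session_initialization", "dur": 900},
]


def time_per_provider(events):
    """Total duration (microseconds) per provider; events without one are skipped."""
    totals = Counter()
    for event in events:
        provider = event.get("args", {}).get("provider")
        if provider:
            totals[provider] += event.get("dur", 0)
    return totals


per_provider = time_per_provider(SAMPLE_EVENTS)
for provider, micros in per_provider.most_common():
    print(f"{provider}: {micros} us")
```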

Plugin EP Event API

For third-party execution providers loaded through the plugin mechanism, the Plugin EP Event API defined in [onnxruntime/core/session/plugin_ep/ep_event_profiling.h](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/session/plugin_ep/ep_event_profiling.h) exposes opaque OrtProfilingEvent structures. Custom EP authors push timing data using OrtEpApi::ProfilingEventsContainer_AddEvents, merging external performance data into the core profiler's JSON output without recompiling the ONNX Runtime binary.

Python Profiling Utilities

The repository includes a high-level Python tool for transformer model analysis. Located at [onnxruntime/python/tools/transformers/profiler.py](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/profiler.py), this script orchestrates multiple inference runs across varying batch sizes and sequence lengths, producing aggregated statistics alongside the standard Chrome trace.

The accompanying test suite in [onnxruntime/test/python/transformers/test_profiler.py](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/python/transformers/test_profiler.py) demonstrates usage patterns for benchmarking BERT-like architectures.

How to Enable Profiling in ONNX Runtime

Profiling requires minimal code changes across supported APIs. The profiler writes a JSON file whose name starts with the configured prefix followed by a timestamp, and the file can be opened in chrome://tracing or a compatible visualization tool.

C++ API

Use SessionOptions::EnableProfiling() to activate collection before session creation:

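#include <onnxruntime_cxx_api.h>

#include <array>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "prof_example");
  Ort::SessionOptions sess_opts;

  // Enable profiling; "myrun_" becomes the start of the output file name.
  // ORT_TSTR keeps string literals portable (wide strings on Windows).
  sess_opts.EnableProfiling(ORT_TSTR("myrun_"));

  Ort::Session session(env, ORT_TSTR("model.onnx"), sess_opts);

  // Placeholder input: the tensor names "input"/"output" and the shape
  // {1, 4} are assumptions; replace them with your model's signature.
  std::array<int64_t, 2> shape{1, 4};
  std::vector<float> data(4, 0.0f);
  auto mem_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input = Ort::Value::CreateTensor<float>(
      mem_info, data.data(), data.size(), shape.data(), shape.size());
  const char* input_names[] = {"input"};
  const char* output_names[] = {"output"};

  // Each Run() call adds node-level events to the profile.
  auto outputs = session.Run(Ort::RunOptions{nullptr}, input_names, &input, 1,
                             output_names, 1);

  // The profile is flushed to a timestamped JSON file (name starting with
  // "myrun_") when the session is destroyed.
}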

The EnableProfiling method wraps the underlying OrtApi::EnableProfiling call defined in [onnxruntime/core/session/ort_apis.h](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/session/ort_apis.h).

Python API

In the Python bindings, set the enable_profiling attribute on SessionOptions:

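import numpy as np
import onnxruntime as ort

sess_opts = ort.SessionOptions()
sess_opts.enable_profiling = True  # Boolean switch
sess_opts.profile_file_prefix = "myrun"  # Optional; a timestamp is appended

session = ort.InferenceSession("model.onnx", sess_opts)

# Placeholder input: the name "input" and shape (1, 4) are assumptions;
# match them to your model's signature.
input_tensor = np.zeros((1, 4), dtype=np.float32)
outputs = session.run(None, {"input": input_tensor})

# Stop profiling and retrieve the generated file path
profile_path = session.end_profiling()
print(f"Profile written to: {profile_path}")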
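Once end_profiling() returns a path, the file can also be inspected programmatically instead of in a viewer. The following is a minimal sketch that ranks node events by duration. It assumes the profile is a JSON array of Chrome trace events with "cat", "name", and "dur" fields, and it writes a small hypothetical file so the example runs without a model. The helper name slowest_nodes is illustrative.

```python
import json
import os
import tempfile


def slowest_nodes(profile_path, top_n=3):
    """Return (name, duration_us) for the top_n longest 'Node' events."""
    with open(profile_path, encoding="utf-8") as f:
        events = json.load(f)
    nodes = [e for e in events if e.get("cat") == "Node"]
    nodes.sort(key=lambda e: e.get("dur", 0), reverse=True)
    return [(e["name"], e.get("dur", 0)) for e in nodes[:top_n]]


# Hypothetical stand-in for a file returned by session.end_profiling().
sample_events = [
    {"cat": "Session", "name": "model_run", "ph": "X", "ts": 0, "dur": 700},
    {"cat": "Node", "name": "matmul_kernel_time", "ph": "X", "ts": 10, "dur": 450},
    {"cat": "Node", "name": "softmax_kernel_time", "ph": "X", "ts": 470, "dur": 90},
    {"cat": "Node", "name": "add_kernel_time", "ph": "X", "ts": 565, "dur": 15},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_profile.json")
    with open(sample_path, "w", encoding="utf-8") as f:
        json.dump(sample_events, f)
    top = slowest_nodes(sample_path, top_n=2)

for name, dur in top:
    print(f"{name}: {dur} us")
```

With a real session, pass profile_path from the example above in place of sample_path.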

CUDA EP Specifics

When profiling CUDA-enabled sessions, GPU kernel events appear in the same trace output as CPU events. The CUDA EP profiler collects memory-transfer and kernel-execution records through CUPTI, provided the ONNX Runtime build includes CUDA profiling support, so CPU-GPU synchronization overhead can be identified without additional instrumentation code.

Summary

The Core Profiler in onnxruntime/core/common/profiler.h provides thread-safe event collection with high-resolution timestamps, reported in microseconds in the trace.
Execution Provider Profilers for CUDA, WebGPU, and Vitis AI inject hardware-specific metrics into the same JSON trace.
The Plugin EP Event API enables third-party providers to contribute timing data through opaque C structures.
Python utilities in onnxruntime/python/tools/transformers/profiler.py simplify model-wide benchmarking.
All profilers output Chrome-compatible JSON viewable in chrome://tracing for visual latency analysis.

Frequently Asked Questions

How do I visualize the profiling output from ONNX Runtime?

Open the generated profile JSON file (its name starts with the configured prefix, followed by a timestamp) in Chrome's tracing viewer by navigating to chrome://tracing and loading the file. The timeline displays nested events categorized by execution provider (CPU, CUDA, WebGPU), allowing you to identify operator-level bottlenecks and memory transfer delays.

Can I profile custom execution providers without modifying the core library?

Yes. Custom execution providers loaded via the plugin mechanism can push profiling events using the Plugin EP Event API defined in ep_event_profiling.h. Call OrtEpApi::ProfilingEventsContainer_AddEvents on the opaque container supplied by the runtime to inject your timing data into the standard output format.

What is the performance overhead of enabling profiling in ONNX Runtime?

The profiler introduces minimal overhead during inference because it records only high-resolution timestamps and event metadata. However, the final JSON serialization in Profiler::EndProfiling() may cause a brief pause when the session ends or end_profiling() is called, as it flushes the accumulated event vector to disk.

How do I profile GPU-specific operations like kernel launches?

Enable profiling at the session level using the standard API calls. The CUDA EP profiler then records kernel launches and memory copies through CUPTI, provided the ONNX Runtime build includes CUDA profiling support. These events appear in the trace with the "CUDA" category, distinct from CPU "Node" events, and require no additional code beyond enabling session profiling.
