Understanding the Relationship Between OrtValue, OrtTensor, and OrtMemoryInfo in ONNX Runtime

OrtValue is the opaque container that wraps data in ONNX Runtime, holding a concrete onnxruntime::Tensor when the data is a dense tensor, while OrtMemoryInfo defines where that tensor's buffer lives (CPU/GPU, allocator type) and is stored as metadata within the Tensor class.

In the microsoft/onnxruntime inference engine, data flows through the execution graph as opaque handles that must simultaneously support multiple data types and memory locations. The repository defines a three-layer architecture comprising OrtValue (the generic container), onnxruntime::Tensor (the concrete tensor implementation, informally called OrtTensor in this article's title; the C API has no separate OrtTensor type and exposes tensors only through OrtValue), and OrtMemoryInfo (the memory location descriptor). Understanding how these types interact matters when writing custom execution providers, optimizing memory use, and moving data across devices.

The OrtValue Container

OrtValue is the opaque data handle used throughout the ONNX Runtime C API (as OrtValue*) and C++ internals. Defined in include/onnxruntime/core/framework/ort_value.h, this struct stores data via a std::shared_ptr<void> that can point to various underlying types including Tensor, TensorSeq, SparseTensor, or maps, together with an MLDataType tag that records which type is stored (queried with methods such as IsTensor()). When an OrtValue holds tensor data, it acts as a type-erased wrapper around the concrete onnxruntime::Tensor class.
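To make the type-erasure idea concrete, here is a language-neutral sketch in Python rather than ONNX Runtime's actual C++. The classes OrtValueModel, Tensor, and TensorSeq below are hypothetical stand-ins: one untyped slot plays the role of the std::shared_ptr<void>, and a separate tag plays the role of the MLDataType, so a checked get() can refuse to hand back the payload as the wrong type.

```python
class Tensor:
    """Hypothetical stand-in for onnxruntime::Tensor (shape only)."""

    def __init__(self, shape):
        self.shape = tuple(shape)


class TensorSeq:
    """Hypothetical stand-in for onnxruntime::TensorSeq."""

    def __init__(self, tensors):
        self.tensors = list(tensors)


class OrtValueModel:
    """Type-erased holder: one untyped payload plus a tag saying what it is."""

    def __init__(self):
        self._data = None  # plays the role of std::shared_ptr<void>
        self._type = None  # plays the role of the MLDataType tag

    def init(self, data, data_type):
        self._data = data
        self._type = data_type

    def is_tensor(self):
        return self._type is Tensor

    def get(self, expected_type):
        # Checked access: never hand back the payload as the wrong type
        if self._type is not expected_type:
            held = getattr(self._type, "__name__", "nothing")
            raise TypeError(f"value holds {held}, not {expected_type.__name__}")
        return self._data


# Usage: the same container type carries either a tensor or a sequence
dense = OrtValueModel()
dense.init(Tensor([2, 3]), Tensor)
seq = OrtValueModel()
seq.init(TensorSeq([Tensor([1]), Tensor([2])]), TensorSeq)
print(dense.is_tensor(), dense.get(Tensor).shape)  # True (2, 3)
print(seq.is_tensor())                             # False
```

The tag is what lets a single handle type serve every ONNX value kind while still failing loudly on a mismatched access.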

Inside OrtValue: The Tensor Implementation

When an OrtValue contains dense tensor data, the actual implementation is the onnxruntime::Tensor class defined in include/onnxruntime/core/framework/tensor.h. Unlike many high-level tensor libraries, Tensor is designed mainly as metadata over a buffer managed elsewhere: element type, shape, a data pointer, and memory location details. It either wraps a pre-allocated buffer that it does not own, or, through a constructor that takes an IAllocator, requests a buffer from that allocator and releases it through the same allocator when destroyed.

Memory Ownership Model

One Tensor constructor accepts a pointer to pre-allocated memory and an OrtMemoryInfo object describing that memory's provenance. As implemented in tensor.h, the Tensor stores this information in a private member alloc_info_, which is exposed through the public method Tensor::Location(). This design allows tensors to reference memory owned by CPU allocators, CUDA allocators, or custom execution provider buffers without taking ownership of the underlying allocation; the caller must keep that buffer alive for as long as the Tensor uses it.
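The non-owning pattern can be sketched as follows, assuming a Python bytearray stands in for a raw buffer. TensorModel and MemoryInfo are hypothetical simplifications, not ONNX Runtime types; the memoryview shares the caller's bytes without copying them. One difference from C++: Python's reference counting keeps the bytearray alive, whereas ONNX Runtime relies on the caller to keep the buffer valid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryInfo:
    """Hypothetical stand-in for OrtMemoryInfo: just enough to label a buffer."""
    name: str
    device_type: str  # e.g. "CPU" or "GPU"


class TensorModel:
    """Metadata plus a reference to someone else's buffer."""

    def __init__(self, elem_size, shape, buffer, location):
        needed = elem_size
        for dim in shape:
            needed *= dim
        if len(buffer) < needed:
            raise ValueError("buffer is smaller than the shape requires")
        self.elem_size = elem_size
        self.shape = tuple(shape)
        self._data = memoryview(buffer)  # shares the caller's bytes; no copy
        self._alloc_info = location      # analogous to Tensor::alloc_info_

    def location(self):
        return self._alloc_info          # analogous to Tensor::Location()

    def data(self):
        return self._data


# Usage: 4 float32 elements = 16 bytes provided by the caller
buf = bytearray(16)
t = TensorModel(4, [4], buf, MemoryInfo("Cpu", "CPU"))
buf[0] = 7                       # the caller writes to its own buffer...
print(t.data()[0])               # ...and the tensor sees it: 7
print(t.location().device_type)  # CPU
```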

OrtMemoryInfo as the Memory Descriptor

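OrtMemoryInfo is a small struct defined in include/onnxruntime/core/framework/ortmemoryinfo.h that identifies a memory location through four key fields:

  • Device (an OrtDevice): the device type (CPU, GPU, or custom), a device-side memory type (such as default or pinned host memory), and a device id
  • Memory type (OrtMemType): OrtMemTypeDefault, or OrtMemTypeCPUInput/OrtMemTypeCPUOutput for CPU-accessible memory used by a non-CPU provider
  • Allocator type (OrtAllocatorType): OrtDeviceAllocator or OrtArenaAllocator
  • Name: a string identifying the specific allocator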

This struct serves as the glue between the tensor data buffer and the allocator infrastructure, enabling the runtime to determine when data transfers are necessary between execution providers.
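A simplified model of these fields, and of how they feed a "does this need a copy?" decision, might look like this. MemoryInfoModel and needs_transfer() are hypothetical; the enum member names echo OrtMemType and OrtAllocatorType but their values here are arbitrary. The sketch compares only device type and id, a simplification of ONNX Runtime comparing the full OrtDevice.

```python
from dataclasses import dataclass
from enum import Enum


class DeviceType(Enum):
    CPU = "cpu"
    GPU = "gpu"


class MemType(Enum):        # member names echo OrtMemType
    CPU_INPUT = "cpu_input"
    CPU_OUTPUT = "cpu_output"
    DEFAULT = "default"


class AllocatorType(Enum):  # member names echo OrtAllocatorType
    DEVICE = "device"
    ARENA = "arena"


@dataclass(frozen=True)
class MemoryInfoModel:
    name: str
    alloc_type: AllocatorType
    device_type: DeviceType
    device_id: int = 0
    mem_type: MemType = MemType.DEFAULT


def needs_transfer(src, dst):
    # Only where the bytes physically live decides whether a copy is needed;
    # the allocator name and type say who manages the memory, not where it is.
    return (src.device_type, src.device_id) != (dst.device_type, dst.device_id)


cpu_arena = MemoryInfoModel("Cpu", AllocatorType.ARENA, DeviceType.CPU)
cpu_plain = MemoryInfoModel("Cpu", AllocatorType.DEVICE, DeviceType.CPU)
gpu0 = MemoryInfoModel("Cuda", AllocatorType.ARENA, DeviceType.GPU, device_id=0)
gpu1 = MemoryInfoModel("Cuda", AllocatorType.ARENA, DeviceType.GPU, device_id=1)

print(needs_transfer(cpu_arena, cpu_plain))  # False: same device
print(needs_transfer(cpu_arena, gpu0))       # True: host <-> device
print(needs_transfer(gpu0, gpu1))            # True: different GPUs
```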

Implementation Details: How They Connect

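The relationship forms a clear hierarchy: OrtValue holds a type-erased shared_ptr that points to a Tensor, and the Tensor holds, by value, an OrtMemoryInfo describing its buffer. The static helper Tensor::InitOrtValue() builds a Tensor and stores it in an OrtValue in one step. Its raw-pointer overload takes the OrtMemoryInfo explicitly, binding the tensor to a specific allocator context, while its IAllocator overload takes the location from the allocator itself.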
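The two construction paths can be modeled as a hypothetical Python sketch: init_from_buffer() mirrors the raw-pointer path, where the caller supplies the memory info and keeps ownership, and init_from_allocator() mirrors the allocator path, where the location comes from the allocator and the allocator frees the buffer. None of these names are ONNX Runtime APIs, and a bytearray stands in for device memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryInfo:
    name: str
    device_type: str


class AllocatorModel:
    """Hypothetical allocator that reports its own location (compare IAllocator::Info())."""

    def __init__(self, info):
        self._info = info
        self.live_buffers = 0

    def info(self):
        return self._info

    def alloc(self, nbytes):
        self.live_buffers += 1
        return bytearray(nbytes)  # stands in for device memory

    def free(self, buffer):
        self.live_buffers -= 1


@dataclass
class TensorRecord:
    shape: tuple
    buffer: bytearray
    location: MemoryInfo
    owner: object = None  # allocator to free through, or None if caller-owned


def init_from_buffer(shape, buffer, location):
    # Caller supplies both the bytes and the memory info describing them
    return TensorRecord(tuple(shape), buffer, location)


def init_from_allocator(shape, elem_size, allocator):
    # The location is taken from the allocator, not passed separately
    nbytes = elem_size
    for dim in shape:
        nbytes *= dim
    return TensorRecord(tuple(shape), allocator.alloc(nbytes), allocator.info(), owner=allocator)


def release(tensor):
    # Only allocator-backed tensors free their buffer; caller-owned ones do nothing
    if tensor.owner is not None:
        tensor.owner.free(tensor.buffer)


a = init_from_buffer([4], bytearray(16), MemoryInfo("Cpu", "CPU"))
gpu_alloc = AllocatorModel(MemoryInfo("Cuda", "GPU"))
b = init_from_allocator([2, 2], 4, gpu_alloc)
print(a.location.device_type, b.location.device_type)  # CPU GPU
release(a)
release(b)
print(gpu_alloc.live_buffers)                          # 0
```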

Retrieving Memory Info via the C API

The C API function OrtApi::GetTensorMemoryInfo (declared in include/onnxruntime/core/session/onnxruntime_c_api.h and implemented among the OrtApis functions under onnxruntime/core/session) extracts the memory information from an OrtValue. The implementation simply forwards to Tensor::Location(), returning a pointer to the OrtMemoryInfo stored in the Tensor's alloc_info_ member. That pointer is borrowed: it is valid only while the OrtValue is alive, and the caller must not release it.
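The contract of this call (no error status on success, an error status for non-tensor values, and a borrowed rather than copied result) can be illustrated with a small hypothetical model. get_tensor_memory_info(), ValueModel, and the error text below are illustrative only, not the real function or message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryInfo:
    name: str
    device_type: str


class TensorModel:
    def __init__(self, location):
        self._alloc_info = location

    def location(self):
        return self._alloc_info


class ValueModel:
    """Hypothetical OrtValue stand-in: holds a payload and says whether it is a tensor."""

    def __init__(self, payload):
        self.payload = payload

    def is_tensor(self):
        return isinstance(self.payload, TensorModel)


def get_tensor_memory_info(value):
    """Returns (status, info); status is None on success, like a null OrtStatus*."""
    if not value.is_tensor():
        return "error: value is not a tensor", None
    # Hand back the tensor's own object (a borrowed reference), not a copy
    return None, value.payload.location()


t = TensorModel(MemoryInfo("Cpu", "CPU"))
status, info = get_tensor_memory_info(ValueModel(t))
print(status, info.device_type)  # None CPU
print(info is t.location())      # True: same object, nothing to release

status, info = get_tensor_memory_info(ValueModel(["not", "a", "tensor"]))
print(status)                    # error: value is not a tensor
```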

Practical Examples

The following sketches show how these three components interact. The C++ snippets use internal headers, so they compile only as part of an ONNX Runtime source build.

Creating a Tensor with OrtMemoryInfo (C++)

// ---------------------------------------------------
// 1️⃣ Create a Tensor on the CPU and wrap it in an OrtValue
//    (internal headers: build inside the ONNX Runtime source tree)
// ---------------------------------------------------
#include <iostream>
#include <utility>
#include <vector>

#include "core/framework/data_types.h"
#include "core/framework/ortmemoryinfo.h"
#include "core/framework/tensor.h"
#include "core/session/onnxruntime_c_api.h"

int main() {
  OrtMemoryInfo cpu_mem_info("CpuAllocator",
                             OrtDeviceAllocator,  // allocator type
                             OrtDevice());        // default OrtDevice is the CPU;
                                                  // memory type keeps its default

  // A 1-D tensor of 4 floats. We own the buffer; the Tensor only points at it,
  // so the buffer must outlive the Tensor and any OrtValue that holds it.
  std::vector<float> buffer = {1.0f, 2.0f, 3.0f, 4.0f};
  std::vector<int64_t> shape = {4};
  onnxruntime::Tensor tensor(
      onnxruntime::DataTypeImpl::GetType<float>(),  // element type (not the tensor type)
      onnxruntime::TensorShape(shape),
      buffer.data(),
      cpu_mem_info);                                // <-- memory info attached to the tensor

  OrtValue ort_val;                                               // empty container
  onnxruntime::Tensor::InitOrtValue(std::move(tensor), ort_val);  // static: moves the Tensor in

  // ---------------------------------------------------
  // 2️⃣ Retrieve the memory info via the C API
  // ---------------------------------------------------
  const OrtApi* api = OrtGetApiBase()->GetApi(ORT_API_VERSION);
  const OrtMemoryInfo* mi = nullptr;  // borrowed: valid while ort_val lives, never released
  OrtStatus* status = api->GetTensorMemoryInfo(&ort_val, &mi);
  if (status != nullptr) {
    std::cerr << api->GetErrorMessage(status) << "\n";
    api->ReleaseStatus(status);
    return 1;
  }
  // DeviceType is a small integer type; cast so it prints as a number, not a char
  std::cout << "Tensor lives on device type: " << static_cast<int>(mi->device.Type())
            << " (allocator = " << static_cast<int>(mi->alloc_type) << ")\n";
  return 0;
}

Querying Memory Location from Python

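# ---------------------------------------------------
# 3️⃣ Same idea from Python: inspect where an output lives
# ---------------------------------------------------
import numpy as np
import onnxruntime as ort

# assumes a model with one float32 input named "input"
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# sess.run() returns plain NumPy arrays already copied to host memory, so it
# cannot tell you where an output lived; run_with_ort_values() returns OrtValues.
x = ort.OrtValue.ortvalue_from_numpy(np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32))
output_names = [o.name for o in sess.get_outputs()]
outputs = sess.run_with_ort_values(output_names, {"input": x})

print("output 0 lives on:", outputs[0].device_name())   # e.g. "cpu"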

Transferring Between Devices (Execution Provider)

// ---------------------------------------------------
// 4️⃣ Using memory info in an Execution Provider (EP)
// ---------------------------------------------------
#include "core/framework/tensor.h"

void TransferTensor(const onnxruntime::Tensor& src,
                    onnxruntime::Tensor& dst) {
  // Tensor::Location() returns a reference to the tensor's stored OrtMemoryInfo
  const OrtMemoryInfo& src_info = src.Location();
  const OrtMemoryInfo& dst_info = dst.Location();

  // Decide the copy direction (host/device) from the two device types
  bool src_is_gpu = src_info.device.Type() == OrtDevice::GPU;
  bool dst_is_gpu = dst_info.device.Type() == OrtDevice::GPU;
  // ... perform the appropriate copy; inside ONNX Runtime this is the job of
  // an IDataTransfer implementation (see core/framework/data_transfer.h)
}
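The branch left as "..." in TransferTensor boils down to choosing a copy direction from the two device types. The following is a hypothetical Python sketch of just that decision; CopyKind and classify_copy() are invented for illustration, and the actual copying in ONNX Runtime is done by execution-provider data-transfer code, not by anything shown here.

```python
from enum import Enum


class CopyKind(Enum):
    HOST_TO_HOST = "host-to-host"
    HOST_TO_DEVICE = "host-to-device"
    DEVICE_TO_HOST = "device-to-host"
    DEVICE_TO_DEVICE = "device-to-device"


def classify_copy(src_device_type, dst_device_type):
    """Pick a copy direction from two device-type strings ("CPU" or "GPU")."""
    src_is_gpu = src_device_type == "GPU"
    dst_is_gpu = dst_device_type == "GPU"
    if src_is_gpu and dst_is_gpu:
        return CopyKind.DEVICE_TO_DEVICE
    if src_is_gpu:
        return CopyKind.DEVICE_TO_HOST
    if dst_is_gpu:
        return CopyKind.HOST_TO_DEVICE
    return CopyKind.HOST_TO_HOST


for src, dst in [("CPU", "CPU"), ("CPU", "GPU"), ("GPU", "CPU"), ("GPU", "GPU")]:
    print(src, "->", dst, ":", classify_copy(src, dst).value)
```

Keeping the decision separate from the copy itself is what allows each execution provider to plug in its own transfer routine for the directions it supports.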

Summary

The relationship between these three core types follows a strict containment hierarchy:

  • OrtValue acts as the universal, type-erased container visible to both C API users and C++ internals, capable of holding tensors, sequences, maps, or sparse tensors.
  • onnxruntime::Tensor provides the concrete implementation for dense tensor data, storing shape, element type, and a raw data pointer, but usually references memory it does not own.
  • OrtMemoryInfo describes the allocator characteristics and device location, stored within the Tensor class as alloc_info_ and accessible via Tensor::Location() or the C API GetTensorMemoryInfo.

This architecture enables ONNX Runtime to manage complex, multi-device execution graphs while maintaining clear ownership boundaries between data containers, memory allocators, and execution providers.

Frequently Asked Questions

What is the difference between OrtValue and onnxruntime::Tensor?

OrtValue is a generic, opaque handle that can contain any ONNX data type (tensors, sequences, maps). When it contains dense tensor data, its internal std::shared_ptr<void> points to an onnxruntime::Tensor object, which provides the specific implementation for tensor operations and memory layout. The Tensor class is not exposed directly in the C API; instead, users interact with OrtValue handles and query tensor-specific properties through C API functions.

How do I retrieve OrtMemoryInfo from an existing OrtValue?

Use the C API function GetTensorMemoryInfo through the OrtApi interface (declared in onnxruntime_c_api.h). This function extracts the memory information from the Tensor stored inside the OrtValue by calling Tensor::Location() internally. In the public C++ wrapper, Ort::Value::GetTensorMemoryInfo() performs the same query; inside ONNX Runtime itself, if you already have the Tensor object, call tensor.Location() directly.

Can OrtValue contain data types other than dense tensors?

Yes. OrtValue is designed to wrap any ONNX type supported by the runtime, including TensorSeq (sequences of tensors), SparseTensor, and map types. The OrtMemoryInfo query only applies when the OrtValue actually contains a dense tensor; attempting to query memory info on other types will return an error status.

Why does the Tensor class not allocate its own memory?

This design decouples tensor metadata from memory management. A Tensor can reference buffers allocated by diverse execution providers (CUDA, DirectML, ROCm, custom) without taking ownership, and when it does own a buffer it frees it through the IAllocator that created it. The OrtMemoryInfo records where each buffer lives and which kind of allocator manages it, which enables zero-copy use of caller-provided buffers and lets arena allocators reuse memory across inference runs instead of allocating afresh each time.
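The buffer-reuse benefit can be seen in a deliberately tiny pool allocator, sketched in Python under the assumption that reusing a same-sized chunk is all an arena needs to do. ArenaModel and run_once() are hypothetical and far simpler than ONNX Runtime's BFC arena, which also splits and coalesces chunks; the point is only that tensor metadata can come and go while the underlying memory is allocated once.

```python
class ArenaModel:
    """Hypothetical size-bucketed pool; far simpler than ONNX Runtime's BFC arena."""

    def __init__(self):
        self._free = {}             # nbytes -> list of released chunks
        self.fresh_allocations = 0  # how many times new memory had to be created

    def alloc(self, nbytes):
        bucket = self._free.get(nbytes)
        if bucket:
            return bucket.pop()     # reuse a chunk released earlier
        self.fresh_allocations += 1
        return bytearray(nbytes)

    def release(self, chunk):
        self._free.setdefault(len(chunk), []).append(chunk)


def run_once(arena):
    # One "inference run": an intermediate tensor borrows a chunk, then gives it back
    chunk = arena.alloc(16)
    view = memoryview(chunk)        # the tensor only references the chunk
    view[0] = 1
    view.release()                  # tensor metadata goes away...
    arena.release(chunk)            # ...and the memory returns to the pool
    return chunk


arena = ArenaModel()
chunks = [run_once(arena) for _ in range(3)]
print(arena.fresh_allocations)               # 1
print(all(c is chunks[0] for c in chunks))   # True: same memory reused every run
```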
