# Understanding the Relationship Between OrtValue, OrtTensor, and OrtMemoryInfo in ONNX Runtime

> Explore the relationship between OrtValue, OrtTensor, and OrtMemoryInfo in ONNX Runtime. Learn how an OrtValue container holds a tensor and how OrtMemoryInfo describes where that tensor's memory lives.

- Repository: [Microsoft/onnxruntime](https://github.com/microsoft/onnxruntime)
- Tags: internals
- Published: 2026-04-24

---

**OrtValue is the opaque container that wraps data in ONNX Runtime, holding a concrete `onnxruntime::Tensor` when the data is a dense tensor, while `OrtMemoryInfo` defines where that tensor's buffer lives (CPU/GPU, allocator type) and is stored as metadata within the Tensor class.**

In the **microsoft/onnxruntime** inference engine, data flows through the execution graph as opaque handles that must simultaneously support multiple data types and memory locations. The repository defines a three-layer architecture comprising `OrtValue` (the generic container), `onnxruntime::Tensor` (the concrete tensor implementation often referenced as OrtTensor in documentation), and `OrtMemoryInfo` (the memory location descriptor). Understanding how these types interact is essential for custom execution providers, memory optimization, and cross-device data transfers.

## The OrtValue Container

**OrtValue** is the opaque data handle used throughout the ONNX Runtime C API (`OrtValue*`) and C++ internals. Defined in [`include/onnxruntime/core/framework/ort_value.h`](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/framework/ort_value.h), this class stores data via a `std::shared_ptr<void>` that can point to various underlying types including `Tensor`, `TensorSeq`, `SparseTensor`, or maps. When an OrtValue holds tensor data, it acts as a type-erased wrapper around the concrete `onnxruntime::Tensor` class.
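The type-erasure idea described above can be sketched in a few lines. This is a minimal illustration, not ONNX Runtime's code: `MiniValue`, `MiniTensor`, `MiniTensorSeq`, and the `Kind` enum are hypothetical stand-ins, and the sketch assumes a simple enum tag where the real class records a runtime type descriptor.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for payload types; NOT ONNX Runtime's classes.
struct MiniTensor {
  std::vector<int64_t> shape;
};
struct MiniTensorSeq {
  std::vector<MiniTensor> items;
};

// Simplified type tag (an assumption made for this sketch).
enum class Kind { kNone, kTensor, kTensorSeq };

class MiniValue {
 public:
  // Store any payload behind a type-erased shared_ptr plus a tag.
  void Init(std::shared_ptr<void> data, Kind kind) {
    data_ = std::move(data);
    kind_ = kind;
  }

  bool IsTensor() const { return kind_ == Kind::kTensor; }
  bool IsTensorSeq() const { return kind_ == Kind::kTensorSeq; }

  // Returns nullptr when the payload is not a tensor.
  const MiniTensor* GetTensor() const {
    return IsTensor() ? static_cast<const MiniTensor*>(data_.get()) : nullptr;
  }

 private:
  std::shared_ptr<void> data_;
  Kind kind_ = Kind::kNone;
};

// shared_ptr<MiniTensor> converts to shared_ptr<void> and still runs
// MiniTensor's destructor when the last owner goes away.
MiniValue MakeTensorValue(std::vector<int64_t> shape) {
  MiniValue value;
  value.Init(std::make_shared<MiniTensor>(MiniTensor{std::move(shape)}),
             Kind::kTensor);
  return value;
}
```

The tag is what makes the `static_cast` safe: the container never casts the `void` pointer unless the stored kind matches the requested type.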

## Inside OrtValue: The Tensor Implementation

When an OrtValue contains dense tensor data, the actual implementation is the **`onnxruntime::Tensor`** class defined in [`include/onnxruntime/core/framework/tensor.h`](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/framework/tensor.h). Unlike many high-level tensor libraries, **Tensor does not have to allocate its own memory**. When constructed from a raw data pointer, it wraps a buffer that was allocated elsewhere and tracks only the metadata: element type, shape, and memory location details. A separate constructor takes an allocator instead, in which case the Tensor obtains its buffer from that allocator and releases it on destruction.

### Memory Ownership Model

The non-owning Tensor constructor accepts a pointer to pre-allocated memory and an `OrtMemoryInfo` object describing where that memory came from. As implemented in [`tensor.h`](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/framework/tensor.h), the Tensor stores this information in a private member `alloc_info_`, which is exposed through the public method `Tensor::Location()`. This design allows tensors to reference memory owned by CPU allocators, CUDA allocators, or custom execution provider buffers without taking ownership of the underlying allocation.
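To make the ownership distinction concrete, here is a minimal sketch of a tensor-like class that can either borrow a buffer or own it through a deleter. `MiniTensor`, `MiniMemInfo`, and `CountFreesByTensors` are hypothetical names invented for this illustration, and the deleter callback is an assumption of the sketch rather than a description of how the real class releases memory.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <utility>

// Hypothetical, pared-down location tag; the real OrtMemoryInfo has more fields.
struct MiniMemInfo {
  int device_type;  // 0 = CPU in this sketch
};

class MiniTensor {
 public:
  // Non-owning: the caller remains responsible for freeing p_data.
  MiniTensor(void* p_data, MiniMemInfo info)
      : p_data_(p_data), alloc_info_(info) {}

  // Owning: `deleter` runs on p_data when the tensor is destroyed.
  MiniTensor(void* p_data, MiniMemInfo info, std::function<void(void*)> deleter)
      : p_data_(p_data), alloc_info_(info), deleter_(std::move(deleter)) {}

  ~MiniTensor() {
    if (deleter_) deleter_(p_data_);
  }

  MiniTensor(const MiniTensor&) = delete;
  MiniTensor& operator=(const MiniTensor&) = delete;

  const MiniMemInfo& Location() const { return alloc_info_; }
  bool OwnsBuffer() const { return static_cast<bool>(deleter_); }

 private:
  void* p_data_;
  MiniMemInfo alloc_info_;
  std::function<void(void*)> deleter_;
};

// Builds one non-owning and one owning tensor, lets both go out of scope,
// and reports how many times a buffer was freed by a tensor.
int CountFreesByTensors() {
  int frees = 0;
  float external[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  {
    MiniTensor view(external, MiniMemInfo{0});
    MiniTensor owner(std::malloc(4 * sizeof(float)), MiniMemInfo{0},
                     [&frees](void* p) {
                       std::free(p);
                       ++frees;
                     });
  }
  return frees;
}
```

Only the owning tensor triggers a free. The borrowed stack buffer is untouched, which is the property that lets a tensor sit on top of memory managed by someone else.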

## OrtMemoryInfo as the Memory Descriptor

**OrtMemoryInfo** is a small struct defined in [`include/onnxruntime/core/framework/ortmemoryinfo.h`](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/framework/ortmemoryinfo.h) that identifies a memory location through these key fields:

- **Device**: an `OrtDevice` carrying the device type (CPU, GPU, or another accelerator) and the device id
- **Memory type**: an `OrtMemType` value such as `OrtMemTypeDefault`, or a CPU-accessible variant used for pinned or staging memory
- **Allocator type**: `OrtDeviceAllocator` vs. `OrtArenaAllocator`
- **Name**: Identifier for the specific allocator

This struct serves as the glue between the tensor data buffer and the allocator infrastructure, enabling the runtime to determine when data transfers are necessary between execution providers.
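As a rough illustration of how such a descriptor can drive a copy decision, the sketch below compares two descriptors. Every name in it (`MiniMemInfo`, `DeviceType`, `MemType`, `AllocType`, `NeedsTransfer`) is hypothetical, and the rule that only device type and device id matter is an assumption chosen for the example, not the runtime's actual policy.

```cpp
#include <cassert>
#include <string>

// All names below are hypothetical stand-ins, not ONNX Runtime's definitions.
enum class DeviceType { kCpu, kGpu };
enum class MemType { kDefault, kPinned };
enum class AllocType { kDevice, kArena };

struct MiniMemInfo {
  std::string name;
  DeviceType device;
  int device_id;
  MemType mem_type;
  AllocType alloc_type;
};

// Assumed rule for this sketch: only the device (type and id) decides whether
// bytes must move; allocator name and allocator type do not.
bool NeedsTransfer(const MiniMemInfo& src, const MiniMemInfo& dst) {
  return src.device != dst.device || src.device_id != dst.device_id;
}
```

Two CPU buffers from different allocators compare as "no transfer needed" here, while a CPU buffer and a GPU buffer, or buffers on two different GPUs, do require a copy.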

## Implementation Details: How They Connect

The relationship forms a clear hierarchy: an OrtValue holds a `std::shared_ptr` to a Tensor, and the Tensor stores an OrtMemoryInfo describing its buffer. When you wrap a pre-allocated buffer with the static helper `Tensor::InitOrtValue()`, you pass the OrtMemoryInfo explicitly, binding the tensor to a specific allocator context.

### Retrieving Memory Info via the C API

The C API function **`OrtApi::GetTensorMemoryInfo`** (declared in [`include/onnxruntime/core/session/onnxruntime_c_api.h`](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/session/onnxruntime_c_api.h) and implemented in `onnxruntime/core/session/onnxruntime_c_api.cc`) extracts the memory information from an OrtValue. The implementation simply forwards to `Tensor::Location()`, returning a pointer to the `OrtMemoryInfo` stored in the Tensor's `alloc_info_` member. The pointer is borrowed from the Tensor, so the caller must not release it.
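The forwarding pattern can be sketched without the real headers. In the example below, `MiniValue`, `MiniTensor`, `MiniMemInfo`, and the `Status` enum are hypothetical stand-ins, and the free function only imitates the shape of a C-style getter (status return value, result through an out-parameter). It is not the actual implementation.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; NOT the real ONNX Runtime types.
struct MiniMemInfo {
  int device_type;
};

class MiniTensor {
 public:
  explicit MiniTensor(MiniMemInfo info) : alloc_info_(info) {}
  const MiniMemInfo& Location() const { return alloc_info_; }

 private:
  MiniMemInfo alloc_info_;
};

struct MiniValue {
  std::shared_ptr<void> data;
  bool is_tensor = false;
};

// Stand-in for an OrtStatus*-style result.
enum class Status { kOk, kNotATensor };

// Mirrors the shape of a C-style getter: status return, result via out-param.
Status GetTensorMemoryInfo(const MiniValue& value, const MiniMemInfo** out) {
  if (!value.is_tensor || value.data == nullptr) return Status::kNotATensor;
  const auto* tensor = static_cast<const MiniTensor*>(value.data.get());
  // Borrowed pointer into the tensor: valid only while `value` keeps it alive.
  *out = &tensor->Location();
  return Status::kOk;
}
```

Because the getter hands back an address inside the tensor, the result is only valid while the containing value is alive, and a non-tensor payload yields an error status with the out-parameter left untouched.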

## Practical Examples

The following examples demonstrate how these three components interact in real code.

### Creating a Tensor with OrtMemoryInfo (C++)

```cpp
// ---------------------------------------------------
// 1️⃣ Create a Tensor on the CPU and wrap it in an OrtValue
// ---------------------------------------------------
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <utility>
#include <vector>

#include "core/framework/tensor.h"
#include "core/framework/ortmemoryinfo.h"
#include "core/session/onnxruntime_c_api.h"

void InspectCpuTensor() {
  OrtMemoryInfo cpu_mem_info("CpuAllocator",
                             OrtDeviceAllocator,   // allocator type
                             OrtDevice(),          // default device (CPU)
                             OrtMemTypeDefault);   // memory type

  // allocate a simple 1‑D tensor of 4 floats
  std::vector<int64_t> shape = {4};
  void* p_data = std::malloc(4 * sizeof(float));   // raw buffer owned by us

  // The constructor takes the *element* type, not a tensor type
  onnxruntime::Tensor tensor(
      onnxruntime::DataTypeImpl::GetType<float>(),
      onnxruntime::TensorShape(shape),
      p_data,
      cpu_mem_info);               // <-- memory info attached to the tensor

  OrtValue ort_val;                // empty container
  // InitOrtValue is a static helper; it moves the tensor into the OrtValue
  onnxruntime::Tensor::InitOrtValue(std::move(tensor), ort_val);

  // ---------------------------------------------------
  // 2️⃣ Retrieve the memory info via the C‑API
  // ---------------------------------------------------
  const OrtApi* api = OrtGetApiBase()->GetApi(ORT_API_VERSION);
  const OrtMemoryInfo* mi = nullptr;   // borrowed, do not release
  OrtStatus* status = api->GetTensorMemoryInfo(&ort_val, &mi);
  if (status == nullptr) {
    // Device type is a small integer; cast so it is not printed as a char
    std::cout << "Tensor lives on device type: "
              << static_cast<int>(mi->device.Type())
              << " (allocator = " << static_cast<int>(mi->alloc_type) << ")\n";
  } else {
    api->ReleaseStatus(status);
  }

  ort_val = OrtValue();   // drop the tensor before releasing its buffer
  std::free(p_data);      // the tensor never owned this buffer
}
```

### Querying Memory Location from Python

```python
# ---------------------------------------------------
# 3️⃣ Same idea from Python: inspect where an output lives
# ---------------------------------------------------
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")

# Wrap the input in an OrtValue so the outputs also come back as OrtValues
# (run_with_ort_values is available in recent onnxruntime releases)
x = ort.OrtValue.ortvalue_from_numpy(
    np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32))
outputs = sess.run_with_ort_values(None, {"input": x})

# each output is an OrtValue; device_name() reports its memory location
print("output 0 lives on:", outputs[0].device_name())   # e.g. "cpu"
```

### Transferring Between Devices (Execution Provider)

```cpp
// ---------------------------------------------------
// 4️⃣ Using memory info in an Execution Provider (EP)
// ---------------------------------------------------
#include "core/framework/tensor.h"
#include "core/providers/cuda/cuda_data_transfer.h"

void TransferTensor(const onnxruntime::Tensor& src,
                    onnxruntime::Tensor& dst) {
  // Tensor::Location() returns a reference to the stored OrtMemoryInfo
  const OrtMemoryInfo& src_info = src.Location();
  const OrtMemoryInfo& dst_info = dst.Location();

  // Decide whether a GPU‑to‑CPU copy is required
  bool src_is_gpu = src_info.device.Type() == OrtDevice::GPU;
  bool dst_is_gpu = dst_info.device.Type() == OrtDevice::GPU;
  // ... perform appropriate copy
}
```

## Summary

The relationship between these three core types follows a strict containment hierarchy:

- **OrtValue** acts as the universal, type-erased container visible to both C API users and C++ internals, capable of holding tensors, sequences, maps, or sparse tensors.
- **onnxruntime::Tensor** provides the concrete implementation for dense tensor data, storing shape, element type, and a raw data pointer. When it wraps a pre-allocated buffer, it **does not own the memory allocation**.
- **OrtMemoryInfo** describes the allocator characteristics and device location, stored within the Tensor class as `alloc_info_` and accessible via `Tensor::Location()` or the C API `GetTensorMemoryInfo`.

This architecture enables ONNX Runtime to manage complex, multi-device execution graphs while maintaining clear ownership boundaries between data containers, memory allocators, and execution providers.

## Frequently Asked Questions

### What is the difference between OrtValue and onnxruntime::Tensor?

**OrtValue** is a generic, opaque handle that can contain any ONNX data type (tensors, sequences, maps). When it contains dense tensor data, it internally holds a `std::shared_ptr` to an **`onnxruntime::Tensor`** object, which provides the specific implementation for tensor operations and memory layout. The Tensor class is not exposed directly in the C API; instead, users interact with OrtValue handles and query tensor-specific properties through C API functions.

### How do I retrieve OrtMemoryInfo from an existing OrtValue?

Use the C API function **`GetTensorMemoryInfo`** through the OrtApi interface (declared in [`onnxruntime_c_api.h`](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/session/onnxruntime_c_api.h) and implemented in `onnxruntime_c_api.cc`). This function extracts the memory information from the Tensor stored inside the OrtValue by calling `Tensor::Location()` internally. In C++ code that has access to the Tensor object itself, you can call `tensor.Location()` directly.

### Can OrtValue contain data types other than dense tensors?

Yes. OrtValue is designed to wrap any ONNX type supported by the runtime, including **`TensorSeq`** (sequences of tensors), **`SparseTensor`**, and map types. The `OrtMemoryInfo` query only applies when the OrtValue actually contains a dense tensor; attempting to query memory info on other types will return an error status.

### Why can a Tensor wrap memory it did not allocate?

The **`Tensor`** class can allocate through an allocator it is given, but it can also wrap a buffer that something else allocated. This decouples tensor metadata from memory management, allowing a Tensor to reference buffers allocated by diverse execution providers (CUDA, DirectML, ROCm, custom) without taking ownership. The **OrtMemoryInfo** struct records which device and allocator the buffer belongs to, so the runtime can skip copies when producer and consumer already share a memory location, and execution providers can reuse buffers across inference runs without unnecessary allocation overhead.