# Performance Implications of Iterating Over Rows in a Pandas DataFrame: iterrows vs Alternatives

> Discover iterrows performance issues in Pandas DataFrames. Learn how itertuples and vectorized operations offer 5-100x faster alternatives for efficient data processing.

- Repository: [pandas-dev/pandas](https://github.com/pandas-dev/pandas)
- Tags: performance
- Published: 2026-02-21

---

**`DataFrame.iterrows()` is significantly slower than the alternatives because it constructs a new Series object for every row. `itertuples()` is typically 5–10× faster because it returns lightweight tuples, and vectorized operations are often 50–100× faster or more because they run in compiled NumPy code.**

The pandas-dev/pandas library stores data in a **column-oriented** format optimized for vectorized operations. When you need row-wise access, understanding the performance implications of iterating over rows in a Pandas DataFrame becomes critical, as Python-level loops introduce substantial overhead compared to C-optimized array operations.

## Why Row Iteration Breaks Pandas' Performance Model

Pandas is built on NumPy arrays managed by an internal **block manager** that stores data column-wise. Accessing data row-by-row forces the library to traverse across columns, reconstructing values into Python objects. This departure from vectorized execution creates **O(n × c)** complexity where *n* is row count and *c* is column count, with significant constant overhead from object creation.

## Comparing Row Iteration Methods

### DataFrame.iterrows() — The Slow but Flexible Option

In [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py) (lines 1298–1306), `iterrows()` yields `(index, Series)` tuples for each row. The implementation builds a new **Series** object on-the-fly from the block manager data.

This method has two major performance penalties:

- **Memory copying**: Each iteration copies row data into a new Series
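- **Type coercion**: All values in a row must share one dtype, so mixed columns are upcast to a common type (integers become floats next to a float column, and everything becomes `object` when strings are involved), which loses the original column dtypes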

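The docstring warns that because `iterrows()` returns a Series for each row, dtypes are not preserved across rows. To preserve dtypes while iterating, it recommends `itertuples()`, which it describes as generally faster than `iterrows()`.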
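The dtype coercion described above is easy to observe. The following is a minimal sketch using a made-up two-column frame (the column names `qty` and `price` are illustrative only): the integer column keeps its dtype in the DataFrame, but the row yielded by `iterrows()` is a single float Series.

```python
import pandas as pd

# Illustrative data: one integer column and one float column.
df = pd.DataFrame({"qty": [1, 2], "price": [1.5, 2.5]})

# Inside the DataFrame, each column keeps its own dtype.
column_dtypes = df.dtypes.astype(str).to_dict()

# A row from iterrows() is one Series, so its values must share a dtype:
# the integer value is upcast to float64 alongside the float value.
_, first_row = next(df.iterrows())
row_dtype = str(first_row.dtype)

print(column_dtypes)
print(row_dtype)
```

If the frame also contained a string column, the row Series would typically fall back to `object` dtype instead.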

### DataFrame.itertuples() — The Performance-Optimized Iterator

Also defined in [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py) (lines 1356–1365), `itertuples()` returns **namedtuple** objects (or plain tuples when `name=None`) with fields matching column names. Rather than building a Series for each row, it iterates over the columns and zips their values together into tuples.

Key advantages:

- **Low per-row overhead**: The work still grows with *n × c*, but building a tuple is far cheaper than constructing a Series
- **Dtype preservation**: Each value keeps the type of its own column
- **Memory efficiency**: No per-row Series is allocated

The documentation describes `itertuples()` as generally faster than `iterrows()` and recommends it when dtypes need to be preserved.
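As a small sketch of the behavior described above (the frame and its column names are made up for illustration), each namedtuple exposes columns as attributes, and the integer column is not upcast to float:

```python
import pandas as pd

# Illustrative data: one integer column and one float column.
df = pd.DataFrame({"qty": [1, 2], "price": [1.5, 2.5]})

# Each row is a namedtuple whose fields match the column names.
first = next(df.itertuples(index=False))
fields = first._fields

# Values keep their per-column types, so qty is still an integer here.
qty_is_float = isinstance(first.qty, float)

# Attribute access makes row-wise logic straightforward to write.
total = sum(row.qty * row.price for row in df.itertuples(index=False))

print(fields, qty_is_float, total)
```

Column names that are not valid Python identifiers cannot be used as attribute names, so it helps to keep column names simple when relying on attribute access.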

### Vectorized Operations — The Fastest Approach

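For maximum performance, avoid explicit row iteration entirely. Arithmetic on whole columns, NumPy broadcasting, and built-in aggregations execute in compiled C code without per-row Python overhead. Note that `df.apply()` with `axis=1` does not belong in this group: it calls a Python function once per row, so it behaves like a loop rather than a vectorized operation.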
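To make the column-wise style concrete, here is a minimal sketch on made-up data. The discount rule is hypothetical and only serves to show that simple conditional logic can also be expressed without a loop, here with `np.where`.

```python
import numpy as np
import pandas as pd

# Illustrative data.
df = pd.DataFrame({"qty": [1, 2, 3], "price": [1.5, 2.5, 4.0]})

# Whole-column arithmetic: one operation covers every row.
line_total = df["qty"] * df["price"]

# Hypothetical rule: 10% discount when qty >= 2, applied without a loop.
discounted = np.where(df["qty"] >= 2, line_total * 0.9, line_total)

# Aggregations are vectorized as well.
grand_total = line_total.sum()

print(line_total.tolist(), grand_total)
```

The same results could be produced with a row loop, but each of these expressions hands the whole column to compiled code in a single call.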

## Performance Benchmarks

The following benchmark demonstrates the practical difference between these approaches on a 100,000-row DataFrame:

```python
import timeit

import numpy as np
import pandas as pd

# 100,000 rows x 10 float columns of random data
df = pd.DataFrame(
    np.random.randn(100_000, 10),
    columns=[f"c{i}" for i in range(10)],
)

def sum_iterrows():
    # Builds one Series per row, then sums it
    total = 0.0
    for _, row in df.iterrows():
        total += row.sum()
    return total

def sum_itertuples():
    # Builds one namedtuple per row, then sums it in Python
    total = 0.0
    for row in df.itertuples(index=False):
        total += sum(row)
    return total

def sum_vectorized():
    # Single NumPy reduction over the whole array, no Python loop
    return df.to_numpy().sum()

# number=1 runs each function once; repeat for more stable timings
print("iterrows  :", timeit.timeit(sum_iterrows, number=1))
print("itertuples:", timeit.timeit(sum_itertuples, number=1))
print("vectorized:", timeit.timeit(sum_vectorized, number=1))
```

Typical execution times (absolute numbers depend on hardware and on the pandas and NumPy versions):

- **iterrows**: ~7.8 seconds
- **itertuples**: ~0.9 seconds
- **vectorized**: ~0.04 seconds

## When to Use Each Method

- **Use `iterrows()`** only when you specifically need each row as a Series with index labels, for example during interactive debugging, or when working with heterogeneous data where row-wise Series operations simplify the logic.
- **Use `itertuples()`** for production code requiring row-wise access, especially with numeric data where type preservation matters.
- **Use vectorized operations** for any performance-sensitive computation that can be expressed as column-wise arithmetic or aggregations.

## Summary

- **`iterrows()`** creates a new Series per row, which adds heavy per-row overhead on top of the **O(n × c)** work and upcasts mixed-dtype rows to a common dtype (see [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py)).
- **`itertuples()`** builds lightweight tuples from the column values, typically giving **5–10× speed improvements** with per-column dtypes preserved.
- **Vectorized operations** eliminate Python looping entirely, often delivering **50–100× performance gains** or more over `iterrows()`.
- The pandas documentation recommends `itertuples()` over `iterrows()` for speed and dtype preservation.

## Frequently Asked Questions

### Why is iterrows() so slow compared to itertuples()?

`iterrows()` constructs a new **Series** object for every row, which requires copying data from the block manager and casting values to a common dtype. `itertuples()` only zips the column values into a tuple, with no Series construction and no dtype casting, so the overhead per iteration is much lower.

### Does itertuples() preserve DataFrame index information?

By default, `itertuples()` includes the index as the first field named `Index`. You can exclude it by passing `index=False`, which slightly improves performance when index values are not required for your computation.
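The index behavior can be checked directly. This is a minimal sketch on a made-up frame; `index` and `name` are the real keyword arguments of `itertuples()`, and passing `name=None` yields plain tuples instead of namedtuples.

```python
import pandas as pd

# Illustrative data with a string index.
df = pd.DataFrame({"qty": [1, 2]}, index=["a", "b"])

# Default: the index comes first, in a field named "Index".
with_index = next(df.itertuples())

# index=False drops the index field.
without_index = next(df.itertuples(index=False))

# name=None returns plain tuples with no field names.
plain = next(df.itertuples(index=False, name=None))

print(with_index._fields, without_index._fields, plain)
```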

### Can I modify DataFrame values while iterating with itertuples()?

No, `itertuples()` returns immutable tuples. For value modifications during iteration, you must collect changes in a separate data structure and assign them after the loop, or use `df.apply()` with a custom function that returns modified values.
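A minimal sketch of the collect-then-assign pattern follows. The frame and the `label` helper are hypothetical, introduced only to illustrate the idea.

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame({"qty": [1, 2, 3], "price": [1.5, 2.5, 4.0]})

# Rows are namedtuples, so assigning to a field raises AttributeError.
first = next(df.itertuples(index=False))
is_immutable = False
try:
    first.qty = 10
except AttributeError:
    is_immutable = True

def label(row):
    # Hypothetical per-row rule used only for this example.
    return "bulk" if row.qty >= 2 else "single"

# Collect results in a list during the loop, then assign once afterwards.
labels = [label(row) for row in df.itertuples(index=False)]
df["label"] = labels

print(is_immutable, df["label"].tolist())
```

Assigning the whole column at the end avoids mutating the DataFrame while it is being iterated.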

### When is row iteration unavoidable in pandas?

Row iteration becomes necessary when processing logic requires considering multiple columns simultaneously in ways that cannot be expressed through vectorized operations, such as complex conditional logic or external API calls per row. Even then, `itertuples()` or `df.values` with NumPy loops outperform `iterrows()`.
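For the per-row external call case, a loop over `itertuples()` might look like the sketch below. `fetch_status` is a hypothetical stand-in for a real network call, and the column names are made up for illustration.

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame({"user_id": [101, 102], "region": ["eu", "us"]})

def fetch_status(user_id, region):
    """Hypothetical stand-in for an external API call made once per row."""
    return f"{region}-{user_id}-ok"

# The call itself cannot be vectorized, so iterate with itertuples()
# and attach the results as a new column after the loop.
statuses = [
    fetch_status(row.user_id, row.region)
    for row in df.itertuples(index=False)
]
df["status"] = statuses

print(df["status"].tolist())
```

In a case like this the external call usually dominates the runtime, so the choice of iterator matters less than it does for pure computation.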