# How to Add Row to DataFrame Pandas: Best Practices for Large Datasets

> Efficiently add row to DataFrame pandas for large datasets. Discover best practices to avoid costly rebuilds and optimize performance.

- Repository: [pandas-dev/pandas](https://github.com/pandas-dev/pandas)
- Tags: best-practices
- Published: 2026-02-16

---

**The most efficient way to add rows to a pandas DataFrame is to accumulate the data in a list (of rows or of small DataFrames) and perform a single construction or concatenation, avoiding repeated row-wise appends that each trigger an expensive `BlockManager` rebuild.**

When working with large datasets in pandas, understanding the internal data structure is crucial for performance. By default, pandas groups columns of the same dtype into NumPy-backed arrays called **blocks**, managed by the low-level `BlockManager`. These arrays cannot grow in place, so each single-row append must build a new manager and copy all n existing rows, an O(n) operation that cripples performance at scale.

## Why Row-by-Row Append Is Slow in Pandas

The naive approach of repeatedly calling `df = df.append(row)` (deprecated in pandas 1.4 and removed in 2.0) or similar row-wise methods triggers a full internal rebuild. In [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py), the private method `_append_internal` handles row appends by forwarding them to the generic concatenation engine in [`pandas/core/reshape/concat.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/reshape/concat.py) ([source](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py#L14335)).

This concatenation creates a **brand-new DataFrame** from the operands, allocating fresh memory and copying all existing data. In a loop, iteration k copies the k rows accumulated so far, so building n rows copies roughly n²/2 rows in total: quadratic time complexity.

The actual block-level concatenation is performed by `concatenate_managers`, which assembles the result's `BlockManager`, the class defined in [`pandas/core/internals/managers.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/internals/managers.py) ([source](https://github.com/pandas-dev/pandas/blob/main/pandas/core/internals/managers.py#L1965)). In pandas 2.x releases the function itself is defined in `pandas/core/internals/concat.py`. It joins the underlying NumPy arrays of all operands in one call. That is efficient for bulk operations, but it cannot amortize the cost of repeated single-row calls.
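The difference is easy to observe on a small input. The following is an illustrative sketch rather than a benchmark from the pandas project: the helper names `grow_row_by_row` and `build_once` and the row count are assumptions made for this example, and the measured times will vary by machine and pandas version.

```python
import time

import pandas as pd


def grow_row_by_row(n: int) -> pd.DataFrame:
    """Anti-pattern: every iteration copies the whole frame built so far."""
    df = pd.DataFrame({"a": [0], "b": [0]})
    for i in range(1, n):
        row = pd.DataFrame({"a": [i], "b": [i * 2]})
        df = pd.concat([df, row], ignore_index=True)
    return df


def build_once(n: int) -> pd.DataFrame:
    """Collect plain Python rows first, then construct the frame a single time."""
    rows = [{"a": i, "b": i * 2} for i in range(n)]
    return pd.DataFrame(rows)


n = 2_000  # kept small so the slow variant still finishes quickly

start = time.perf_counter()
slow = grow_row_by_row(n)
slow_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = build_once(n)
fast_seconds = time.perf_counter() - start

print(f"row-by-row concat: {slow_seconds:.3f}s")
print(f"single construction: {fast_seconds:.3f}s")
```

Both functions produce the same DataFrame; only the amount of copying differs, and the gap widens as `n` grows.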

## Efficient Methods to Add Row to DataFrame Pandas

### Collect Rows in a List and Construct Once

The simplest efficient pattern avoids intermediate DataFrames entirely. Accumulate rows as plain dictionaries, lists, or tuples in a Python list, then pass the entire list to the DataFrame constructor in a single operation.

```python
import pandas as pd

# Efficient: single construction from a list of dicts
rows = [{'a': i, 'b': i * 2} for i in range(1_000_000)]
df = pd.DataFrame(rows)
```

This approach allows pandas to build the `BlockManager` structure in one pass through the data, eliminating the copy overhead of incremental growth.
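In practice the rows usually arrive one at a time from a parsing loop rather than from a comprehension. A minimal sketch of that case follows; the input lines and the column names `id`, `value`, and `label` are hypothetical and stand in for whatever your source produces. Tuples are used here instead of dicts because they carry less per-row overhead, with the column names supplied once at construction.

```python
import pandas as pd

# Hypothetical raw input; in real code this might come from a log file or an API
raw_lines = ["1,2.5,x", "2,3.5,y", "3,4.5,z"]

rows = []
for line in raw_lines:
    id_, value, label = line.split(",")
    # Appending to a Python list is cheap; no DataFrame exists yet
    rows.append((int(id_), float(value), label))

# One construction at the end; column names are given once instead of per row
df = pd.DataFrame(rows, columns=["id", "value", "label"])
print(df.dtypes)
```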

### Accumulate DataFrames and Concatenate Once

When processing data in batches (such as chunked file reads or streaming API responses), accumulate small DataFrames in a list and perform a single `pd.concat` operation at the end.

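```python
import pandas as pd

# Efficient: batch concatenation
chunks = []
for chunk in pd.read_csv('big_file.csv', chunksize=100_000):
    # Apply any per-chunk transformations here, then keep the result
    chunks.append(chunk)

# A single concat at the end copies each chunk exactly once
df = pd.concat(chunks, ignore_index=True)
```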

The `concat` function in [`pandas/core/reshape/concat.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/reshape/concat.py) hands the whole list to `concatenate_managers`, which joins the operands at the block level. Each input row is copied into the result once, instead of once per subsequent append as in the iterative pattern.
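The role of `ignore_index=True` is easiest to see with in-memory batches. In this sketch, `make_chunk` is a hypothetical stand-in for one batch read from a file or an API; it is not part of pandas.

```python
import pandas as pd


def make_chunk(start: int, size: int) -> pd.DataFrame:
    """Hypothetical batch source: returns `size` rows beginning at `start`."""
    a = range(start, start + size)
    return pd.DataFrame({"a": a, "b": [x * 2 for x in a]})


# Three batches of three rows each, collected in a plain list
chunks = [make_chunk(start, 3) for start in range(0, 9, 3)]

# Without ignore_index, every chunk keeps its own 0..2 labels
with_repeats = pd.concat(chunks)
print(with_repeats.index.tolist())

# With ignore_index=True, the result gets a fresh 0..n-1 index
clean = pd.concat(chunks, ignore_index=True)
print(clean.index.tolist())
```

Repeated index labels are a common source of surprises in later `.loc` lookups, which is why the batch pattern normally passes `ignore_index=True`.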

### Pre-allocate and Fill with .loc

When the final DataFrame size is known beforehand, pre-allocate the structure with a placeholder index and fill rows using `.loc` assignment. This avoids `BlockManager` reconstruction because the underlying arrays are already sized correctly.

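```python
import numpy as np
import pandas as pd

# Pre-allocate with the target size and a concrete dtype
# (an untyped empty frame would hold object-dtype NaN placeholders)
N = 1_000_000
df = pd.DataFrame(np.zeros((N, 2), dtype="int64"), columns=['a', 'b'])

# Fill via .loc: the label already exists, so the frame is not enlarged
for i in range(N):
    df.loc[i] = [i, i * 2]
```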

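This method writes into the existing arrays instead of reallocating them, though it requires knowing the final row count in advance. Each `.loc` call still carries substantial Python-level overhead (label lookup and validation), so a million-row fill loop remains far slower than constructing the frame once from a list or array.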
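If a per-row loop is unavoidable and the row count is known, one alternative is to pre-allocate plain NumPy arrays, fill those, and wrap them in a DataFrame at the end. This is a sketch of a swapped-in technique, not the `.loc` approach itself; the column names and the small `n` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

n = 1_000  # final row count, known in advance

# One typed buffer per column, allocated once at full size
a = np.empty(n, dtype="int64")
b = np.empty(n, dtype="int64")

# Scalar writes into NumPy arrays skip pandas' label lookup entirely
for i in range(n):
    a[i] = i
    b[i] = i * 2

# Wrap the filled buffers in a DataFrame in a single construction
df = pd.DataFrame({"a": a, "b": b})
print(df.tail(3))
```

The loop still runs in Python, but each iteration is a direct array write, and the DataFrame is built only once.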

### Use Vectorized Column Assignment

When the values follow a column-wise pattern, build or derive entire columns at once with `DataFrame.assign` or direct column assignment rather than iterating row by row. This leverages pandas' vectorized operations, which run in compiled NumPy code instead of a Python loop.

```python
import numpy as np
import pandas as pd

# Vectorized column addition
N = 1_000_000
df = pd.DataFrame({'a': np.arange(N)})
df = df.assign(b=lambda x: x['a'] * 2)  # Single vectorized operation
```

## What to Avoid

**Never use `DataFrame.append`** in performance-critical code. The method was deprecated in pandas 1.4 and removed in pandas 2.0, so it raises `AttributeError` on current releases. Where it still exists, it is merely a thin wrapper around the concatenation machinery that forces a full copy of the entire DataFrame on every call, resulting in quadratic copying and execution time when used in a loop.

Similarly, avoid `df.loc[len(df)] = ...` without pre-allocation. While this pattern works for small data, it triggers the same `BlockManager` reconstruction as `append` when the index must be extended dynamically.
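For the occasional one-off row outside any loop, code that previously relied on `DataFrame.append` can wrap the row in a one-row DataFrame and call `pd.concat`. The sketch below uses made-up column names and values; it still copies the whole frame, so it is only appropriate when it happens rarely.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [2, 4]})
new_row = {"a": 3, "b": 6}

# Wrap the dict in a list so the constructor treats it as a single row
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)
```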

## Summary

- **Batch your operations**: Collect rows in lists or small DataFrames and construct/concatenate once rather than appending iteratively.
- **Pre-allocate when possible**: When the final row count is known, create the DataFrame at full size and fill it via `.loc` to avoid `BlockManager` rebuilds.
- **Use vectorized operations**: Leverage `assign` and column-wise operations instead of row-wise loops.
- **Avoid removed methods**: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0; on older versions it forces an expensive full copy on every call.
- **Understand the internals**: Row appends in pandas require `BlockManager` reconstruction in [`pandas/core/internals/managers.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/internals/managers.py), making single-row operations inherently expensive for large datasets.

## Frequently Asked Questions

### Why is appending a single row to a large DataFrame so slow in pandas?

Appending a single row forces pandas to rebuild the entire `BlockManager`, the structure defined in [`pandas/core/internals/managers.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/internals/managers.py) that stores column data in NumPy-backed blocks. The internal `concatenate_managers` function creates new block arrays by copying data, resulting in O(n) time and memory usage per append operation. When repeated in a loop, this creates quadratic complexity.

### What is the most memory-efficient way to add millions of rows to a DataFrame?

The most memory-efficient method is to accumulate rows as dictionaries or lists in a Python list, then create the DataFrame once using `pd.DataFrame(rows)`. This avoids allocating intermediate DataFrames entirely. If processing in batches, accumulate small DataFrames in a list and call `pd.concat(chunks, ignore_index=True)` once at the end, which concatenates at the block level without Python-level loops.

### Is `df.loc[len(df)] = row` faster than `DataFrame.append`?

Without pre-allocation, both trigger similar internal overhead: the index must grow on every call and the existing data is copied each time, giving the same quadratic behavior in a loop. Assignment through `.loc` only becomes cheap when the DataFrame is pre-allocated and you write to a label that already exists (`df.loc[i] = row`); in that case pandas writes into the existing arrays without rebuilding the `BlockManager`. `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so it is not an option on current releases.

### How does `pd.concat` achieve better performance than iterative appends?

`pd.concat` in [`pandas/core/reshape/concat.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/reshape/concat.py) delegates to `concatenate_managers`, which assembles the result's `BlockManager` (the class defined in [`pandas/core/internals/managers.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/internals/managers.py)) by joining the underlying arrays with NumPy routines rather than Python-level row handling. Because it receives the whole list of DataFrames at once, every input row is copied into the result once, rather than the entire dataset being copied on every append operation.