# How to Efficiently Use Apply in Pandas to Conditionally Update Specific Rows

> Learn to efficiently update pandas rows conditionally using vectorized boolean indexing and avoid slow apply iterations. Discover when apply is truly necessary.

- Repository: [pandas-dev/pandas](https://github.com/pandas-dev/pandas)
- Tags: how-to-guide
- Published: 2026-02-21

---

**Use vectorized boolean indexing (`df.loc[mask, col] = value`) instead of `apply` for conditional updates, and only fall back to `apply` with `raw=True` or `engine="numba"` when row-wise logic is unavoidable.**

When you need to update only the rows of a DataFrame that meet some condition, the approach you choose can mean the difference between sub-second execution and minutes of processing on large frames. `DataFrame.apply` offers flexibility, but its implementation in [`pandas/core/apply.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py) calls your function once per row from Python, an overhead that vectorized operations avoid entirely.

## Why Row-Wise Apply Is Slow: Inside pandas/core/apply.py

When you call `df.apply(func, axis=1)`, the `frame_apply` factory in [`pandas/core/apply.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py) selects the `FrameColumnApply` class (despite the name, this is the `axis=1` case; `FrameRowApply` handles `axis=0`). Its `series_generator` walks the rows one at a time and presents each row as a `Series`, and your function is invoked once for every row it yields.

Because this loop runs in the Python interpreter, every row pays for a Python function call plus the `Series` machinery around it. For NumPy-backed dtypes pandas reuses a single `Series` object and swaps each row's values into it to limit allocation, which helps but does not remove the per-row cost; on a mixed-dtype frame the values are also first combined into one object-dtype array. For large DataFrames this adds up to a significant bottleneck compared to vectorized operations that run in compiled code.
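A rough way to see this overhead on your own machine is to time the same discount rule both ways. The frame, its size, and the names `big` and `discounted` are illustrative assumptions; absolute timings depend on your hardware and pandas version, so treat the printed numbers as indicative only.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
n = 100_000
big = pd.DataFrame({
    "category": rng.choice(["full", "discount"], size=n),
    "price": rng.integers(50, 500, size=n).astype(float),
})

def discounted(row):
    # apply(axis=1) calls this once per row, with the row presented as a Series
    if row["category"] == "discount":
        return row["price"] * 0.8
    return row["price"]

start = time.perf_counter()
via_apply = big.apply(discounted, axis=1)
t_apply = time.perf_counter() - start

start = time.perf_counter()
# Same rule, vectorized: keep the price where the row is not "discount",
# otherwise take the discounted price
via_vectorized = big["price"].where(big["category"] != "discount", big["price"] * 0.8)
t_vec = time.perf_counter() - start

print(f"apply(axis=1): {t_apply:.3f} s, vectorized: {t_vec:.4f} s")
```

Both versions return the same values; only the vectorized one avoids calling Python code for every row.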

## The Vectorized Solution: Boolean Indexing vs. Apply

For conditional updates based on column values, vectorized boolean indexing works on entire column arrays at once: building the mask and writing the new values each run as a single operation in compiled NumPy code instead of a Python loop over rows.

### Using df.loc for Conditional Updates

The most efficient pattern uses `df.loc` with a boolean mask:

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["full", "discount", "full", "discount"],
    "price": [100, 200, 150, 250]
})

# Set price to 0 where category is "discount"
mask = df["category"] == "discount"
df.loc[mask, "price"] = 0
```

The comparison builds the whole boolean mask in one vectorized pass, and the `.loc` assignment (handled in [`pandas/core/indexing.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/indexing.py)) writes `0` to every matching position at once, with no per-row Python call.
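The same pattern extends to compound conditions and to values computed from the selected rows. A minimal sketch, in which the price threshold and the extra `note` column are purely illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["full", "discount", "full", "discount"],
    "price": [100.0, 200.0, 150.0, 250.0],
})

# Combine conditions with & or |, wrapping each comparison in parentheses
mask = (df["category"] == "discount") & (df["price"] > 210)

# The right-hand side can be computed from the same selected rows
df.loc[mask, "price"] = df.loc[mask, "price"] * 0.8

# Assigning to a column that does not exist yet creates it;
# rows outside the mask are filled with NaN
df.loc[mask, "note"] = "clearance"
print(df)
```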

### Using np.where and Series.where for Complex Logic

When updates require conditional logic beyond simple assignment, use `np.where` or `Series.where`:

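```python
import numpy as np
import pandas as pd

# Example data with the columns this snippet needs
staff = pd.DataFrame({
    "level": ["junior", "senior", "senior", "junior"],
    "salary": [50_000, 90_000, 120_000, 55_000],
})

# Increase salary by 10% for senior staff, keep the original otherwise
# (the column becomes float64 because of the multiplication)
staff["salary"] = np.where(
    staff["level"] == "senior",
    staff["salary"] * 1.10,
    staff["salary"],
)
```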
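`np.where` covers two outcomes. For several mutually exclusive tiers, NumPy's `np.select` checks a list of conditions in order and takes the choice of the first one that is True. The sketch below assumes an illustrative `"lead"` level and raise percentages, and computes a per-row multiplier before a single vectorized multiply:

```python
import numpy as np
import pandas as pd

staff = pd.DataFrame({
    "level": ["junior", "senior", "lead", "junior"],
    "salary": [50_000.0, 90_000.0, 120_000.0, 55_000.0],
})

# Conditions are evaluated in order; the first True one wins for each row
multiplier = np.select(
    [staff["level"] == "lead", staff["level"] == "senior"],
    [1.15, 1.10],
    default=1.0,  # rows matching no condition keep their salary
)
staff["salary"] = staff["salary"] * multiplier
print(staff)
```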

Alternatively, `Series.where(cond, other)` keeps the original values where `cond` is **True** and replaces them with `other` where it is **False**, so the condition describes the values to keep:

```python
# Keep the price where category is not "discount"; set it to 0 where it is
df["price"] = df["price"].where(df["category"] != "discount", 0)
```
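`Series.mask` is the inverse of `Series.where`: it replaces values where the condition is True. On the example frame these two lines produce the same result, so you can pick whichever condition reads more naturally:

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["full", "discount", "full", "discount"],
    "price": [100, 200, 150, 250],
})

# where: keep values where the condition is True, replace the rest
kept = df["price"].where(df["category"] != "discount", 0)

# mask: replace values where the condition is True, keep the rest
masked = df["price"].mask(df["category"] == "discount", 0)

print(kept.tolist())    # [100, 0, 150, 0]
print(masked.tolist())  # [100, 0, 150, 0]
```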

## When You Must Use Apply: Optimizing with raw=True and engine='numba'

Only fall back to `apply` when transformation logic requires access to the entire row in a way that cannot be vectorized (e.g., complex string manipulation across multiple columns). When unavoidable, optimize using fast-paths defined in [`pandas/core/apply.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py).

### Using raw=True to Avoid Series Overhead

The `raw=True` parameter passes each row to your function as a 1-D NumPy `ndarray` instead of a `Series`, which skips the cost of wrapping every row in a `Series`. The row loses its labels, so unpack or index its values by position:

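```python
import pandas as pd

df = pd.DataFrame({
    "category": ["full", "discount", "full", "discount"],
    "price": [100, 200, 150, 250],
})

def compute_discount(row):
    # With raw=True, row is a 1-D NumPy array: [category, price].
    # This frame mixes strings and integers, so the array has object dtype.
    cat, price = row
    # Return a float on both branches: the output dtype is inferred from the
    # first row's result, so an int there would truncate later discounts
    return price * 0.8 if cat == "discount" else float(price)

df["new_price"] = df.apply(compute_discount, axis=1, raw=True)
```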
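To see exactly what your function receives, you can record the argument type in each mode. The recorder functions below are illustrative helpers, not pandas API:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "category": ["full", "discount", "full", "discount"],
    "price": [100, 200, 150, 250],
})

seen_raw = []
seen_default = []

def record_raw(row):
    # Store the type and dtype of what apply passes in, return a placeholder
    seen_raw.append((type(row), row.dtype))
    return 0

def record_default(row):
    seen_default.append(type(row))
    return 0

df.apply(record_raw, axis=1, raw=True)
df.apply(record_default, axis=1)

print(seen_raw[0])      # an ndarray with object dtype for this mixed frame
print(seen_default[0])  # a pandas Series
```

On a frame that mixes strings and numbers, the raw row has object dtype, which is also why the numba engine cannot work on such a frame directly.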

### JIT Compilation with engine='numba'

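For heavier numeric row logic, `engine="numba"` JIT-compiles the row function so the loop over rows runs as compiled code rather than in the Python interpreter. It requires the `numba` package and pandas 2.2 or later, and Numba's nopython mode only works with numeric data, so string columns such as `category` must be encoded as numbers first: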

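```python
# pip install numba
import pandas as pd

# Numba's nopython mode handles numbers, not strings, so encode the
# "category" column as a 0/1 flag and work on a numeric-only frame
num_df = pd.DataFrame({
    "is_discount": (df["category"] == "discount").astype("int64"),
    "price": df["price"].astype("float64"),
})

def compute_numba(row):
    # With raw=True, row is a float64 array: [is_discount, price]
    return row[1] * 0.8 if row[0] == 1 else row[1]

df["new_price"] = num_df.apply(compute_numba, axis=1, raw=True, engine="numba")
```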

The first call includes compilation time, so the gain tends to show up only on large frames; after that, rows are processed at close to native speed.

## Summary

- **Avoid `apply` for conditional updates**: Use `df.loc[mask, col] = value` or `np.where` for vectorized operations that execute in C-level loops.
- **Use `apply` only when necessary**: Fall back to row-wise operations only when logic requires access to multiple columns in a non-vectorizable way.
- **Optimize `apply` with `raw=True`**: Pass NumPy arrays instead of `Series` objects to cut per-row wrapping overhead; the function is still called once per row in Python.
- **Use `engine="numba"` for JIT compilation**: Compile row functions to machine code for maximum `apply` performance on numeric data when `apply` is unavoidable.

## Frequently Asked Questions

### Is DataFrame.apply always slow compared to vectorized operations?

For row-wise `apply` (`axis=1`), usually yes. In [`pandas/core/apply.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py), the default Python engine walks the rows with a generator that yields one `Series` per row and calls your function once for each, so the cost grows with the row count in a way that vectorized operations, which run in compiled NumPy code, avoid. On large frames the gap is often one or more orders of magnitude. Column-wise `apply` (`axis=0`) on a frame with few columns, or `engine="numba"` on numeric data, can be much cheaper.

### When should I use apply instead of df.loc or np.where?

Use `apply` only when the transformation logic requires access to the entire row in a way that cannot be expressed through column-wise vectorized operations. Examples include complex string concatenation across multiple columns, conditional logic that depends on three or more columns with non-arithmetic relationships, or operations requiring external API calls per row. For simple conditional updates based on one or two columns, `df.loc` or `np.where` remain superior.

### Does raw=True make apply as fast as vectorized operations?

No. `raw=True` removes the cost of presenting each row as a `Series`, but your function is still called once per row from Python: with the default engine, [`pandas/core/apply.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py) passes the underlying array to `np.apply_along_axis`, which loops in Python. It is therefore slower than true vectorized operations, although usually noticeably faster than the default `raw=False` mode on large DataFrames.
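Counting calls makes the difference concrete: both modes invoke the Python function once per row, while the vectorized expression runs once for the whole frame. The small numeric frame and the `calls` counter are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [10.0, 20.0, 30.0, 40.0, 50.0],
})

calls = {"raw": 0, "series": 0}

def total_raw(row):
    calls["raw"] += 1           # one Python-level call per row
    return row[0] + row[1]      # positional access on the ndarray

def total_series(row):
    calls["series"] += 1        # also one call per row
    return row["a"] + row["b"]  # label access on the Series

res_raw = df.apply(total_raw, axis=1, raw=True)
res_series = df.apply(total_series, axis=1)
vectorized = df["a"] + df["b"]  # a single operation over both columns

print(calls)
```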

### How does engine="numba" improve apply performance?

The `engine="numba"` parameter JIT-compiles your row function with the Numba library, and the loop that calls it is compiled too, so rows are processed without going through the Python interpreter. In [`pandas/core/apply.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py), the numba branch hands your function to a compiled looper instead of the Python loop. This typically gives the fastest `apply` performance, but it requires the `numba` package, pays a one-time compilation cost on the first call, and supports only what Numba's nopython mode supports: numeric arrays and NumPy operations, not strings or arbitrary Python objects.