# How to Apply a Pandas Lambda Function to a DataFrame: Optimization Guide

> Optimize your pandas lambda function application. Use DataFrame apply raw=True with numba engine for compiled performance on large datasets.

- Repository: [pandas-dev/pandas](https://github.com/pandas-dev/pandas)
- Tags: optimization-guide
- Published: 2026-02-15

---

**Use `DataFrame.apply()` with `raw=True` for numeric data to receive NumPy arrays instead of Series objects, and specify `engine='numba'` for compiled performance on large datasets.**

The pandas library provides the `apply` method for running custom logic across a DataFrame. In the pandas-dev/pandas source code, the public method lives in [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py), while the dispatch logic lives in [`pandas/core/apply.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py), which decides how your lambda is called and how its results are assembled.

## Understanding DataFrame.apply Architecture

The primary entry point for applying a pandas lambda function to a DataFrame is `DataFrame.apply()`, defined in [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py) (around line 13940 at the time of writing; the exact line shifts between releases). When invoked, the method delegates to the `Apply` class hierarchy in [`pandas/core/apply.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py), which selects an execution strategy based on the `axis`, `engine`, and `raw` parameters.

The internal workflow follows four distinct stages:

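1. **Inspect the function argument** – pandas checks whether `func` is a callable (such as a lambda), a string, or a list or dict of functions, and routes each case differently.
2. **Select the execution path** – for a callable, recent pandas releases dispatch to `apply_raw` (when `raw=True`), `apply_broadcast` (when `result_type='broadcast'`), or `apply_standard`, which uses the Numba-based `apply_with_numba` when `engine='numba'`.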
3. **Iterate over the chosen axis** – each slice is passed to the lambda. If `raw=True`, the slice is an `ndarray`; otherwise it is a `Series`.
4. **Collect results** – pandas assembles a new `DataFrame` or `Series` based on `result_type`.

## Execution Paths and Performance Optimization

### The raw Parameter: NumPy Arrays vs Series

By default, `apply` passes each column or row as a pandas Series to your lambda, which preserves index labels and metadata. Setting `raw=True` passes a NumPy `ndarray` instead, eliminating the overhead of constructing a Series for every slice. In recent pandas releases this is handled by the `apply_raw` path (not `apply_standard`) within [`pandas/core/apply.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py).

Use `raw=True` when your lambda performs purely numeric operations that do not require pandas indexing methods.
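As a quick check of what the lambda actually receives, the sketch below asks each slice whether it is an `ndarray`. The frame and variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(5), "b": np.arange(5, 10)})

# Default (raw=False): each column arrives as a pandas Series.
got_array_default = df.apply(lambda col: isinstance(col, np.ndarray))

# raw=True: each column arrives as a plain NumPy ndarray.
got_array_raw = df.apply(lambda col: isinstance(col, np.ndarray), raw=True)

print(got_array_default.tolist())  # [False, False]
print(got_array_raw.tolist())      # [True, True]
```

Because the raw slice is an `ndarray`, label-based access such as `col["some_label"]` or Series-only methods are not available inside the lambda.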

### The engine Parameter: Python vs Numba

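The `engine` parameter accepts `'python'` (default) or `'numba'`. When `engine='numba'` is specified, pandas attempts to JIT-compile the function with Numba, so the lambda must stay within the subset of Python and NumPy that Numba supports. The pandas documentation recommends combining this engine with `raw=True`; in that case the compiled loop runs from `apply_raw`, while Series input goes through `apply_with_numba`. Speedups show up mainly on large numeric DataFrames, because the first call also pays the compilation cost.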

Note that the Numba engine requires numeric data types and cannot compile operations involving Python objects or strings.

### Controlling Output Shape with result_type

The `result_type` parameter guides how pandas assembles the return value when `axis=1`. Options include:

- **`'expand'`**: List-like results become columns in a new DataFrame.
- **`'reduce'`**: Forces a Series return even if list-like results are produced.
- **`'broadcast'`**: Results are broadcast to the original shape of the DataFrame, keeping its index and columns.

This logic resides in the result-wrapping methods of the `FrameApply` classes (such as `wrap_results_for_axis`) in [`pandas/core/apply.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py).
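To compare the options side by side, here is a minimal sketch that applies the same list-returning lambda under each `result_type`. The `pair` function and the small frame are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Hypothetical row function returning a two-element list: [sum, product].
pair = lambda row: [row["a"] + row["b"], row["a"] * row["b"]]

default = df.apply(pair, axis=1)                              # Series of lists
expanded = df.apply(pair, axis=1, result_type="expand")       # DataFrame, columns 0 and 1
reduced = df.apply(pair, axis=1, result_type="reduce")        # Series of lists
broadcast = df.apply(pair, axis=1, result_type="broadcast")   # DataFrame, columns a and b

print(type(default).__name__, type(expanded).__name__)
print(list(expanded.columns), list(broadcast.columns))
```

With `'broadcast'`, the list length must match the number of columns, since each returned value is written back into the original layout.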

### Axis Selection: Columns vs Rows

The `axis` parameter determines iteration direction. `axis=0` (default) applies the function to each column, which is usually the faster direction: each column already has a single dtype, so pandas can hand it to the function with little extra work. `axis=1` applies the function row-wise; pandas must build a Series for every row and upcast mixed column dtypes to a common type, so it is slower on tall frames but necessary for calculations that combine values across columns.
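The difference in direction is easiest to see from the index of the result. The following sketch computes a max-minus-min range along each axis of a small illustrative frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(5), "b": np.arange(5, 10)})

# axis=0 (default): one result per column, indexed by column name.
col_range = df.apply(lambda col: col.max() - col.min())

# axis=1: one result per row, indexed by the row labels.
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)

print(list(col_range.index), col_range.tolist())  # ['a', 'b'] [4, 4]
print(row_range.tolist())                         # [5, 5, 5, 5, 5]
```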

## Practical Code Examples

```python
import pandas as pd
import numpy as np

# Sample DataFrame
df = pd.DataFrame({
    "a": np.arange(5),
    "b": np.arange(5, 10)
})

# 1️⃣ Column-wise reduction (axis=0, the default)
#    raw=True passes each column as a NumPy array, skipping Series construction.
col_sum = df.apply(lambda col: col.sum(), raw=True)
print(col_sum)
# a    10
# b    35
# dtype: int64

# 2️⃣ Row-wise operation (axis=1) to combine values
#    Returns a Series where each row gets a custom string.
row_label = df.apply(lambda row: f"r{row['a']}_{row['b']}", axis=1)
print(row_label)
# 0    r0_5
# 1    r1_6
# 2    r2_7
# 3    r3_8
# 4    r4_9
# dtype: object

# 3️⃣ Using result_type='expand' to turn a tuple-returning lambda into a DataFrame
expanded = df.apply(
    lambda row: (row['a'] * 2, row['b'] * 3),
    axis=1,
    result_type="expand"
)
expanded.columns = ["a_twice", "b_triple"]
print(expanded)
#    a_twice  b_triple
# 0        0        15
# 1        2        18
# 2        4        21
# 3        6        24
# 4        8        27

# 4️⃣ Leveraging the numba engine for numeric lambdas (requires numba to be installed)
#    raw=True is set because the pandas docs recommend it with engine="numba".
#    Compilation can yield a noticeable speedup on large frames.
df_large = pd.DataFrame(np.random.rand(10000, 4), columns=list("ABCD"))
numeric_result = df_large.apply(
    lambda col: np.mean(col ** 2),
    raw=True,
    engine="numba"
)
print(numeric_result)
```

## Summary

- Use **`raw=True`** for numeric lambdas to avoid Series overhead and receive NumPy arrays directly.
- Specify **`engine='numba'`** together with `raw=True` for compiled performance on large numeric datasets; the Numba paths live in [`pandas/core/apply.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py).
- Prefer **`axis=0`** (column-wise) when possible; row-wise application builds a Series for every row and is typically slower.
- Use **`result_type`** to control output shape without manual reshaping, particularly when returning tuples or lists from row-wise operations.
- For simple element-wise operations, prefer vectorized pandas methods or NumPy ufuncs over `apply` to avoid iteration overhead entirely.

## Frequently Asked Questions

### Is DataFrame.apply faster than a Python for loop?

Not by much. `apply` still calls your Python function once per row or column, and the looping logic in [`pandas/core/apply.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py) is itself Python rather than compiled Cython, so with the default engine its cost is comparable to an explicit `for` loop. Its main benefits are concise syntax and automatic assembly of the result. Vectorized operations remain significantly faster than `apply` for simple calculations.
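To make the comparison with vectorized code concrete, the sketch below computes the same row-wise quantity both ways and times each. The frame size and column names are arbitrary, and the timings will vary by machine, so treat them as a rough indication only.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20_000, 2)), columns=["a", "b"])

# Row-wise apply: the lambda runs once per row in Python.
start = time.perf_counter()
via_apply = df.apply(lambda row: row["a"] ** 2 + row["b"] ** 2, axis=1)
apply_seconds = time.perf_counter() - start

# Vectorized equivalent: whole-column arithmetic, no per-row Python call.
start = time.perf_counter()
via_vectorized = df["a"] ** 2 + df["b"] ** 2
vectorized_seconds = time.perf_counter() - start

# Both approaches should agree numerically.
assert np.allclose(via_apply, via_vectorized)
print(f"apply: {apply_seconds:.4f}s, vectorized: {vectorized_seconds:.4f}s")
```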

### When should I use raw=True with a lambda?

Use `raw=True` when your lambda performs purely numeric operations that do not require pandas Series methods or index labels. This parameter passes NumPy `ndarray` objects directly to your function, which avoids constructing a Series per slice and reduces execution time; in recent releases this is the `apply_raw` path of [`pandas/core/apply.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py).

### Can I use the numba engine with string data?

No. The `engine='numba'` option in `DataFrame.apply` requires Numba-compatible numeric types. String operations and Python object manipulations cannot be JIT-compiled by Numba, so expect an error rather than a silent fallback to the Python engine; use the default engine for such data.

### What is the difference between apply and applymap?

`DataFrame.apply` processes entire rows or columns (1-D slices) and is defined in [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py). `DataFrame.applymap` (now deprecated in favor of `map`) operates element-wise on each individual scalar value. Use `apply` for row/column-wise logic and `map` for element-wise transformations.
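A short sketch of the distinction, using a made-up two-column frame. The `hasattr` check is an assumption added so the snippet also runs on older pandas versions that only provide `applymap`.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# apply: the lambda sees a whole column (a 1-D slice) at a time.
col_totals = df.apply(lambda col: col.sum())

# map: the lambda sees one scalar at a time.
# Fall back to applymap where DataFrame.map is not available.
elementwise = df.map if hasattr(df, "map") else df.applymap
squared = elementwise(lambda x: x ** 2)

print(col_totals.tolist())      # [3, 7]
print(squared.values.tolist())  # [[1, 9], [4, 16]]
```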