# How to Read Excel Files Faster in Pandas: Optimizing read_excel Performance

> Boost pandas read_excel performance with calamine engine, usecols, nrows, and read_only mode. Read large Excel files faster. Learn optimization techniques now.

- Repository: [pandas-dev/pandas](https://github.com/pandas-dev/pandas)
- Tags: performance
- Published: 2026-02-15

---

**Use the `calamine` engine with `python-calamine` installed, limit data with `usecols` and `nrows`, and enable `read_only` mode in `engine_kwargs` to significantly speed up `pandas.read_excel` for large workbooks.**

The `pandas.read_excel` function in the pandas-dev/pandas repository provides a convenient interface for loading Excel files into DataFrames, but its default engines are pure Python and often become the bottleneck on large workbooks. Choosing a faster engine and narrowing what gets parsed can cut read times substantially, in some cases from minutes to seconds.

## How pandas.read_excel Works Under the Hood

Under the hood, `read_excel` is a thin wrapper defined in [`pandas/io/excel/_base.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/excel/_base.py) (lines 64-84) that instantiates an **`ExcelFile`** object and delegates parsing to engine-specific reader classes. The function automatically selects an engine based on the file type, typically defaulting to `openpyxl` for `.xlsx` files or `xlrd` for legacy `.xls` files. These default engines are pure-Python implementations that can struggle with large datasets, whereas the `calamine` engine relies on a compiled backend to minimize parsing overhead. The `pyxlsb` engine is also pure Python, but it reads the binary `.xlsb` format directly.
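The extension-based defaults described above can be pictured as a simple lookup. The sketch below is a simplified illustration, not pandas' actual selection code (pandas also inspects the file contents); the helper name `default_engine_for` is hypothetical, and the `.xlsm`, `.xlsb`, and `.ods` rows are assumptions based on the defaults listed in the pandas documentation.

```python
from pathlib import Path

# Simplified, assumed mapping from file extension to default engine.
_DEFAULT_ENGINES = {
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xls": "xlrd",
    ".xlsb": "pyxlsb",
    ".ods": "odf",
}


def default_engine_for(path: str) -> str:
    """Return the engine name pandas would typically pick for this extension."""
    suffix = Path(path).suffix.lower()
    try:
        return _DEFAULT_ENGINES[suffix]
    except KeyError:
        raise ValueError(f"no default Excel engine known for {suffix!r}") from None


print(default_engine_for("report.XLSX"))  # openpyxl
```

Passing `engine=...` explicitly to `read_excel` bypasses this kind of automatic choice, which is what the techniques in the next section rely on.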

## Faster Ways to Read Excel in Pandas

### Use the Calamine Engine for Rust-Backed Performance

The fastest option for most workloads is often the **`calamine`** engine, which calls into the compiled `python-calamine` library (Python bindings for the Rust `calamine` crate). Implemented in [`pandas/io/excel/_calamine.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/excel/_calamine.py) (lines 20-90), this engine reads `.xls`, `.xlsx`, `.xlsm`, `.xlsb`, and `.ods` files, usually much faster than the pure-Python alternatives.

Install the optional dependency and force the engine:

```python
import pandas as pd

# Requires the optional dependency: pip install python-calamine
df = pd.read_excel(
    "large_file.xlsx",
    engine="calamine",
    usecols="A:D",  # Only columns A through D
    nrows=5000,     # First 5,000 data rows only
)
print(df.shape)  # (5000, 4) if the sheet has at least 5,000 data rows
```
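Because `python-calamine` is optional, code that must run on machines without it may want a fallback. The following is a minimal sketch, assuming the package exposes an importable module named `python_calamine`; the helper name `excel_engine_kwargs` is hypothetical. It does not check the pandas version, so it also assumes a pandas release that accepts `engine="calamine"`.

```python
from importlib.util import find_spec


def excel_engine_kwargs() -> dict:
    """Return read_excel keyword arguments that prefer calamine when available."""
    # Assumption: python-calamine installs a module named `python_calamine`.
    if find_spec("python_calamine") is not None:
        return {"engine": "calamine"}
    # Empty dict: let pandas choose its default engine for the file type.
    return {}


kwargs = excel_engine_kwargs()
print(kwargs)
# Usage (assuming pandas is installed and the file exists):
#   df = pd.read_excel("large_file.xlsx", usecols="A:D", nrows=5000, **kwargs)
```

Returning keyword arguments rather than wrapping `read_excel` keeps the helper independent of any particular call site.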

### Limit Data Scope with usecols and nrows

Regardless of engine choice, reduce parsing overhead by reading only the data you need. The **`usecols`** parameter accepts column letters, indices, column names, or a callable, while **`nrows`** restricts the number of rows parsed. Setting `nrows` can let the reader stop early instead of walking the whole worksheet, and `usecols` keeps unneeded columns out of the resulting DataFrame, which lowers memory use and construction time.

```python
import pandas as pd

# Read specific columns by index and limit rows
df = pd.read_excel(
    "data.xlsx",
    usecols=[0, 2, 4],  # First, third, and fifth columns
    nrows=1000,
)
```
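The callable form of `usecols` is useful when column positions vary between files but header names follow a pattern. pandas calls the function once per column name and keeps the columns for which it returns `True`. The predicate below is a hedged sketch; `keep_column` and the sample headers are hypothetical.

```python
def keep_column(name) -> bool:
    """Keep identifier and amount columns, ignoring case and surrounding spaces."""
    normalized = str(name).strip().lower()
    return normalized.endswith("_id") or normalized.startswith("amount")


# Hypothetical header row, used here only to show which columns would survive.
headers = ["Order_ID", "Customer Name", "Amount USD", "Notes"]
print([h for h in headers if keep_column(h)])  # ['Order_ID', 'Amount USD']

# Usage (assuming pandas is installed and the file exists):
#   df = pd.read_excel("data.xlsx", usecols=keep_column, nrows=1000)
```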

### Enable Read-Only Mode for Streaming

When using the `openpyxl` engine (the default for modern `.xlsx` files), **`read_only`** mode opens files in a streaming, low-memory configuration. Recent pandas versions already request `read_only=True` and `data_only=True` when they open the workbook, so passing them through `engine_kwargs` mainly makes the intent explicit and guards against code that overrides them. The reader is implemented in [`pandas/io/excel/_openpyxl.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/excel/_openpyxl.py), and read-only mode significantly reduces overhead for very large spreadsheets where you only need to iterate once. The `engine_kwargs` parameter is forwarded to the underlying engine as shown in [`pandas/io/excel/_base.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/excel/_base.py) (lines 197-210).

```python
import pandas as pd

df = pd.read_excel(
    "large_file.xlsx",
    engine="openpyxl",
    engine_kwargs={"read_only": True, "data_only": True},
    usecols=[0, 2, 4],
)
```
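To reason about what the engine actually receives, it can help to picture user-supplied `engine_kwargs` being overlaid on a set of defaults. The sketch below is an assumption about how recent pandas versions behave, not a copy of the library's code; `ASSUMED_DEFAULTS` and `effective_openpyxl_kwargs` are hypothetical names, and you should confirm the defaults against the pandas source you have installed.

```python
# Assumed defaults for the openpyxl reader in recent pandas versions.
ASSUMED_DEFAULTS = {"read_only": True, "data_only": True, "keep_links": False}


def effective_openpyxl_kwargs(engine_kwargs=None) -> dict:
    """Overlay user-supplied engine_kwargs on the assumed defaults."""
    return {**ASSUMED_DEFAULTS, **(engine_kwargs or {})}


print(effective_openpyxl_kwargs({"data_only": False}))
# {'read_only': True, 'data_only': False, 'keep_links': False}
```

Under this model, setting `data_only=False` asks openpyxl for formula strings rather than the cached values, while the streaming `read_only` behavior stays in place.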

### Prefer Binary Engines for XLSB Files

For files saved in the binary Excel format (`.xlsb`), read them directly with the **`pyxlsb`** engine (`pip install pyxlsb`) instead of first converting them to `.xlsx`. The engine parses the binary records rather than XML, which can be noticeably faster for large binary workbooks.

```python
import pandas as pd

df = pd.read_excel(
    "big_file.xlsb",
    engine="pyxlsb",
    usecols="A:C",
)
```
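How much each technique helps depends on the workbook, so it is worth timing the alternatives on your own files. The helper below is a minimal sketch using only the standard library; `best_of` is a hypothetical name, and the stand-in workload exists only so the example runs without a workbook on disk.

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload so the sketch runs without any Excel file.
elapsed = best_of(lambda: sum(range(100_000)))
print(f"{elapsed:.6f} s")

# Usage (assuming pandas, the engines, and the file are available):
#   best_of(lambda: pd.read_excel("big_file.xlsb", engine="pyxlsb"))
#   best_of(lambda: pd.read_excel("big_file.xlsb", engine="calamine"))
```

Taking the minimum over a few runs reduces noise from disk caching and other processes.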

## Summary

- **`pandas.read_excel`** delegates to engine-specific readers defined in [`pandas/io/excel/_base.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/excel/_base.py), with default pure-Python engines often creating performance bottlenecks.
- The **`calamine`** engine (from [`pandas/io/excel/_calamine.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/excel/_calamine.py)) is usually the fastest option for `read_excel` because it relies on a compiled Rust backend.
- Reduce I/O by specifying **`usecols`** and **`nrows`** to parse only required data subsets.
- Enable **`read_only`** mode in `engine_kwargs` when using `openpyxl` to stream large files with lower memory overhead.
- Use **`pyxlsb`** for binary `.xlsb` files to avoid XML parsing overhead.

## Frequently Asked Questions

### What is the fastest engine for read_excel in pandas?

The **Calamine** engine is typically the fastest option for most Excel formats. Implemented in [`pandas/io/excel/_calamine.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/excel/_calamine.py), it uses the `python-calamine` library, which wraps the Rust `calamine` crate, to read `.xlsx`, `.xls`, `.xlsm`, `.xlsb`, and `.ods` files significantly faster than pure-Python alternatives like `openpyxl` or `xlrd`.

### How do I install the calamine engine for pandas?

Calamine is an optional dependency. Install it with `pip install python-calamine`, and make sure you are on pandas 2.2 or later, the release that added this engine. Once installed, specify `engine="calamine"` in your `pd.read_excel()` call. Pandas will then use the `CalamineReader` class defined in [`pandas/io/excel/_calamine.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/excel/_calamine.py) to parse the workbook.

### Can I read only specific columns with read_excel?

Yes. Use the **`usecols`** parameter to limit which columns are kept. You can pass column letters (e.g., `"A:D"`), indices (e.g., `[0, 2, 4]`), column names, or a callable. This keeps unnecessary data out of the DataFrame, which reduces memory usage and can shorten load time for large files.

### What is the difference between openpyxl and calamine engines?

**Openpyxl** is the default pure-Python engine for modern `.xlsx` files, implemented in [`pandas/io/excel/_openpyxl.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/excel/_openpyxl.py). It supports writing and gives access to formula text as well as cached values (it does not evaluate formulas itself), but it can be slow with large datasets. **Calamine** is a Rust-backed engine (via `python-calamine`) implemented in [`pandas/io/excel/_calamine.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/excel/_calamine.py) that prioritizes read performance and memory efficiency but is read-only. Choose Calamine for speed, Openpyxl for compatibility and write operations.