How to Use pandas read excel to Load .xlsx Files in Python

Call pandas.read_excel() with a file path and an optional sheet_name argument to load an Excel workbook into a DataFrame. Unless you pass engine explicitly, pandas selects a parsing engine based on the file extension.

The pandas.read_excel function serves as the primary entry point for loading Excel data into Python. Located within the pandas.io.excel package in the pandas-dev/pandas repository, this function abstracts the complexity of handling multiple Excel formats while providing a consistent interface for reading .xlsx, .xls, and other spreadsheet formats.

How pandas read excel Works Internally

When you call pd.read_excel(), the function executes a multi-stage pipeline that delegates parsing to specialized backend libraries.

Engine Dispatch and ExcelFile Creation

The dispatch logic resides in pandas/io/excel/_base.py at line 91 (exact line numbers shift between pandas releases). Here, read_excel determines which engine to use, such as openpyxl, xlrd, or pyxlsb, based on the file extension and the optional engine parameter.
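The idea of extension-based dispatch can be sketched in a few lines of plain Python. This is a simplified illustration, not the pandas implementation: the guess_engine function and the _ENGINE_BY_SUFFIX table are hypothetical names, and the real logic in pandas may also inspect file contents rather than rely on the suffix alone.

```python
from pathlib import Path

# Hypothetical lookup table for illustration only. The engine names are real
# pandas engine identifiers, but pandas does not expose this dictionary.
_ENGINE_BY_SUFFIX = {
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xls": "xlrd",
    ".xlsb": "pyxlsb",
    ".ods": "odf",
}


def guess_engine(path, engine=None):
    """Return the explicit engine if given, else one chosen from the file suffix."""
    if engine is not None:
        return engine
    suffix = Path(path).suffix.lower()
    if suffix not in _ENGINE_BY_SUFFIX:
        raise ValueError(f"No known Excel engine for suffix {suffix!r}")
    return _ENGINE_BY_SUFFIX[suffix]


print(guess_engine("data/example.xlsx"))
print(guess_engine("legacy/report.xls"))
print(guess_engine("export.bin", engine="pyxlsb"))
```

An explicit engine argument always wins in this sketch, which mirrors how the engine parameter of read_excel overrides automatic selection.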

Immediately following dispatch, the function instantiates an ExcelFile object (defined at line 128 in _base.py). This class acts as a high-level wrapper around the low-level parser, managing file handles and providing a uniform interface across different Excel formats.
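ExcelFile is also part of the public API, so you can open a workbook once and parse several sheets from it. The sketch below assumes openpyxl is installed and builds a small two-sheet workbook in memory; the sheet names and column values are made up for illustration.

```python
import io

import pandas as pd

# Build a small two-sheet workbook in memory so no file on disk is needed.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"Region": ["North", "South"], "Revenue": [120.5, 98.0]}).to_excel(
        writer, sheet_name="Sales", index=False
    )
    pd.DataFrame({"Item": ["Widget"], "Stock": [40]}).to_excel(
        writer, sheet_name="Inventory", index=False
    )
buffer.seek(0)

# Open the workbook once, list its sheets, and parse each one on demand.
with pd.ExcelFile(buffer, engine="openpyxl") as workbook:
    sheet_names = workbook.sheet_names
    sales_df = workbook.parse("Sales")
    inventory_df = workbook.parse("Inventory")

print(sheet_names)
print(sales_df)
```

Using ExcelFile directly can be convenient when you need to inspect sheet_names before deciding what to load.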

Sheet Selection and Parsing

The sheet_name parameter (defaulting to 0, the first sheet) specifies which worksheet or worksheets to extract. It accepts a sheet name, a zero-based position, a list of either, or None to load every sheet. The ExcelFile.parse method forwards this request to the engine-specific parser. For modern .xlsx files, the default engine is openpyxl, with its implementation located in pandas/io/excel/_openpyxl.py. The engine reads the sheet's cell values into rows of Python objects; pandas then applies dtype inference to those rows and constructs the final DataFrame.
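The different forms of sheet_name can be compared side by side. This is a minimal sketch that assumes openpyxl is installed; the workbook is generated in memory and its sheet names and values are placeholders.

```python
import io

import pandas as pd

# Create an in-memory workbook with two sheets.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"Revenue": [10, 20]}).to_excel(writer, sheet_name="Sales", index=False)
    pd.DataFrame({"Stock": [5]}).to_excel(writer, sheet_name="Inventory", index=False)
data = buffer.getvalue()

# Default: sheet_name=0 selects the first sheet.
first = pd.read_excel(io.BytesIO(data))

# An integer selects a sheet by zero-based position.
by_position = pd.read_excel(io.BytesIO(data), sheet_name=1)

# None loads every sheet into a dict keyed by sheet name.
everything = pd.read_excel(io.BytesIO(data), sheet_name=None)

print(list(first.columns))
print(list(by_position.columns))
print(list(everything))
```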

Basic Usage Examples

Reading the First Sheet

By default, read_excel loads the first worksheet:

import pandas as pd

df = pd.read_excel("data/example.xlsx")
print(df.head())

Reading a Specific Sheet by Name

Use the sheet_name parameter to target a specific worksheet:

df = pd.read_excel("data/example.xlsx", sheet_name="Sales")

Advanced pandas read excel Options

Loading Multiple Sheets

Pass a list of sheet names to receive a dictionary of DataFrames:

sheets = pd.read_excel(
    "data/example.xlsx",
    sheet_name=["Sales", "Inventory"],
    engine="openpyxl",  # optional – explicitly sets the engine
)

sales_df = sheets["Sales"]
inventory_df = sheets["Inventory"]
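When the sheets share the same columns, a common follow-up is to stack the dictionary into one DataFrame. The sketch below is one way to do that, assuming openpyxl is installed; the Q1 and Q2 sheet names and the added "sheet" column are illustrative choices, not something read_excel produces on its own.

```python
import io

import pandas as pd

# In-memory workbook with two sheets that share the same columns.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"Region": ["North", "South"], "Revenue": [100, 80]}).to_excel(
        writer, sheet_name="Q1", index=False
    )
    pd.DataFrame({"Region": ["North"], "Revenue": [130]}).to_excel(
        writer, sheet_name="Q2", index=False
    )
buffer.seek(0)

sheets = pd.read_excel(buffer, sheet_name=["Q1", "Q2"])

# Tag each frame with the sheet it came from, then stack them row-wise.
combined = pd.concat(
    [frame.assign(sheet=name) for name, frame in sheets.items()],
    ignore_index=True,
)

print(combined)
```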

Skipping Rows and Limiting Columns

Control which data becomes your header and which columns to import:

df = pd.read_excel(
    "data/example.xlsx",
    skiprows=2,       # ignore the first two rows
    header=0,         # third row becomes column names
    usecols="A:D",    # read only columns A through D
)
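To see how these three options interact, it helps to run them against a sheet whose layout is known. The following sketch assumes openpyxl is installed and writes a made-up report with two title rows above the real header, then reads it back with the same arguments.

```python
import io

import pandas as pd

# Raw sheet layout: two title rows, then the header row, then the data.
rows = [
    ["Quarterly report", None, None, None, None],
    ["Internal use only", None, None, None, None],
    ["Region", "Units", "Price", "Revenue", "Notes"],
    ["North", 10, 2.5, 25.0, "ok"],
    ["South", 4, 3.0, 12.0, "late"],
]

buffer = io.BytesIO()
# header=False and index=False write the rows exactly as laid out above.
pd.DataFrame(rows).to_excel(buffer, header=False, index=False, engine="openpyxl")
buffer.seek(0)

df = pd.read_excel(
    buffer,
    skiprows=2,     # drop the two title rows
    header=0,       # the next row supplies the column names
    usecols="A:D",  # keep columns A through D, leaving out "Notes"
)

print(df)
```

Note that header is counted after skiprows has been applied, which is why header=0 refers to the third row of the sheet here.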

Specifying Data Types

Override type inference for specific columns by providing a dtype dictionary; columns you do not list are still inferred:

df = pd.read_excel(
    "data/example.xlsx",
    usecols=["Date", "Revenue"],
    dtype={"Revenue": "float64", "Date": "str"}
)
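The effect of dtype is easiest to see by reading the same data with and without it. This sketch assumes openpyxl is installed; the OrderID and Revenue columns are invented for the comparison.

```python
import io

import pandas as pd

# Both columns are stored as whole numbers in the workbook.
buffer = io.BytesIO()
pd.DataFrame({"OrderID": [1001, 1002], "Revenue": [250, 300]}).to_excel(
    buffer, index=False, engine="openpyxl"
)
data = buffer.getvalue()

# Without dtype, pandas infers integer columns.
df_inferred = pd.read_excel(io.BytesIO(data))

# With dtype, OrderID is kept as text and Revenue is read as float64.
df_typed = pd.read_excel(
    io.BytesIO(data),
    dtype={"OrderID": "str", "Revenue": "float64"},
)

print(df_inferred.dtypes)
print(df_typed.dtypes)
```

Reading identifier-like columns as strings avoids accidental arithmetic on them and keeps them from being shown as floats if missing values appear later.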

Key Source Files in pandas-dev/pandas

Understanding the implementation details helps when debugging or extending functionality:

  • pandas/io/excel/_base.py – Contains the read_excel wrapper (line 91), engine dispatch logic, and the ExcelFile class (line 128) that manages parser abstraction.

  • pandas/io/excel/_openpyxl.py – Implements the openpyxl engine used by default for .xlsx files, handling cell value extraction and data conversion.

  • pandas/io/excel/_util.py – Provides helper functions shared across the Excel readers and writers, such as converting Excel-style column ranges like "A:D" for usecols.

Summary

  • pandas.read_excel is the high-level interface for importing Excel files, located in pandas/io/excel/_base.py.
  • The function automatically selects an appropriate engine (typically openpyxl for .xlsx) based on file extension.
  • Use sheet_name to target specific worksheets, or pass a list to load multiple sheets into a dictionary.
  • Control data ingestion with skiprows, usecols, header, and dtype to load only the rows and columns you need and to set column types explicitly.

Frequently Asked Questions

What is the default engine for reading .xlsx files in pandas?

When you call pandas.read_excel on a .xlsx file without specifying an engine, pandas automatically selects openpyxl (if installed). This default is determined by the engine dispatch logic in pandas/io/excel/_base.py, which maps file extensions to their preferred parsing libraries.
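Because the default depends on openpyxl being installed, it can be useful to check for it before reading. The helper below is a small sketch using only the standard library; engine_available is a hypothetical name, not a pandas function.

```python
import importlib.util


def engine_available(module_name="openpyxl"):
    """Return True if the named top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


if engine_available("openpyxl"):
    print("openpyxl is installed; .xlsx files can be read with the default engine.")
else:
    print("openpyxl is missing; install it before reading .xlsx files.")
```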

How do I read multiple sheets from an Excel file?

Pass a list of sheet identifiers to the sheet_name parameter. pandas.read_excel returns a dictionary whose keys are the identifiers you passed (sheet names or zero-based positions) and whose values are DataFrames. For example: pd.read_excel("file.xlsx", sheet_name=["Sheet1", "Sheet2"]). To load every sheet, pass sheet_name=None.

Can I specify data types when using pandas read excel?

Yes. Use the dtype parameter to provide a dictionary mapping column names to data types. pandas applies the given type to each listed column instead of inferring it, and choosing a narrower type can reduce memory usage. For example: dtype={"Revenue": "float64", "ID": "int32"}.

Where is the read_excel function defined in the pandas source code?

The read_excel function is defined in pandas/io/excel/_base.py starting at line 91. This file also contains the ExcelFile class (line 128) and the engine dispatch logic that determines which backend library to use for parsing.
