How to Use pandas read_excel to Load .xlsx Files in Python
Use pandas.read_excel() with a file path and optional sheet_name parameter to import Excel workbooks into a DataFrame, automatically selecting the appropriate parsing engine based on the file extension.
The pandas.read_excel function serves as the primary entry point for loading Excel data into Python. Located within the pandas.io.excel package in the pandas-dev/pandas repository, this function abstracts the complexity of handling multiple Excel formats while providing a consistent interface for reading .xlsx, .xls, and other spreadsheet formats.
How pandas read_excel Works Internally
When you call pd.read_excel(), the function executes a multi-stage pipeline that delegates parsing to specialized backend libraries.
Engine Dispatch and ExcelFile Creation
The dispatch logic resides in pandas/io/excel/_base.py at line 91. Here, read_excel determines which engine to use—such as openpyxl, xlrd, or pyxlsb—based on the file extension and the optional engine parameter.
Immediately following dispatch, the function instantiates an ExcelFile object (defined at line 128 in _base.py). This class acts as a high-level wrapper around the low-level parser, managing file handles and providing a uniform interface across different Excel formats.
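The wrapper described above can also be used directly when you want to open a workbook once and inspect it before parsing. The following is a minimal sketch, assuming openpyxl is installed. The workbook contents, the sheet names (Sales, Inventory), and the helper build_demo_workbook are made up for illustration and are not part of pandas.

```python
import os
import tempfile

import pandas as pd


def build_demo_workbook(path):
    """Write a small two-sheet workbook (hypothetical data) to `path`."""
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        pd.DataFrame({"Region": ["North", "South"], "Revenue": [100, 250]}).to_excel(
            writer, sheet_name="Sales", index=False
        )
        pd.DataFrame({"Item": ["Widget"], "Stock": [12]}).to_excel(
            writer, sheet_name="Inventory", index=False
        )


demo_path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
build_demo_workbook(demo_path)

# Open the workbook once; the context manager closes the file handle on exit.
with pd.ExcelFile(demo_path, engine="openpyxl") as workbook:
    sheet_names = workbook.sheet_names   # sheet names in workbook order
    sales_df = workbook.parse("Sales")   # parse a single sheet into a DataFrame

print(sheet_names)
print(sales_df)
```

Opening the file through pd.ExcelFile is mainly useful when several sheets are read in separate steps, since the workbook is only opened once.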
Sheet Selection and Parsing
The sheet_name parameter (defaulting to 0, the first sheet) specifies which worksheets to extract. The ExcelFile.parse method forwards this request to the engine-specific parser. For modern .xlsx files, the default engine is openpyxl, with its implementation located in pandas/io/excel/_openpyxl.py. This module converts cell values and formatting into a two-dimensional array, applies dtype inference, and constructs the final DataFrame.
Basic Usage Examples
Reading the First Sheet
By default, read_excel loads the first worksheet:
import pandas as pd
df = pd.read_excel("data/example.xlsx")
print(df.head())
Reading a Specific Sheet by Name
Use the sheet_name parameter to target a specific worksheet:
df = pd.read_excel("data/example.xlsx", sheet_name="Sales")
Advanced pandas read_excel Options
Loading Multiple Sheets
Pass a list of sheet names to receive a dictionary of DataFrames:
sheets = pd.read_excel(
    "data/example.xlsx",
    sheet_name=["Sales", "Inventory"],
    engine="openpyxl",  # optional – explicitly sets the engine
)
sales_df = sheets["Sales"]
inventory_df = sheets["Inventory"]
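Two related variations are worth knowing: passing sheet_name=None loads every sheet, and integer positions can be used instead of names. The sketch below assumes openpyxl is installed and builds a throwaway workbook so it can run on its own. The sheet names and values are illustrative only.

```python
import os
import tempfile

import pandas as pd

# Build a throwaway workbook so the example is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
with pd.ExcelWriter(demo_path, engine="openpyxl") as writer:
    pd.DataFrame({"Revenue": [100, 250]}).to_excel(
        writer, sheet_name="Sales", index=False
    )
    pd.DataFrame({"Stock": [12, 7, 3]}).to_excel(
        writer, sheet_name="Inventory", index=False
    )

# sheet_name=None loads every sheet; the dictionary keys are the sheet names.
all_sheets = pd.read_excel(demo_path, sheet_name=None)

# Zero-based positions also work; the keys are then the integers passed in.
by_position = pd.read_excel(demo_path, sheet_name=[0, 1])

for name, frame in all_sheets.items():
    print(name, frame.shape)
```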
Skipping Rows and Limiting Columns
Control which data becomes your header and which columns to import:
df = pd.read_excel(
    "data/example.xlsx",
    skiprows=2,     # ignore the first two rows
    header=0,       # third row becomes column names
    usecols="A:D",  # read only columns A through D
)
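To see how these three options interact, it helps to run them against a workbook whose layout is known. The following sketch assumes openpyxl is installed and writes a hypothetical sheet with two title rows above the real header. The column names and values are invented for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical layout: two title rows, then the header row, then data.
raw_rows = [
    ["Quarterly report", None, None, None, None],
    ["Illustrative data only", None, None, None, None],
    ["Date", "Region", "Units", "Revenue", "Notes"],
    ["2024-01-01", "North", 3, 100, "ok"],
    ["2024-01-02", "South", 5, 250, "check"],
]
demo_path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
pd.DataFrame(raw_rows).to_excel(
    demo_path, header=False, index=False, engine="openpyxl"
)

df = pd.read_excel(
    demo_path,
    skiprows=2,     # drop the two title rows
    header=0,       # first remaining row holds the column names
    usecols="A:D",  # column E ("Notes") is not loaded
)

print(df.columns.tolist())
print(df)
```

Because skiprows is applied before header, header=0 refers to the first row that remains after skipping, not to the first row of the sheet.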
Specifying Data Types
Override type inference for specific columns by providing a dtype dictionary:
df = pd.read_excel(
    "data/example.xlsx",
    usecols=["Date", "Revenue"],
    dtype={"Revenue": "float64", "Date": "str"},
)
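The effect of dtype is easiest to see by reading the same file twice, once with inference and once with explicit types. This is a minimal sketch, assuming openpyxl is installed. The ID and Revenue columns are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Write a small workbook whose columns both contain whole numbers.
demo_path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
pd.DataFrame({"ID": [7, 42], "Revenue": [100, 250]}).to_excel(
    demo_path, index=False, engine="openpyxl"
)

# Default behaviour: pandas infers an integer dtype for both columns.
inferred = pd.read_excel(demo_path)

# Explicit dtypes: keep ID as text and force Revenue to floating point.
explicit = pd.read_excel(demo_path, dtype={"ID": str, "Revenue": "float64"})

print(inferred.dtypes)
print(explicit.dtypes)
```

Reading identifier columns as strings is a common choice when the values are labels rather than quantities, since it avoids accidental arithmetic on them.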
Key Source Files in pandas-dev/pandas
Understanding the implementation details helps when debugging or extending functionality:
- pandas/io/excel/_base.py – Contains the read_excel wrapper (line 91), engine dispatch logic, and the ExcelFile class (line 128) that manages parser abstraction.
- pandas/io/excel/_openpyxl.py – Implements the openpyxl engine used by default for .xlsx files, handling cell value extraction and data conversion.
- pandas/io/excel/_util.py – Provides helper functions shared by the Excel readers and writers, such as default-engine lookup and conversion of usecols column ranges.
Summary
- pandas.read_excel is the high-level interface for importing Excel files, located in pandas/io/excel/_base.py.
- The function automatically selects an appropriate engine (typically openpyxl for .xlsx) based on file extension.
- Use sheet_name to target specific worksheets, or pass a list to load multiple sheets into a dictionary.
- Control data ingestion with skiprows, usecols, header, and dtype parameters to optimize memory and parsing speed.
Frequently Asked Questions
What is the default engine for reading .xlsx files in pandas?
When you call pandas.read_excel on a .xlsx file without specifying an engine, pandas selects openpyxl. If openpyxl is not installed, the call raises an ImportError naming the missing dependency. This default is determined by the engine dispatch logic in pandas/io/excel/_base.py, which maps file extensions to their preferred parsing libraries.
How do I read multiple sheets from an Excel file?
Pass a list of sheet identifiers to the sheet_name parameter. pandas.read_excel returns a dictionary whose keys are the identifiers you passed (sheet names or zero-based positions) and whose values are DataFrames. For example: pd.read_excel("file.xlsx", sheet_name=["Sheet1", "Sheet2"]). To load every sheet, pass sheet_name=None.
Can I specify data types when using pandas read excel?
Yes. Use the dtype parameter to provide a dictionary mapping column names to data types. This prevents pandas from inferring types and can reduce memory usage. For example: dtype={"Revenue": "float64", "ID": "int32"}.
Where is the read_excel function defined in the pandas source code?
The read_excel function is defined in pandas/io/excel/_base.py starting at line 91. This file also contains the ExcelFile class (line 128) and the engine dispatch logic that determines which backend library to use for parsing.