How to Read an Excel File in Python Using pandas: Complete Guide

Use pandas.read_excel() to load Excel workbooks into DataFrames, specifying the file path and optional parameters like sheet_name, engine, and dtype to control parsing behavior.

The pandas library offers a high-level way to read an Excel file in Python through its read_excel() function. Maintained in the pandas-dev/pandas repository, this function parses both legacy .xls and modern .xlsx formats, dispatching to a specialized engine based on the file extension. Whether you are analyzing sales data or processing financial reports, knowing how to read an Excel file in Python with pandas is essential for data workflows.

Understanding the pandas.read_excel() Architecture

The core implementation resides in pandas/io/excel/_base.py at lines 91‑165, where the read_excel() function acts as a thin wrapper around engine-specific parsers. When you read an Excel file in Python using this function, pandas dispatches to a concrete implementation based on the file extension or the explicit engine parameter.

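The architecture separates concerns into one module per engine, including pandas/io/excel/_openpyxl.py for .xlsx, pandas/io/excel/_xlrd.py for .xls, and pandas/io/excel/_pyxlsb.py for .xlsb.

Each engine supplies a reader class built on the shared base in pandas/io/excel/_base.py, and the ExcelFile class exposes a standardized parse() method on top of it, ensuring consistent behavior regardless of the underlying format when you read an Excel file in Python.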
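The dispatch idea can be illustrated with a small lookup table. This is a simplified sketch, not pandas' own code: DEFAULT_ENGINES and guess_engine() are hypothetical names introduced here, and pandas itself may look at more than the file suffix when choosing an engine.

```python
from pathlib import Path

# Hypothetical suffix -> engine table. The engine names are the strings
# accepted by pd.read_excel(engine=...); the table itself is an illustration.
DEFAULT_ENGINES = {
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xls": "xlrd",
    ".xlsb": "pyxlsb",
    ".ods": "odf",
}


def guess_engine(path):
    """Return the engine name for a path based only on its suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return DEFAULT_ENGINES[suffix]
    except KeyError:
        raise ValueError(f"No known Excel engine for suffix {suffix!r}") from None


print(guess_engine("sales_data.xlsx"))  # openpyxl
print(guess_engine("archive.XLS"))      # xlrd
```

The returned string could be passed straight to the engine parameter when automatic selection is not wanted.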

Basic Syntax to Read an Excel File in Python

The simplest invocation requires only the file path. By default, read_excel() loads the first worksheet into a DataFrame:

import pandas as pd

# Read the first sheet of an Excel workbook
df = pd.read_excel("sales_data.xlsx")
print(df.head())

This command automatically detects the file format and selects the appropriate engine. For modern .xlsx files, pandas defaults to openpyxl; for legacy .xls files, it uses xlrd (provided the respective libraries are installed).
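Because the engines are optional dependencies, it can help to check for them before loading anything. The following is a minimal sketch using only the standard library; ENGINE_PACKAGES and missing_engines() are hypothetical names, not part of pandas.

```python
from importlib.util import find_spec

# Engine name -> importable package that backs it (illustrative subset).
ENGINE_PACKAGES = {"openpyxl": "openpyxl", "xlrd": "xlrd"}


def missing_engines(packages=ENGINE_PACKAGES):
    """Return the engine names whose backing package cannot be imported."""
    return [name for name, module in packages.items() if find_spec(module) is None]


missing = missing_engines()
if missing:
    print("Install before reading Excel files:", ", ".join(missing))
else:
    print("openpyxl and xlrd are both available")
```

Running this once at the start of a script gives a clearer message than an import failure raised in the middle of a read_excel() call.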

Selecting Sheets When You Read an Excel File in Python

The sheet_name parameter provides flexible control over which worksheets to load. You can specify sheets by string name, zero-based integer index, or retrieve multiple sheets simultaneously:


# Load a specific sheet by name
df_q1 = pd.read_excel("sales_data.xlsx", sheet_name="Q1")

# Load a sheet by index (0 = first sheet, so 1 = second sheet)
df_second = pd.read_excel("sales_data.xlsx", sheet_name=1)

# Load multiple sheets into a dictionary {sheet_name: DataFrame}
sheets_dict = pd.read_excel(
    "sales_data.xlsx",
    sheet_name=["Q1", "Q2", "Q3"]
)

When you pass a list to sheet_name, the function returns a dictionary mapping sheet identifiers to DataFrame objects. Passing sheet_name=None loads all sheets into a dictionary automatically.
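When the sheet names are not known in advance, pd.ExcelFile can list them before anything is parsed. The sketch below builds a small two-sheet workbook in memory so it runs without an existing file; the sheet names and values are made up for illustration, and writing the workbook assumes openpyxl is installed.

```python
import io

import pandas as pd

# Build a hypothetical two-sheet workbook in memory.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"Revenue": [100, 200]}).to_excel(writer, sheet_name="Q1", index=False)
    pd.DataFrame({"Revenue": [300]}).to_excel(writer, sheet_name="Q2", index=False)
buffer.seek(0)

# ExcelFile opens the workbook once; sheet_names lists the worksheets in order.
with pd.ExcelFile(buffer, engine="openpyxl") as workbook:
    names = workbook.sheet_names
    sheets = {name: workbook.parse(name) for name in names}

print(names)
print(sheets["Q1"]["Revenue"].sum())
```

Opening the workbook once and parsing sheets from the same ExcelFile object avoids reopening the file for every sheet.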

Engine Selection and File Format Support

While pandas automatically selects engines based on file extensions, you can explicitly control the parser via the engine parameter. This is useful when working with unusual file types or requiring specific engine features:


# Force the openpyxl engine for .xlsx files
df = pd.read_excel("data.xlsx", engine="openpyxl")

# Explicitly use xlrd for older .xls files
legacy_df = pd.read_excel("archive.xls", engine="xlrd")

# Parse an OpenDocument Spreadsheet
df_ods = pd.read_excel("data.ods", engine="odf")

# Read the binary Excel format
df_binary = pd.read_excel("data.xlsb", engine="pyxlsb")

The engine implementations in pandas/io/excel/_openpyxl.py and pandas/io/excel/_xlrd.py handle the low-level parsing logic, while pandas/io/excel/_base.py coordinates the high-level API.

Advanced Data Parsing Options

When you read an Excel file in Python for production workflows, controlling data types and missing-value handling is critical. The read_excel() function provides several parameters for this:

Column Data Types

Use the dtype parameter to define column types explicitly, reducing memory usage and preventing incorrect type inference:

df = pd.read_excel(
    "sales_data.xlsx",
    dtype={"Region": "category", "Sales": "float32", "Units": "Int16"}
)
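The memory effect of such a mapping can be checked on an in-memory frame. In this sketch astype() stands in for passing dtype= to read_excel(), and the data is invented for illustration, so the exact byte counts will differ from a real workbook.

```python
import pandas as pd

# Hypothetical data with a repetitive text column and two numeric columns.
df = pd.DataFrame({
    "Region": ["North", "South", "East", "West"] * 250,
    "Sales": [1.5, 2.5, 3.5, 4.5] * 250,
    "Units": [1, 2, 3, 4] * 250,
})

# Apply the same mapping that would be passed as dtype= when reading.
typed = df.astype({"Region": "category", "Sales": "float32", "Units": "Int16"})

before = df.memory_usage(deep=True).sum()
after = typed.memory_usage(deep=True).sum()
print(f"default dtypes: {before} bytes, explicit dtypes: {after} bytes")
```

The saving comes mostly from the category column, which stores each distinct label once, and from the narrower numeric types.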

Date Parsing

The parse_dates parameter converts specified columns to datetime objects during loading:


# Parse a single column
df = pd.read_excel("data.xlsx", parse_dates=["OrderDate"])

# Parse combined columns (e.g., year, month, day)
df = pd.read_excel("data.xlsx", parse_dates={"Date": ["Year", "Month", "Day"]})
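When the date parts live in separate columns, one alternative is to load the sheet as-is and assemble the date afterwards with pd.to_datetime(). The sketch below uses an in-memory frame with made-up values in place of a loaded worksheet; the column names mirror the ones used above.

```python
import pandas as pd

# Stand-in for a frame returned by read_excel(), with hypothetical values.
df = pd.DataFrame({
    "Year": [2024, 2024],
    "Month": [1, 2],
    "Day": [15, 29],
    "Total": [10, 20],
})

# to_datetime() assembles dates from columns named year, month, and day,
# so the headers are lower-cased explicitly before the call.
parts = df[["Year", "Month", "Day"]].rename(columns=str.lower)
df["Date"] = pd.to_datetime(parts)

print(df[["Date", "Total"]])
```

Doing the assembly after loading keeps the parsing step simple and makes invalid date combinations easier to trace back to specific rows.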

Custom Converters

Apply transformation functions during parsing with converters:

df = pd.read_excel(
    "data.xlsx",
    converters={
        "ID": lambda x: str(x).zfill(8),       # zero-pad IDs to 8 characters
        "Status": lambda x: str(x).upper(),    # str() guards against non-text cells
    }
)
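Converters are ordinary functions applied to one cell value at a time, so they can be written and tested without a workbook. The sketch below defines named equivalents of the converters above; pad_id() and upper_text() are hypothetical helper names.

```python
def pad_id(value):
    """Left-pad an identifier with zeros to eight characters."""
    return str(value).zfill(8)


def upper_text(value):
    """Upper-case text cells and pass any non-text value through unchanged."""
    return value.upper() if isinstance(value, str) else value


print(pad_id(4521))        # 00004521
print(upper_text("open"))  # OPEN
print(upper_text(None))    # None
```

These could then be supplied as converters={"ID": pad_id, "Status": upper_text}, which keeps the parsing call short and the transformation logic testable on its own.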

Missing Value Handling

Control how empty cells and sentinel values are interpreted using na_values and keep_default_na:

df = pd.read_excel(
    "data.xlsx",
    na_values=["N/A", "--", "NULL", "MISSING"],
    keep_default_na=True  # Keep pandas' built-in NA markers (e.g. "NA", empty cells) as well
)
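The effect of na_values is easiest to see by reading the same data twice. This sketch writes a tiny workbook to memory (made-up values, assuming openpyxl is installed) and compares the default behavior with custom sentinels. Markers such as "N/A" and "NULL" are already in pandas' default set, so the example uses "--" and "MISSING", which are not.

```python
import io

import pandas as pd

# Hypothetical status column containing two custom missing-value sentinels.
buffer = io.BytesIO()
pd.DataFrame({"Status": ["ok", "--", "MISSING", "done"]}).to_excel(
    buffer, index=False, engine="openpyxl"
)

# Default parsing keeps the sentinels as ordinary strings.
buffer.seek(0)
plain = pd.read_excel(buffer, engine="openpyxl")

# With na_values, the same cells are loaded as missing values.
buffer.seek(0)
cleaned = pd.read_excel(buffer, engine="openpyxl", na_values=["--", "MISSING"])

print(plain["Status"].isna().sum(), cleaned["Status"].isna().sum())
```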

Complete Code Examples to Read an Excel File in Python

The following examples demonstrate end-to-end workflows using the pandas.read_excel() implementation from pandas/io/excel/_base.py:

Example 1: Single Sheet with Type Optimization

import pandas as pd

# Load first sheet with explicit data types

df = pd.read_excel(
    "financial_report.xlsx",
    dtype={
        "Account": "string",
        "Department": "category",
        "Amount": "float64"
    },
    parse_dates=["TransactionDate"]
)

print(f"Loaded {len(df)} rows")
print(df.dtypes)

Example 2: Multi-Sheet Processing


import pandas as pd

# Load specific sheets into a dictionary
sheets = pd.read_excel(
    "sales_data.xlsx",
    sheet_name=["Q1", "Q2", "Q3"],
    engine="openpyxl"
)

# Process each sheet (assumes every sheet has a "Revenue" column)
for quarter, data in sheets.items():
    revenue = data["Revenue"].sum()
    print(f"{quarter} Total Revenue: ${revenue:,.2f}")
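If the sheets share the same columns, the dictionary can also be stacked into a single DataFrame. A minimal sketch, using in-memory frames with invented values in place of the loaded sheets:

```python
import pandas as pd

# Stand-in for the dictionary returned by read_excel(sheet_name=[...]).
sheets = {
    "Q1": pd.DataFrame({"Revenue": [100.0, 250.0]}),
    "Q2": pd.DataFrame({"Revenue": [300.0]}),
}

# concat() uses the dictionary keys as an outer index level, which is then
# moved into a regular "Quarter" column.
combined = (
    pd.concat(sheets, names=["Quarter", "Row"])
    .reset_index(level="Quarter")
    .reset_index(drop=True)
)

totals = combined.groupby("Quarter")["Revenue"].sum()
print(totals)
```

Keeping the sheet name as a column preserves where each row came from, so per-sheet totals remain available after the merge.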

Example 3: Loading All Sheets


import pandas as pd

# Load every sheet in the workbook
all_sheets = pd.read_excel("inventory.xlsx", sheet_name=None)

print(f"Loaded {len(all_sheets)} sheets: {list(all_sheets.keys())}")

# Access individual DataFrames (assumes sheets with these names exist)
warehouse_df = all_sheets["Warehouse"]
retail_df = all_sheets["Retail"]

Example 4: Legacy Format Handling


import pandas as pd

# Explicitly use xlrd for older .xls files
legacy_data = pd.read_excel(
    "archive_2005.xls",
    engine="xlrd",
    sheet_name=0,
    na_values=["", "NA", "N/A"],
    converters={"EmployeeID": str}  # keep IDs as text to preserve leading zeros
)

Summary

  • pandas.read_excel() in pandas/io/excel/_base.py provides the primary interface to read an Excel file in Python, dispatching to specialized engines based on file extensions.
  • The function supports multiple worksheet selection via the sheet_name parameter, returning either a single DataFrame or a dictionary of DataFrames.
  • Engine selection happens automatically but can be forced via the engine parameter, with openpyxl handling modern .xlsx files and xlrd managing legacy .xls formats.
  • Advanced parsing controls include dtype for column types, parse_dates for datetime conversion, and na_values for custom missing value handling.
  • All engine implementations in pandas/io/excel/_xlrd.py, pandas/io/excel/_openpyxl.py, and related files inherit from the base classes defined in pandas/io/excel/_base.py, ensuring consistent API behavior.

Frequently Asked Questions

What is the difference between the xlrd and openpyxl engines when I read an Excel file in Python?

The xlrd engine, defined in pandas/io/excel/_xlrd.py, parses legacy Excel binary format (.xls) files and does not support modern .xlsx formats. The openpyxl engine, implemented in pandas/io/excel/_openpyxl.py, handles Office Open XML format (.xlsx) files and provides comprehensive support for modern Excel features. Pandas automatically selects the appropriate engine based on file extensions, but you can override this selection using the engine parameter in read_excel().

How do I read all sheets from an Excel file simultaneously?

Pass sheet_name=None to the read_excel() function to load every worksheet into a dictionary where keys are sheet names and values are DataFrame objects. Alternatively, pass a list of sheet names or indices (e.g., sheet_name=["Sheet1", "Sheet2"]) to load specific subsets. When loading multiple sheets, the function returns a dictionary rather than a single DataFrame, allowing you to process each worksheet individually while maintaining the context of the original workbook structure.

Can I specify data types for columns when reading Excel files to prevent incorrect type inference?

Yes, use the dtype parameter to explicitly define column data types when calling read_excel(). Pass a dictionary mapping column names to pandas or NumPy types, such as dtype={"Region": "category", "Revenue": "float32", "Units": "Int16"}. This prevents pandas from automatically inferring types, which is particularly important for columns containing mixed data types, ID codes that should remain strings, or categorical variables where you want to optimize memory usage.

What engine should I use for binary Excel (.xlsb) files?

For binary Excel format (.xlsb) files, specify engine="pyxlsb" when calling read_excel(). The pyxlsb engine, implemented in pandas/io/excel/_pyxlsb.py, is specifically designed to parse Excel's binary format without requiring the file to be converted to XML. Note that you must install the pyxlsb package separately, as it is not included in the default pandas installation. This engine is particularly useful for handling large binary Excel files that would consume excessive memory if converted to standard formats.
