How to Use pandas pivot to Reshape a DataFrame: A Complete Guide
The pandas.DataFrame.pivot method reshapes long-format data into wide format by converting unique values from one column into new column headers, leveraging pandas/core/reshape/pivot.py to validate inputs and unstack to produce the final matrix.
The pandas pivot function is the standard tool for transforming analytical datasets from transactional long-form to presentation-ready wide-form tables. According to the pandas-dev/pandas source code, this high-level convenience method is implemented in pandas/core/reshape/pivot.py and attaches to the DataFrame API via pandas/core/frame.py. It efficiently restructures data by combining index manipulation with lower-level unstacking operations.
How pandas pivot Works Internally
Understanding the internal mechanics helps you debug errors and optimize performance.
Input Validation and Uniqueness Constraints
The implementation first validates that your specified index, columns, and values arguments refer to existing DataFrame columns. It enforces a strict uniqueness guarantee: the combination of index and columns must uniquely identify each row. If the source data contains duplicate pairs, the function raises a ValueError immediately rather than producing ambiguous results.
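One way to see the uniqueness constraint in practice is to check for repeated key pairs before pivoting. The sketch below is illustrative only: the sample data and the variable names (dupes, has_duplicates, pivot_failed) are assumptions made for this example, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01", "2023-01-02"],
    "city": ["NY", "NY", "SF"],
    "sales": [100, 120, 250],
})

# keep=False flags every row whose (date, city) pair occurs more than once
dupes = df[df.duplicated(subset=["date", "city"], keep=False)]
has_duplicates = not dupes.empty

# pivot refuses to reshape when the key pairs are not unique
try:
    df.pivot(index="date", columns="city", values="sales")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    error_message = str(exc)

print(dupes)
print(pivot_failed)
```

Running a check like this first tells you which rows collide, which the exception message alone does not.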
MultiIndex Construction and Unstacking
After validation, pivot creates a temporary hierarchical MultiIndex by combining the selected index and columns. The method then calls DataFrame.unstack on the columns level to perform the actual reshape operation. This unstacking process transforms the long-format rows into a wide-format matrix where each unique value from your columns parameter becomes a new column header.
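The two steps described above can be approximated by hand with set_index and unstack. This is a minimal sketch of the idea under the assumption of a single values column, not a copy of the library's internal code; the names indexed, manual, and via_pivot are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "city": ["NY", "SF", "NY", "SF"],
    "sales": [100, 200, 150, 250],
})

# Step 1: build a two-level MultiIndex from the index and columns keys
indexed = df.set_index(["date", "city"])["sales"]

# Step 2: move the "city" level out of the row index and into the columns
manual = indexed.unstack("city")

# Compare against the convenience method
via_pivot = df.pivot(index="date", columns="city", values="sales")
same_result = manual.equals(via_pivot)

print(manual)
print(same_result)
```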
Basic Syntax and Parameters
The pivot method takes three arguments, shown here in simplified form. In pandas 2.x they are keyword-only and columns is required:
DataFrame.pivot(index=None, columns=None, values=None)
- index: String or list specifying column(s) to use as row labels (optional; defaults to the existing index).
- columns: String or list specifying column(s) whose unique values become new column headers (required).
- values: String or list specifying column(s) containing data to populate cells (optional; defaults to all remaining columns).
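As a small illustration of the optional index parameter, the sketch below leaves index out so that the DataFrame's existing row labels are reused. The data and the name wide are assumptions made for the example.

```python
import pandas as pd

# The dates live in the row index here rather than in a column
df = pd.DataFrame(
    {"city": ["NY", "SF", "NY", "SF"], "sales": [100, 200, 150, 250]},
    index=["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
)

# No index argument: the existing row labels become the row labels of the result
wide = df.pivot(columns="city", values="sales")

print(wide)
```

The existing row labels may repeat, as they do here, as long as each label and columns value pair is still unique.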
Practical Examples of pandas pivot
Simple Numeric Pivot
Convert daily sales data from long to wide format using unique date-city combinations:
import pandas as pd
df = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "city": ["NY", "SF", "NY", "SF"],
    "sales": [100, 200, 150, 250]
})
pivoted = df.pivot(index="date", columns="city", values="sales")
print(pivoted)
Output:
city         NY   SF
date
2023-01-01  100  200
2023-01-02  150  250
Pivoting Non-Numeric Data
The method handles categorical values equally well, such as mapping managers to regions:
df2 = pd.DataFrame({
    "product": ["A", "A", "B", "B"],
    "region": ["East", "West", "East", "West"],
    "manager": ["John", "Jane", "Mike", "Anna"]
})
pivoted2 = df2.pivot(index="product", columns="region", values="manager")
print(pivoted2)
Output:
region   East  West
product
A        John  Jane
B        Mike  Anna
Handling Duplicate Keys with pivot_table
When your data contains duplicate index-column pairs, pivot fails intentionally. Use pivot_table with an aggregation function instead:
df3 = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01"],
    "city": ["NY", "NY"],
    "sales": [100, 120]
})
# This raises ValueError due to duplicate (date, city) pairs
# pivoted = df3.pivot(index="date", columns="city", values="sales")
# Correct approach for duplicates:
pivoted3 = df3.pivot_table(index="date", columns="city", values="sales", aggfunc="sum")
print(pivoted3)
Output:
city         NY
date
2023-01-01  220
When to Use pivot vs pivot_table
pivot is designed for one-to-one mappings between index and column values. It skips the grouping and aggregation step entirely, which typically makes it the lighter choice for datasets whose keys are already unique.
pivot_table is required when duplicate entries exist. It aggregates multiple values into single cells using functions like mean, sum, or count, providing flexibility at the cost of computational overhead.
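The difference can be illustrated side by side. In this sketch, the clean and dirty DataFrames and the variable names are assumptions for the example; aggfunc="sum" is one of several possible choices.

```python
import pandas as pd

clean = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "city": ["NY", "SF", "NY", "SF"],
    "sales": [100, 200, 150, 250],
})

reshaped = clean.pivot(index="date", columns="city", values="sales")
aggregated = clean.pivot_table(index="date", columns="city", values="sales", aggfunc="sum")

# With unique keys every group holds one row, so summing changes nothing
same_values = bool((reshaped.to_numpy() == aggregated.to_numpy()).all())

# Repeat the first row to create a duplicate (date, city) pair
dirty = pd.concat([clean, clean.iloc[[0]]], ignore_index=True)

# pivot would raise here; pivot_table combines the two NY rows for 2023-01-01
summed = dirty.pivot_table(index="date", columns="city", values="sales", aggfunc="sum")

print(same_values)
print(summed)
```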
Summary
- pandas.DataFrame.pivot is implemented in pandas/core/reshape/pivot.py and delegates to DataFrame.unstack for the final reshape operation.
- The method enforces uniqueness: duplicate index-column combinations trigger a ValueError to prevent data ambiguity.
- For datasets with duplicate keys, pivot_table provides essential aggregation capabilities that pivot intentionally lacks.
- The function creates a temporary MultiIndex internally before unstacking to generate the wide-format output matrix.
Frequently Asked Questions
What is the difference between pandas pivot and pivot_table?
pivot performs a strict reshape that fails if index-column combinations are not unique, while pivot_table aggregates duplicate entries using functions like mean or sum. Use pivot for one-to-one mappings and pivot_table when you need to summarize multiple values into single cells.
Where is the pandas pivot implementation located?
The core algorithm lives in pandas/core/reshape/pivot.py, where the function validates inputs, constructs a hierarchical MultiIndex, and calls unstack to produce the wide-format result. The method is exposed to users through pandas/core/frame.py as part of the DataFrame class interface.
Why does pandas pivot raise a ValueError?
The method raises ValueError: Index contains duplicate entries, cannot reshape when your specified index and columns parameters map to multiple rows in the source data. This protects data integrity by preventing ambiguous reshaping operations where one cell would need to contain multiple values.
Can pandas pivot handle multiple value columns simultaneously?
Yes. If you omit the values parameter or pass a list of column names, pivot reshapes all specified value columns at once. The resulting DataFrame features a MultiIndex column structure with the original value column names at the top level and the pivoted categories from your columns parameter as the second level.
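A short sketch of the multi-column case follows. The extra "returns" column and the variable names are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "city": ["NY", "SF", "NY", "SF"],
    "sales": [100, 200, 150, 250],
    "returns": [5, 8, 3, 6],
})

# values omitted: every remaining column ("sales" and "returns") is pivoted
wide = df.pivot(index="date", columns="city")

# The columns form a two-level MultiIndex: (value column, city)
column_levels = wide.columns.nlevels

# Select a single cell series with a tuple, or a whole block by its top-level name
ny_sales = wide[("sales", "NY")]
only_sales = wide["sales"]

print(wide)
```

Selecting by the top-level name, as in wide["sales"], returns an ordinary wide table for that one value column.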