How to Use pandas pivot to Reshape a DataFrame: A Complete Guide

The pandas.DataFrame.pivot method reshapes long-format data into wide format by turning the unique values of one column into new column headers. The logic lives in pandas/core/reshape/pivot.py, which prepares the index and then relies on unstack to produce the final matrix.

The pandas pivot function is the standard tool for transforming analytical datasets from transactional long-form to presentation-ready wide-form tables. According to the pandas-dev/pandas source code, this high-level convenience method is implemented in pandas/core/reshape/pivot.py and attaches to the DataFrame API via pandas/core/frame.py. It restructures data by combining index manipulation with the lower-level unstack operation.

How pandas pivot Works Internally

Understanding the internal mechanics helps you debug errors and optimize performance.

Input Validation and Uniqueness Constraints

The implementation resolves your index, columns, and values arguments against the DataFrame's columns; a label that does not exist surfaces as a KeyError. It also enforces a strict uniqueness guarantee: the combination of index and columns must uniquely identify each row. If the source data contains duplicate pairs, the reshape step raises a ValueError rather than producing ambiguous results.
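The two failure modes can be reproduced with a small sketch. The frame and the variable names below are illustrative, and the exact exception text may vary between pandas versions, so the code only inspects the exception types and prints the messages.

```python
import pandas as pd

dupes = pd.DataFrame({
    "date":  ["2023-01-01", "2023-01-01"],
    "city":  ["NY", "NY"],          # the same (date, city) pair appears twice
    "sales": [100, 120],
})

# Duplicate (index, columns) pairs: two values would compete for one cell.
try:
    dupes.pivot(index="date", columns="city", values="sales")
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

# A label that is not a column of the frame ("region" here) cannot be looked up.
try:
    dupes.pivot(index="date", columns="region", values="sales")
    missing_error = None
except KeyError as exc:
    missing_error = exc

print(duplicate_error)
print(repr(missing_error))
```

Catching the two exception types separately makes it easier to tell a typo in a column name apart from a genuine data-quality problem.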

MultiIndex Construction and Unstacking

After validation, pivot creates a temporary hierarchical MultiIndex by combining the selected index and columns. The method then calls DataFrame.unstack on the columns level to perform the actual reshape operation. This unstacking process transforms the long-format rows into a wide-format matrix where each unique value from your columns parameter becomes a new column header.
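As a rough illustration of that description, the same result can be produced by hand with set_index followed by unstack. This is a sketch of the equivalent public-API steps, not a copy of the library's internal code.

```python
import pandas as pd

df = pd.DataFrame({
    "date":  ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "city":  ["NY", "SF", "NY", "SF"],
    "sales": [100, 200, 150, 250],
})

via_pivot = df.pivot(index="date", columns="city", values="sales")

# Manual equivalent: build a (date, city) MultiIndex on the rows,
# then move the "city" level into the columns.
via_unstack = df.set_index(["date", "city"])["sales"].unstack("city")

print(via_pivot.equals(via_unstack))
```

Seeing the two-step version helps when debugging: if set_index succeeds but unstack fails, the problem is duplicate keys rather than a missing column.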

Basic Syntax and Parameters

The pivot method signature is straightforward:

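DataFrame.pivot(*, columns, index=<no_default>, values=<no_default>)

This is the form documented for pandas 2.x, where every argument is keyword-only; pandas 1.x releases documented it as pivot(index=None, columns=None, values=None).
  • index: String or list specifying column(s) to use as row labels (optional; the existing index is used when omitted).
  • columns: String or list specifying column(s) whose unique values become new column headers (required).
  • values: String or list specifying column(s) containing data to populate cells (optional; defaults to all remaining columns).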
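A short sketch of the optional index argument: when the row labels you want are already the DataFrame's index, index can be left out. The frame below is made up for illustration, and all arguments are passed by keyword so the call works regardless of which signature your pandas version documents.

```python
import pandas as pd

# "date" is already the row index, so no index argument is needed.
by_date = pd.DataFrame(
    {"city": ["NY", "SF", "NY", "SF"], "sales": [100, 200, 150, 250]},
    index=pd.Index(
        ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"], name="date"
    ),
)

wide = by_date.pivot(columns="city", values="sales")
print(wide)
```

If the frame still carries a default integer index, omitting index produces one output row per input row, which is rarely what you want; pass index explicitly in that case.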

Practical Examples of pandas pivot

Simple Numeric Pivot

Convert daily sales data from long to wide format using unique date-city combinations:

import pandas as pd

df = pd.DataFrame({
    "date":   ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "city":   ["NY", "SF", "NY", "SF"],
    "sales":  [100, 200, 150, 250]
})

pivoted = df.pivot(index="date", columns="city", values="sales")
print(pivoted)

Output:

city         NY   SF
date                
2023-01-01  100  200
2023-01-02  150  250

Pivoting Non-Numeric Data

The method handles categorical values equally well, such as mapping managers to regions:

df2 = pd.DataFrame({
    "product": ["A", "A", "B", "B"],
    "region":  ["East", "West", "East", "West"],
    "manager": ["John", "Jane", "Mike", "Anna"]
})

pivoted2 = df2.pivot(index="product", columns="region", values="manager")
print(pivoted2)

Output:

region   East  West
product            
A        John  Jane
B        Mike  Anna

Handling Duplicate Keys with pivot_table

When your data contains duplicate index-column pairs, pivot fails intentionally. Use pivot_table with an aggregation function instead:

df3 = pd.DataFrame({
    "date":   ["2023-01-01", "2023-01-01"],
    "city":   ["NY", "NY"],
    "sales":  [100, 120]
})

# This raises ValueError due to duplicate (date, city) pairs:
# pivoted = df3.pivot(index="date", columns="city", values="sales")

# Correct approach for duplicates:
pivoted3 = df3.pivot_table(index="date", columns="city", values="sales", aggfunc="sum")
print(pivoted3)

Output:

city         NY
date           
2023-01-01  220

When to Use pivot vs pivot_table

pivot is designed for one-to-one mappings between index and column values. It performs no aggregation at all, so on clean datasets it typically does less work than pivot_table.

pivot_table is required when duplicate entries exist. It aggregates multiple values into single cells using functions like mean, sum, or count, providing flexibility at the cost of computational overhead.
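One way to encode this choice is a small wrapper that checks for duplicate keys first. The helper name pivot_or_aggregate and its default aggfunc are hypothetical, introduced only for this sketch; they are not part of pandas.

```python
import pandas as pd

def pivot_or_aggregate(frame, index, columns, values, aggfunc="sum"):
    """Hypothetical helper: strict pivot when keys are unique, pivot_table otherwise."""
    has_duplicates = frame.duplicated(subset=[index, columns]).any()
    if has_duplicates:
        # Several rows share one (index, columns) pair, so they must be aggregated.
        return frame.pivot_table(
            index=index, columns=columns, values=values, aggfunc=aggfunc
        )
    return frame.pivot(index=index, columns=columns, values=values)

unique_keys = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01"],
    "city": ["NY", "SF"],
    "sales": [100, 200],
})
repeated_keys = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01"],
    "city": ["NY", "NY"],
    "sales": [100, 120],
})

wide_unique = pivot_or_aggregate(unique_keys, "date", "city", "sales")
wide_repeated = pivot_or_aggregate(repeated_keys, "date", "city", "sales")
print(wide_unique)
print(wide_repeated)
```

Silently falling back to aggregation can hide data problems, so in a real pipeline you may prefer to log or raise when duplicates are found.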

Summary

  • pandas.DataFrame.pivot is implemented in pandas/core/reshape/pivot.py and delegates to DataFrame.unstack for the final reshape operation.
  • The method enforces uniqueness: duplicate index-column combinations trigger a ValueError to prevent data ambiguity.
  • For datasets with duplicate keys, pivot_table provides essential aggregation capabilities that pivot intentionally lacks.
  • The function creates a temporary MultiIndex internally before unstacking to generate the wide-format output matrix.

Frequently Asked Questions

What is the difference between pandas pivot and pivot_table?

pivot performs a strict reshape that fails if index-column combinations are not unique, while pivot_table aggregates duplicate entries using functions like mean or sum. Use pivot for one-to-one mappings and pivot_table when you need to summarize multiple values into single cells.

Where is the pandas pivot implementation located?

The core algorithm lives in pandas/core/reshape/pivot.py, where the function validates inputs, constructs a hierarchical MultiIndex, and calls unstack to produce the wide-format result. The method is exposed to users through pandas/core/frame.py as part of the DataFrame class interface.
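You can check the location against your own installation with the standard inspect module. This sketch assumes the top-level pandas.pivot function is the same function that DataFrame.pivot forwards to, and the printed path depends on where pandas is installed.

```python
import inspect

import pandas as pd

# Ask Python which file defines the module-level pivot function.
source_file = inspect.getsourcefile(pd.pivot)
print(source_file)
```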

Why does pandas pivot raise a ValueError?

The method raises "ValueError: Index contains duplicate entries, cannot reshape" when more than one row in the source data shares the same combination of index and columns values. This protects data integrity by preventing ambiguous reshaping operations where one cell would need to contain multiple values.
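Before calling pivot, you can list the rows responsible for the conflict with DataFrame.duplicated. The sample data here is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date":  ["2023-01-01", "2023-01-01", "2023-01-02"],
    "city":  ["NY", "NY", "SF"],
    "sales": [100, 120, 250],
})

# keep=False marks every row whose (date, city) pair occurs more than once,
# not just the second and later occurrences.
conflicts = df[df.duplicated(subset=["date", "city"], keep=False)]
print(conflicts)
```

Once the conflicting rows are visible, you can decide whether to drop them, fix the source data, or switch to pivot_table with an explicit aggregation.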

Can pandas pivot handle multiple value columns simultaneously?

Yes. If you omit the values parameter or pass a list of column names, pivot reshapes all specified value columns at once. The resulting DataFrame features a MultiIndex column structure with the original value column names at the top level and the pivoted categories from your columns parameter as the second level.
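A minimal sketch of the two-level column structure, using an illustrative frame with a second value column named units:

```python
import pandas as pd

df = pd.DataFrame({
    "date":  ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "city":  ["NY", "SF", "NY", "SF"],
    "sales": [100, 200, 150, 250],
    "units": [10, 20, 15, 25],
})

wide = df.pivot(index="date", columns="city", values=["sales", "units"])

# Each column label is a (value column, city) tuple.
print(wide.columns.tolist())

# Selecting a top-level label returns the block for that value column.
units = wide["units"]

# A full tuple addresses a single cell.
sf_units = wide.loc["2023-01-01", ("units", "SF")]
print(sf_units)
```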
