How to Melt a Pandas DataFrame: A Complete Guide to Reshaping Data from Wide to Long Format
Use pd.melt() to transform wide-format data into long (tidy) format by specifying identifier columns with id_vars and measured variables with value_vars, creating a standardized structure for analysis.
The pandas.melt function converts DataFrames from wide format to long format, an essential transformation for data normalization and visualization preparation. In the pandas-dev/pandas source code, the core implementation resides in pandas/core/reshape/melt.py. Whether you are restructuring survey data or normalizing time-series measurements, understanding how to melt a pandas DataFrame enables you to create tidy datasets that work seamlessly with seaborn, ggplot, and other analytical tools.
How pandas.melt Works Internally
The melting process follows a structured four-step approach implemented in pandas/core/reshape/melt.py. First, the function identifies identifier columns (id_vars) that remain unchanged during the transformation. Second, it selects measured variables (value_vars) that will be unpivoted into rows. Third, it repeats the identifier values once per measured variable, so each combination of identifier and measured variable becomes a separate row, with the former column name stored in a variable column and the cell contents in a value column. Finally, it assembles these pieces into a new DataFrame, so the original DataFrame is left unmodified.
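The four steps can be sketched by hand. The helper below, manual_melt, is a hypothetical illustration written for this article and is not how pandas implements melt internally; it only reproduces the same result for a simple frame with at least one column to unpivot.

```python
import pandas as pd


def manual_melt(frame, id_vars, var_name="variable", value_name="value"):
    """Illustrative re-creation of the four steps; not pandas' own code."""
    # Steps 1-2: separate identifier columns from the columns to unpivot.
    value_vars = [c for c in frame.columns if c not in id_vars]
    pieces = []
    for col in value_vars:
        # Step 3: one block of rows per measured column, identifiers repeated.
        piece = frame[id_vars].copy()
        piece[var_name] = col
        piece[value_name] = frame[col].to_numpy()
        pieces.append(piece)
    # Step 4: stack the blocks into a new frame with a fresh RangeIndex.
    return pd.concat(pieces, ignore_index=True)


wide = pd.DataFrame({
    "country": ["US", "CA"],
    "2019_sales": [100, 150],
    "2020_sales": [110, 160],
})
sketch = manual_melt(wide, ["country"])
reference = pd.melt(wide, id_vars=["country"])
print(sketch)
```

For this input, sketch and reference contain the same rows in the same order.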
While you can access this functionality through the top-level pd.melt() function, the method is also exposed as DataFrame.melt() in pandas/core/frame.py, providing object-oriented convenience for chaining operations.
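As a minimal sketch of the two entry points, the snippet below calls both forms on a small made-up frame and then uses the method form in a chain. The frame and variable names are illustrative only.

```python
import pandas as pd

wide = pd.DataFrame({
    "country": ["US", "CA"],
    "2019_sales": [100, 150],
    "2020_sales": [110, 160],
})

# Function form and method form take the same keyword arguments.
via_function = pd.melt(wide, id_vars=["country"])
via_method = wide.melt(id_vars=["country"])

# The method form slots directly into a chain of DataFrame operations.
chained = (
    wide.melt(id_vars=["country"], var_name="year", value_name="sales")
        .sort_values(["country", "year"])
        .reset_index(drop=True)
)
print(chained)
```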
Essential Parameters for Melting DataFrames
Understanding the key parameters ensures you can handle complex reshaping scenarios effectively:
- id_vars: Column(s) to use as identifier variables. These columns are kept as columns, and their values are repeated for each measured variable.
- value_vars: Column(s) to unpivot. If not specified, all columns not in id_vars are melted.
- var_name: The name for the new "variable" column that stores the former column headers (defaults to "variable").
- value_name: The name for the new "value" column that stores the melted values (defaults to "value").
- ignore_index: Boolean indicating whether to reset the index in the resulting DataFrame (default True).
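To see the defaults in action, the short sketch below passes only id_vars and lets every other parameter fall back to its default. The quarterly column names are made up for illustration.

```python
import pandas as pd

wide = pd.DataFrame({
    "country": ["US", "CA"],
    "q1": [1, 2],
    "q2": [3, 4],
})

# Only id_vars is given: every other column is melted, and the new
# columns receive the default names "variable" and "value".
defaults = wide.melt(id_vars="country")
print(defaults.columns.tolist())  # ['country', 'variable', 'value']
print(defaults.index.tolist())    # [0, 1, 2, 3] because ignore_index defaults to True
```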
Practical Examples of Melting DataFrames
Converting Wide Sales Data to Long Format
The most common use case involves unpivoting yearly data columns while preserving country identifiers. This example demonstrates the basic syntax using id_vars and custom naming:
import pandas as pd
# Sample wide-format DataFrame
df = pd.DataFrame({
    "country": ["US", "CA", "MX"],
    "2019_sales": [100, 150, 130],
    "2020_sales": [110, 160, 140]
})
# Basic melt: keep 'country' as identifier, unpivot the sales columns
melted = pd.melt(df,
                 id_vars=["country"],
                 var_name="year",
                 value_name="sales")
print(melted)
Output:
  country        year  sales
0      US  2019_sales    100
1      CA  2019_sales    150
2      MX  2019_sales    130
3      US  2020_sales    110
4      CA  2020_sales    160
5      MX  2020_sales    140
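The year column in this output still holds the original header text, such as "2019_sales". One possible follow-up step, assuming every header has the form "<year>_sales", is to strip the suffix and convert the remainder to integers:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["US", "CA", "MX"],
    "2019_sales": [100, 150, 130],
    "2020_sales": [110, 160, 140],
})
melted = pd.melt(df, id_vars=["country"], var_name="year", value_name="sales")

# Assumption: every melted header ends in "_sales"; astype(int) raises otherwise.
melted["year"] = melted["year"].str.replace("_sales", "", regex=False).astype(int)
print(melted["year"].tolist())  # [2019, 2019, 2019, 2020, 2020, 2020]
```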
Selecting Specific Columns to Melt
When working with DataFrames containing extra metadata columns, explicitly define value_vars to melt only the target measurements:
# Melt with custom variable/value column names and selecting specific columns
melted_custom = pd.melt(df,
                        id_vars="country",
                        value_vars=["2019_sales", "2020_sales"],
                        var_name="year",
                        value_name="revenue")
print(melted_custom)
Output:
  country        year  revenue
0      US  2019_sales      100
1      CA  2019_sales      150
2      MX  2019_sales      130
3      US  2020_sales      110
4      CA  2020_sales      160
5      MX  2020_sales      140
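The sample frame above has no extra columns, so value_vars makes no visible difference there. The sketch below adds a hypothetical region column to show the effect: a column listed in neither id_vars nor value_vars is dropped from the result, and it has to be added to id_vars if it should be kept.

```python
import pandas as pd

df_meta = pd.DataFrame({
    "country": ["US", "CA"],
    "region": ["Americas", "Americas"],  # hypothetical metadata column
    "2019_sales": [100, 150],
    "2020_sales": [110, 160],
})
sales_cols = ["2019_sales", "2020_sales"]

# 'region' is in neither id_vars nor value_vars, so it is dropped.
dropped = df_meta.melt(id_vars="country", value_vars=sales_cols)
print(dropped.columns.tolist())  # ['country', 'variable', 'value']

# Listing 'region' in id_vars carries it through to the long format.
kept = df_meta.melt(id_vars=["country", "region"], value_vars=sales_cols)
print(kept.columns.tolist())  # ['country', 'region', 'variable', 'value']
```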
Handling DataFrame Indices During Melt
When your data uses meaningful row indices, control whether they persist in the result using ignore_index. Setting this to True creates a fresh RangeIndex, while False preserves the original index values repeated across melted rows:
# Move 'country' into the index without mutating df, then melt with
# ignore_index=True (the default), which discards the index labels
indexed = df.set_index("country")
melted_no_index = pd.melt(indexed,
                          ignore_index=True,
                          var_name="year",
                          value_name="sales")
print(melted_no_index)
Output:
         year  sales
0  2019_sales    100
1  2019_sales    150
2  2019_sales    130
3  2020_sales    110
4  2020_sales    160
5  2020_sales    140
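For contrast, here is a minimal sketch of the ignore_index=False case on the same data. The country labels survive as a repeated index, which allows label-based selection on the melted result.

```python
import pandas as pd

indexed = pd.DataFrame(
    {"2019_sales": [100, 150, 130], "2020_sales": [110, 160, 140]},
    index=pd.Index(["US", "CA", "MX"], name="country"),
)

# ignore_index=False repeats the original index once per melted column.
kept = indexed.melt(var_name="year", value_name="sales", ignore_index=False)
print(kept.index.tolist())  # ['US', 'CA', 'MX', 'US', 'CA', 'MX']

# Label-based selection now returns every melted row for one country.
print(kept.loc["US", "sales"].tolist())  # [100, 110]
```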
Memory Management and Type Preservation
As implemented in pandas/core/reshape/melt.py, the melt function returns a new DataFrame, so modifications to the melted result do not propagate back to the original. Identifier columns keep their dtypes. The value column takes a dtype that can hold every melted column: columns that share one dtype keep it, mixed integer and float columns are upcast to float, and mixing numeric with string columns yields object dtype.
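A quick way to observe this behavior is to melt columns of different dtypes and inspect the result. The frame below is made up for illustration.

```python
import pandas as pd

mixed = pd.DataFrame({
    "id": [1, 2],
    "count": [10, 20],      # int64
    "ratio": [0.5, 0.25],   # float64
    "label": ["x", "y"],    # strings
})

# Integer and float columns melted together are upcast to float64.
numeric = mixed.melt(id_vars="id", value_vars=["count", "ratio"])
print(numeric["value"].dtype)  # float64

# Numeric and string columns melted together fall back to object.
with_text = mixed.melt(id_vars="id", value_vars=["count", "label"])
print(with_text["value"].dtype)  # object

# Editing the melted result leaves the source frame untouched.
numeric.loc[0, "value"] = -1.0
print(mixed.loc[0, "count"])  # 10
```

When the value column must stay numeric, it helps to melt numeric and text columns separately.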
Summary
- pandas.melt transforms wide-format DataFrames into long (tidy) format by unpivoting columns into rows, implemented in pandas/core/reshape/melt.py.
- Use id_vars to specify columns that should remain unchanged, and value_vars to select specific columns for melting.
- Customize output column names using var_name and value_name to create descriptive schemas for your dataset.
- Control index behavior with ignore_index, which defaults to True for creating clean sequential indices.
- The result is a new DataFrame whose value column uses a dtype common to all melted columns, leaving the original DataFrame unmodified.
Frequently Asked Questions
What happens if I don't specify id_vars when melting a DataFrame?
If you omit id_vars, pandas melt treats all columns as measured variables and unpivots the entire DataFrame into two columns: one for the former column names and one for the values. This results in a long-format table without identifier columns to group related measurements, which is rarely useful for analysis unless you are purely stacking data.
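A small sketch of this case, using a made-up two-column frame:

```python
import pandas as pd

scores = pd.DataFrame({"math": [90, 80], "art": [70, 60]})

# No id_vars and no value_vars: every column is unpivoted.
stacked = scores.melt()
print(stacked.columns.tolist())      # ['variable', 'value']
print(stacked["variable"].tolist())  # ['math', 'math', 'art', 'art']
print(stacked["value"].tolist())     # [90, 80, 70, 60]
```

Nothing in the output ties the first math score to the first art score, which is why an identifier column is usually worth keeping.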
How do I rename the variable and value columns when using pandas melt?
Pass string values to the var_name and value_name parameters to customize the default column headers. For example, var_name="year" and value_name="revenue" transform the generic "variable" and "value" columns into domain-specific labels that improve code readability and visualization compatibility.
Does pandas melt preserve the original DataFrame's index?
By default, pandas.melt resets the index in the result because ignore_index=True, creating a new RangeIndex from 0 to N-1. If you set ignore_index=False, the function preserves the original index values, repeating them once for each melted column.
What is the difference between pd.melt() and DataFrame.melt()?
pd.melt() is the module-level function that accepts the DataFrame as its first argument, while DataFrame.melt() is an instance method available on DataFrame objects defined in pandas/core/frame.py. Both invoke the same underlying implementation in pandas/core/reshape/melt.py, but the method version enables fluent chaining with other DataFrame operations.