How to Use Pandas Explode to Unnest Columns into Multiple Rows
Use DataFrame.explode() to transform each element of a list-like column into its own row while preserving other column values, or use Series.explode() for individual Series objects.
The explode method in pandas (developed in the pandas-dev/pandas repository) provides a straightforward way to unnest hierarchical data. Whether you are working with JSON-like nested lists, arrays of tags, or other list-like values stored in DataFrame cells, it flattens those elements into separate rows while keeping every other column aligned with the row each element came from.
What Is Pandas Explode?
pandas explode is a data transformation method that expands list-like entries (such as lists, tuples, sets, NumPy arrays, or Series) into individual rows. The operation returns a new DataFrame or Series in which each element of an original list occupies its own row. Scalar entries stay as a single row, and in a DataFrame the values of the non-exploded columns are duplicated across the expanded rows.
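A quick illustration on a Series, using made-up data: the list and the tuple are expanded, while the integer and the string each stay a single row, because pandas does not treat strings as list-like.

```python
import pandas as pd

# Illustrative data: a list, a tuple, a plain scalar, and a string
s = pd.Series([[1, 2], ("x", "y"), 5, "abc"])

out = s.explode()
print(out.tolist())        # [1, 2, 'x', 'y', 5, 'abc']
print(out.index.tolist())  # [0, 0, 1, 1, 2, 3]
```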
According to the pandas source code, the implementation resides in two primary locations:
- Series.explode – Defined in pandas/core/series.py (around line 4520 in the version examined; line numbers shift between releases), this contains the core expansion logic for Series objects.
- DataFrame.explode – Defined in pandas/core/frame.py (around line 13164), this is a relatively thin wrapper that validates the column argument, explodes the selected columns, and restores index alignment.
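Because line numbers drift between releases, a small sketch like the following can confirm where these methods live in the pandas version you have installed. It relies only on the standard inspect module; the inspect.unwrap call is a precaution in case a method is ever wrapped by a decorator.

```python
import inspect
import os

import pandas as pd

locations = {}
for method in (pd.Series.explode, pd.DataFrame.explode):
    func = inspect.unwrap(method)          # follow __wrapped__ if decorated
    path = inspect.getsourcefile(func)     # file that defines the method
    _, first_line = inspect.getsourcelines(func)
    locations[method.__qualname__] = (os.path.basename(path), first_line)

for name, (filename, line) in locations.items():
    print(f"{name}: {filename}, line {line}")
```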
Pandas Explode Syntax and Parameters
Understanding the method signatures helps you apply pandas explode correctly to your specific data structures.
Series.explode Signature
Series.explode(ignore_index: bool = False) -> Series
DataFrame.explode Signature
DataFrame.explode(column, ignore_index: bool = False) -> DataFrame
The parameters control the following behaviors:
- column (DataFrame only) – Specifies the column or columns to explode. Accepts a single label or, since pandas 1.3.0, a list of column labels for simultaneous multi-column exploding.
- ignore_index – Controls index behavior. When False (default), the original index values repeat for each generated row. When True, the result receives a fresh integer index ranging from 0 to N-1.
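A short sketch of both parameters on made-up data. Series.explode only takes ignore_index, while DataFrame.explode also takes column, either as a single label or (pandas 1.3.0+) as a list of labels.

```python
import pandas as pd

s = pd.Series([["a", "b"], ["c"]], index=["x", "y"])
kept = s.explode()                    # default: original labels repeat
reset = s.explode(ignore_index=True)  # fresh integer index 0..N-1
print(kept.index.tolist())   # ['x', 'x', 'y']
print(reset.index.tolist())  # [0, 1, 2]

df = pd.DataFrame({"id": [7], "a": [[1, 2]], "b": [[3, 4]]})
one = df.explode("a")          # single label: "b" is copied, still holding [3, 4]
both = df.explode(["a", "b"])  # list of labels: "a" and "b" paired by position
print(both)
```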
How Pandas Explode Works Internally
The pandas explode implementation splits the work between Python-level dispatch and compiled code, which keeps the per-element loop fast for object-dtype data.
When you call Series.explode, the method first checks the dtype. In recent pandas versions, extension arrays (for example Arrow-backed data) handle the expansion through their own _explode method. For object-dtype data, the method delegates the heavy lifting to the Cython function explode in pandas/_libs/reshape.pyx, which iterates over the underlying ndarray, expands each list-like entry into its elements, and also returns a count of how many output rows each entry produced. Any other dtype (such as int64) cannot hold containers, so the method simply returns a copy of the Series instead of raising an error.
For index handling, the implementation repeats each original label once for every row its entry produced (using the counts returned by the expansion step), unless ignore_index=True, in which case the result gets a fresh integer index from 0 to N-1.
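To make the two steps concrete, here is a simplified pure-Python sketch of the object-dtype path. The helper explode_values is hypothetical: it mirrors the behavior described above (expand list-likes, emit one NaN for an empty container, pass scalars through, record a per-entry count) but it is not the pandas source. The counts then drive Index.repeat to build the repeated labels.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like


def explode_values(values):
    """Hypothetical helper: a pure-Python stand-in for the Cython loop."""
    out, counts = [], []
    for v in values:
        if is_list_like(v):            # lists, tuples, sets, arrays (not strings)
            items = list(v)
            if items:
                out.extend(items)      # one output row per element
                counts.append(len(items))
            else:
                out.append(np.nan)     # empty container -> single NaN row
                counts.append(1)
        else:
            out.append(v)              # scalars, None, NaN pass through
            counts.append(1)
    return out, np.asarray(counts, dtype=np.int64)


s = pd.Series([[1, 2], [], 3, None], index=["a", "b", "c", "d"])
values, counts = explode_values(s.to_numpy())
index = s.index.repeat(counts)         # each label repeated `count` times
sketch = pd.Series(values, index=index, dtype=object)

print(sketch)
print(s.explode())                     # pandas' own result, for comparison
```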
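DataFrame.explode builds on this logic. It first validates the column argument (a single label, or a non-empty list of unique labels). When exploding multiple columns (pandas 1.3.0+), it then checks that, in every row, the selected columns contain the same number of elements (empty containers and scalars count as one), and raises a ValueError if they do not. It explodes each selected column with Series.explode, joins the results back to the remaining columns so that the nth element of each list lands in the same row, and finally restores the original index labels (or a fresh integer index when ignore_index=True).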
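The matching-count check is easy to trigger. In this made-up frame the two list columns disagree on length in both rows, so a ValueError is expected; the exact message text may vary between pandas versions, so the sketch only prints it.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "a": [[1, 2], [3]],
    "b": [["x"], ["y", "z"]],  # row 0: 2 vs 1 elements, row 1: 1 vs 2
})

try:
    df.explode(["a", "b"])
    raised = False
except ValueError as exc:      # per-row element counts must match
    raised = True
    print("ValueError:", exc)
```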
Practical Examples of Using Pandas Explode
The following runnable examples demonstrate common pandas explode patterns.
Exploding a Single List-Like Column
The most common use case expands a column containing lists into separate rows:
import pandas as pd
df = pd.DataFrame({
"id": [1, 2, 3],
"tags": [["red", "blue"], ["green"], []]
})
# Default behavior keeps original index
exploded = df.explode("tags")
print(exploded)
Output:
id tags
0 1 red
0 1 blue
1 2 green
2 3 NaN
As implemented in pandas/core/frame.py, DataFrame.explode calls Series.explode (pandas/core/series.py) on the specified column and then repeats the original index labels, so index 0 appears for both "red" and "blue", preserving the association with id 1. The empty list for id 3 becomes a single row containing NaN.
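One practical consequence of the repeated labels is that the explode can be undone by grouping on the index. The sketch below uses groupby(level=0) with agg(list); note that the empty list for id 3 comes back as [nan] rather than [], so the round trip is not exact for empty containers.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "tags": [["red", "blue"], ["green"], []],
})
exploded = df.explode("tags")

# The original labels repeat, so grouping on the index level
# collects each row's elements back into a list.
renested = exploded.groupby(level=0)["tags"].agg(list)
print(renested)
```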
Resetting the Index with ignore_index
When you need a clean integer index rather than repeated labels:
exploded = df.explode("tags", ignore_index=True)
print(exploded)
Output:
id tags
0 1 red
1 1 blue
2 2 green
3 3 NaN
With ignore_index=True, DataFrame.explode discards the repeated labels and assigns a fresh RangeIndex from 0 to N-1.
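As a quick sanity check on the same data, ignore_index=True should produce the same frame as exploding first and then calling reset_index(drop=True):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "tags": [["red", "blue"], ["green"], []],
})

a = df.explode("tags", ignore_index=True)
b = df.explode("tags").reset_index(drop=True)
print(a.equals(b))        # True
print(a.index.tolist())   # [0, 1, 2, 3]
```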
Exploding Multiple Columns (Pandas 1.3+)
Since pandas 1.3.0, you can explode multiple columns simultaneously. Elements are paired by position, so within each row the exploded columns must contain the same number of elements:
df2 = pd.DataFrame({
"id": [1, 2],
"colors": [["red", "blue"], ["green"]],
"shapes": [["circle", "square"], ["triangle"]]
})
# Explode both columns together
exploded_multi = df2.explode(["colors", "shapes"])
print(exploded_multi)
Output:
id colors shapes
0 1 red circle
0 1 blue square
1 2 green triangle
The DataFrame.explode implementation in pandas/core/frame.py first checks that the selected columns have matching element counts in every row and raises a ValueError if they do not. It then explodes each column and joins the results, so the nth element of each list appears in the same row. Lists of unequal length are not padded or repeated.
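If you want every combination of elements within a row rather than positional pairs, a sketch like the following chains two single-column explodes instead. This computes a per-row Cartesian product, which is a different result from multi-column explode, and it accepts lists of unequal length. The data below deliberately uses unequal lengths, which would make df2.explode(["colors", "shapes"]) raise.

```python
import pandas as pd

df2 = pd.DataFrame({
    "id": [1, 2],
    "colors": [["red", "blue"], ["green"]],
    "shapes": [["circle"], ["square", "triangle"]],
})

# Each explode works on one column, so every color in a row is paired
# with every shape from that same original row.
combos = df2.explode("colors").explode("shapes")
print(combos)
```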
Handling NaN and Empty Containers
The method gracefully handles missing values and empty lists:
df3 = pd.DataFrame({
"id": [1, 2, 3],
"items": [["a", "b"], None, []]
})
print(df3.explode("items"))
Output:
id items
0 1 a
0 1 b
1 2 None
2 3 NaN
Series.explode keeps both cases rather than dropping them: the None entry is not list-like, so it passes through unchanged as a single row, while the empty list has no elements to emit and is replaced by a single NaN row.
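If these placeholder rows are not wanted, they can be removed after exploding. A minimal sketch using dropna, which treats both None and NaN as missing:

```python
import pandas as pd

df3 = pd.DataFrame({
    "id": [1, 2, 3],
    "items": [["a", "b"], None, []],
})

# Drops the row produced by None (id 2) and by the empty list (id 3)
cleaned = df3.explode("items").dropna(subset=["items"])
print(cleaned)
```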
Key Source Files for Pandas Explode
Understanding the implementation location helps when debugging or contributing to the pandas-dev/pandas repository:
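- pandas/core/series.py – Contains the Series.explode implementation around line 4520, which dispatches either to an extension array's _explode method or to the Cython explode routine and builds the repeated index.
- pandas/core/frame.py – Houses DataFrame.explode around line 13164, which validates column arguments and coordinates multi-column exploding.
- pandas/_libs/reshape.pyx – Contains the Cython explode loop used for object-dtype data.
- pandas/core/arrays/arrow/accessors.py – Defines the .list and .struct accessors for Arrow-backed Series. Its struct explode() method (pandas 2.2+) splits struct fields into separate DataFrame columns, which is a different operation from row-wise explode.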
Summary
- pandas explode transforms list-like column entries into separate rows, preserving other column values through index repetition.
- The implementation resides in pandas/core/series.py for Series operations and pandas/core/frame.py for DataFrame coordination.
- Use ignore_index=True to generate a fresh integer index rather than repeating original labels.
- Since pandas 1.3.0, pass a list of column names to explode multiple columns simultaneously; their lists must have matching lengths in every row.
- The method handles edge cases gracefully: None values pass through as single rows, empty lists become a single NaN row, and scalar values remain unchanged.
Frequently Asked Questions
What data types work with pandas explode?
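pandas explode works with any list-like container stored in an object-dtype column, including Python lists, tuples, sets, NumPy arrays, and pandas Series; in recent versions, extension arrays such as Arrow-backed list columns implement their own expansion through an _explode method. For object data, the Cython explode routine in pandas/_libs/reshape.pyx iterates through these containers to expand them into individual rows. Strings are not treated as list-like, so they are never split into characters. Scalar values, NaN, and None each remain a single row, and a Series with a non-object NumPy dtype such as int64 is returned as an unchanged copy.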
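A small sketch with mixed container types in one object-dtype Series. A single-element set is used because the elements of a larger set come out in arbitrary order.

```python
import numpy as np
import pandas as pd

# A list, a tuple, a NumPy array, a single-element set, and a scalar
s = pd.Series([[1, 2], (3,), np.array([4, 5, 6]), {7}, 8], dtype=object)
out = s.explode()
print(out.index.tolist())  # [0, 0, 1, 2, 2, 2, 3, 4]

# A non-object NumPy dtype has nothing to expand: explode returns a copy
ints = pd.Series([1, 2, 3])
print(ints.explode().dtype)  # int64
```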
Can I explode multiple columns at once in pandas?
Yes, starting with pandas 1.3.0, you can pass a list of column names to DataFrame.explode. The implementation in pandas/core/frame.py validates the column list, checks that the selected columns hold the same number of elements in every row, and raises a ValueError if they do not. It then applies Series.explode to each column and aligns the results by position, so the first element of each exploded column appears in the same row, the second elements align, and so on.
Does pandas explode modify the original DataFrame?
No. explode has no inplace option: it always returns a new DataFrame or Series and leaves the original data unchanged. For object data, the Cython explode routine builds a new object array, and a new index is constructed (either repeated or reset, depending on ignore_index); for dtypes with nothing to expand, Series.explode returns a copy. Assign the result to a variable to keep it.
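A minimal check on made-up data that the source frame is left intact:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "tags": [["red", "blue"], ["green"]]})

result = df.explode("tags")
print(result is df)         # False
print(df["tags"].tolist())  # [['red', 'blue'], ['green']]
print(len(result), len(df)) # 3 2
```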
How do I handle empty lists or missing values when using pandas explode?
pandas explode keeps empty lists and missing values as single-row entries rather than removing them. None and NaN are not list-like, so the Cython loop behind Series.explode in pandas/core/series.py passes them through unchanged; an empty container such as [] has no elements to emit, so it is replaced by a single NaN marker. If you do not want these rows, filter the result afterwards, for example with dropna.