How to Concatenate a List of pandas DataFrames Using pandas.concat

Use pandas.concat() to combine multiple DataFrames vertically or horizontally in a single call, controlling index handling, label alignment, and copying behavior through parameters such as axis, join, ignore_index, and copy.

The pandas.concat() function is the primary way to combine collections of DataFrame objects in the pandas-dev/pandas library. Its implementation in pandas/core/reshape/concat.py handles edge cases such as mismatched columns, hierarchical indexing, and duplicate labels, and relies on compiled code (largely NumPy) for the underlying array copies.

Core Architecture and Source Implementation

The concatenation logic resides in pandas/core/reshape/concat.py. The public concat() function validates its arguments and delegates to private helpers (their names and structure vary between pandas versions) that normalize the inputs, determine the labels of the resulting axes, and assemble the output from the inputs' underlying data blocks. The element-level copying happens in compiled code, largely NumPy's array concatenation routines. Because all inputs are combined in one pass, a single concat() call is much faster than growing a DataFrame step by step with repeated calls such as the former DataFrame.append(), each of which copies every row accumulated so far.
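In practice, this means collecting frames in a list and combining them once. The minimal sketch below contrasts the two patterns; the chunk data is made up for illustration. Growing a frame inside a loop rebuilds it on every iteration, while one concat() call builds the result once.

```python
import pandas as pd

# Hypothetical chunks, e.g. one small frame per input file or batch.
chunks = [pd.DataFrame({"x": [i, i + 1]}) for i in range(0, 10, 2)]

# Anti-pattern: every iteration creates a new frame that contains all
# rows accumulated so far, so the total copying grows quadratically.
grown = chunks[0]
for chunk in chunks[1:]:
    grown = pd.concat([grown, chunk], ignore_index=True)

# Preferred: collect the pieces first, then combine them in one call.
combined = pd.concat(chunks, ignore_index=True)

print(grown.equals(combined))  # True
```

Both approaches produce the same frame; only the amount of intermediate copying differs, which matters once the number of chunks grows.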

Two additional internal modules support this process:

Essential Parameters for Concatenating DataFrames

Understanding the key parameters allows precise control over concatenation behavior:

  • objs (required): An iterable of DataFrame or Series objects. Typically passed as a list: [df1, df2, df3].

  • axis: Integer or string specifying the concatenation axis. Use 0 or 'index' (default) for vertical stacking along rows, or 1 or 'columns' for horizontal concatenation along columns.

  • join: String specifying how to handle labels on the other (non-concatenation) axis that do not match across inputs. 'outer' (default) takes the union of those labels, while 'inner' keeps only the intersection.

  • ignore_index: Boolean; when True, creates a new sequential integer index (0, 1, 2...) for the result, discarding original index values. Useful when original indices are meaningless after concatenation.

  • keys: Sequence of labels, one per input, that become the outermost level of a hierarchical MultiIndex on the concatenation axis, recording which input each row (or each column, when axis=1) came from.

  • sort: Boolean; whether to sort the labels of the non-concatenation axis when they are not already aligned across inputs. Defaults to False to avoid unnecessary overhead.

  • verify_integrity: Boolean; when True, checks for duplicate labels in the new axis and raises ValueError if duplicates exist.

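  • copy: Boolean; when False, pandas avoids copying data where it can, so the result may share memory with the inputs and modifying one could affect the other. The pandas documentation notes that under Copy-on-Write (the default starting with pandas 3.0) this keyword is ignored and slated for removal.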
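Two of these parameters, verify_integrity and sort, are easiest to understand by watching them act on inputs that are deliberately misaligned. A minimal sketch, assuming two hypothetical frames that share the row label "r1" and only partially overlap in columns:

```python
import pandas as pd

# Hypothetical inputs: same row label "r1", partially overlapping columns.
left = pd.DataFrame({"b": [1], "a": [2]}, index=["r1"])
right = pd.DataFrame({"a": [3], "c": [4]}, index=["r1"])

# verify_integrity=True raises ValueError on duplicate labels along the
# concatenation axis (the rows here, since axis=0).
try:
    pd.concat([left, right], verify_integrity=True)
    rejected = False
except ValueError as exc:
    rejected = True
    print("rejected:", exc)

# The columns are not aligned, so sort decides their order in the result.
default_cols = pd.concat([left, right]).columns.tolist()
sorted_cols = pd.concat([left, right], sort=True).columns.tolist()
print(default_cols)  # ['b', 'a', 'c']
print(sorted_cols)   # ['a', 'b', 'c']
```

With the default sort=False, the columns appear in the order they are first encountered across the inputs.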

Practical Examples

Vertical Concatenation Along Axis 0

By default, pandas.concat() stacks DataFrames vertically along axis 0, preserving all columns from the union of inputs:

import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})
result = pd.concat([df1, df2])  # axis=0 is default

print(result)

   A  B
0  1  3
1  2  4
0  5  7
1  6  8
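The default call keeps each input's original row labels, so the result above contains the labels 0 and 1 twice. This short check redefines df1 and df2 so it runs on its own, and shows how the repeated labels affect label-based lookups:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})
result = pd.concat([df1, df2])

# The original labels are kept, so they repeat.
print(result.index.tolist())   # [0, 1, 0, 1]
print(result.index.is_unique)  # False

# Label-based selection returns every row labelled 0 (one from each input).
print(result.loc[0])
```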

Horizontal Concatenation with Different Columns

Specify axis=1 to concatenate DataFrames side-by-side. Use the join parameter to control whether to perform an outer or inner join on the indexes:

df3 = pd.DataFrame({"C": [9, 10]})
horiz = pd.concat([df1, df3], axis=1, join="outer")

print(horiz)

   A  B   C
0  1  3   9
1  2  4  10
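The join parameter only makes a visible difference when the row indexes do not line up. A sketch, assuming a hypothetical df4 whose index (1, 2) only partly overlaps that of df1 (0, 1):

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})       # index 0, 1
df4 = pd.DataFrame({"C": [9, 10]}, index=[1, 2])     # hypothetical, index 1, 2

outer = pd.concat([df1, df4], axis=1, join="outer")
inner = pd.concat([df1, df4], axis=1, join="inner")

# outer: union of row labels 0, 1, 2; cells with no source value become NaN.
print(outer)
# inner: only row label 1, the one present in both inputs.
print(inner)
```

Note that the NaN values in the outer result force columns A, B, and C to a floating-point dtype.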

Resetting Indexes with ignore_index

When original index values are meaningless after combination, set ignore_index=True to create a clean, continuous integer index:

ignore = pd.concat([df1, df2], ignore_index=True)

print(ignore)

   A  B
0  1  3
1  2  4
2  5  7
3  6  8
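ignore_index relabels the concatenation axis, whichever axis that is. A minimal sketch reusing the horizontal example: with axis=1, the column labels are replaced rather than the row index.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df3 = pd.DataFrame({"C": [9, 10]})

# With axis=1 the concatenation axis is the columns, so the original
# column names A, B, C are discarded and replaced by 0..n-1.
wide = pd.concat([df1, df3], axis=1, ignore_index=True)
print(wide.columns.tolist())  # [0, 1, 2]
print(wide.index.tolist())    # [0, 1]
```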

Creating Hierarchical Indexes with keys

Use the keys parameter to add a hierarchical level that identifies the source of each row:

keyed = pd.concat([df1, df2], keys=["first", "second"])

print(keyed)

          A  B
first  0  1  3
       1  2  4
second 0  5  7
       1  6  8
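The keyed result is most useful when combined with the names parameter and label-based selection. A short sketch, where the level names "source" and "row" are arbitrary choices for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# names labels the index levels: the outer level comes from keys,
# the inner level holds each input's original row labels.
keyed = pd.concat([df1, df2], keys=["first", "second"], names=["source", "row"])
print(list(keyed.index.names))  # ['source', 'row']

# Select every row that came from df2 via the outer level.
print(keyed.loc["second"])

# Turn the source label into an ordinary column, e.g. for grouping.
print(keyed.reset_index(level="source"))
```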

Efficiently Concatenating Large Lists

For long lists of DataFrames, pandas.concat() processes them in a single operation rather than iteratively copying data:

frames = [pd.DataFrame({"val": range(i, i+3)}) for i in range(0, 30, 3)]
big = pd.concat(frames, ignore_index=True)

print(big.head())

   val
0    0
1    1
2    2
3    3
4    4
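Because objs accepts any iterable, the frames do not have to be stored in a named list first. The sketch below passes a generator expression directly. It also shows an edge case worth guarding against when frames are collected conditionally: concat raises ValueError if there is nothing to combine.

```python
import pandas as pd

# A generator expression works as objs; no named intermediate list is needed.
big = pd.concat(
    (pd.DataFrame({"val": range(i, i + 3)}) for i in range(0, 30, 3)),
    ignore_index=True,
)
print(len(big))  # 30

# If a filtering step leaves nothing behind, concat refuses an empty input.
frames = []
try:
    pd.concat(frames)
    empty_rejected = False
except ValueError as exc:
    empty_rejected = True
    print("nothing to concatenate:", exc)
```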

Summary

  • pandas.concat(), implemented in pandas/core/reshape/concat.py, is the recommended way to combine a list of DataFrames: a single call avoids the repeated copying of iterative append-style loops.
  • The axis parameter controls vertical (0) versus horizontal (1) concatenation.
  • Use ignore_index=True to generate a fresh sequential index when source indexes are irrelevant.
  • The join parameter controls alignment on the non-concatenation axis: 'outer' keeps all labels (all columns, when stacking rows) and 'inner' keeps only the shared ones.
  • Setting copy=False lets the result share memory with the inputs where possible, which requires care when mutating either side; under Copy-on-Write (the default from pandas 3.0) the keyword is ignored.

Frequently Asked Questions

What is the difference between pandas.concat and DataFrame.append?

DataFrame.append() was deprecated in pandas 1.4.0 and removed in pandas 2.0. It was a thin wrapper around pandas.concat() with fewer options. The performance problem came from how it was typically used: calling append() in a loop builds a new DataFrame on every iteration and copies all previously accumulated rows each time. Collecting the pieces in a list and calling concat() once copies the data a single time.
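Because DataFrame.append() is not available in pandas 2.0 and later, code that added a single row with it must be migrated. A minimal sketch, where new_row is a hypothetical record:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
new_row = {"A": 5, "B": 6}  # hypothetical record to add

# Old code (pandas < 2.0): df = df.append(new_row, ignore_index=True)
# Replacement: wrap the row in a one-row DataFrame and concatenate.
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)
```

If rows arrive one at a time inside a loop, a better approach is to collect the dicts in a list and build the DataFrame once at the end, instead of concatenating on every iteration.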

How do I concatenate DataFrames with different columns?

Use join='outer' (default) to include all columns from all DataFrames, filling missing values with NaN. Alternatively, use join='inner' to keep only columns present in every DataFrame. With axis=1 (side-by-side concatenation), the same rule applies to row labels instead of columns.
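A small sketch of both options on vertically stacked frames, using hypothetical sales and returns tables that share only the id column:

```python
import pandas as pd

# Hypothetical tables with one shared column ("id").
sales = pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0]})
returns = pd.DataFrame({"id": [3], "reason": ["damaged"]})

outer = pd.concat([sales, returns], ignore_index=True)  # join="outer" by default
inner = pd.concat([sales, returns], join="inner", ignore_index=True)

print(outer.columns.tolist())  # ['id', 'amount', 'reason']
print(inner.columns.tolist())  # ['id']
print(outer)  # amount is NaN for the returns row, reason is NaN for sales rows
```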

Why should I use ignore_index when concatenating?

Set ignore_index=True when the original row identifiers from individual DataFrames become meaningless in the combined dataset. This creates a new integer index (0 to n-1) for the result, eliminating duplicate index values that might otherwise cause confusion or errors in downstream operations.

Is pandas.concat memory efficient?

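Generally yes, when used as intended. A single concat() call determines the shape of the result up front and copies each input into it once, avoiding the repeated copies of loop-based appends. The result is normally a new object, so peak memory is roughly the size of the inputs plus the output. Setting copy=False lets pandas skip copies it does not need, so the result may share memory with the inputs; in that case, modifying the result can also change the originals, and vice versa. Under Copy-on-Write, the default in pandas 3.0, the copy keyword is ignored.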
