How to Combine Two DataFrames in pandas: concat, merge, and join Explained

Use pd.concat to stack DataFrames vertically or horizontally, DataFrame.merge for SQL-style joins on keys, or DataFrame.join for index-aligned operations.

The pandas library provides high-performance tools for combining datasets. Whether you need to stack tables, join on keys, or align by index, the pandas-dev/pandas repository implements these operations through optimized algorithms backed by the block manager. Understanding how to combine two DataFrames in pandas requires selecting the appropriate primitive based on your data alignment strategy.

Three Primary Methods to Combine DataFrames

Pandas offers three distinct approaches for combining DataFrames, each suited to a different alignment scenario.

pd.concat for Vertical and Horizontal Stacking

The pd.concat function stacks DataFrames along a specified axis. In pandas/core/reshape/concat.py, an internal helper (the _Concatenator class in pandas 1.x and 2.x) handles axis alignment, key management, and hierarchical indexing.

  • Axis 0 (vertical): Stacks rows, aligning on column labels; a column missing from one frame is filled with NaN.
  • Axis 1 (horizontal): Stacks columns, aligning on index values and generating NaN for mismatched keys.
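As a minimal sketch of the key handling mentioned above, the following uses the documented keys and join parameters of pd.concat. The frame names and values are made up for illustration and do not come from the pandas source.

```python
import pandas as pd

# Hypothetical quarterly tables; names and values are illustrative only.
q1 = pd.DataFrame({"region": ["north", "south"], "sales": [10, 20]})
q2 = pd.DataFrame({"region": ["north", "south"], "sales": [30, 40], "returns": [1, 2]})

# keys= labels each input and builds a two-level (hierarchical) row index.
# 'returns' is absent from q1, so those rows get NaN under the default outer join.
stacked = pd.concat([q1, q2], keys=["q1", "q2"])

# join="inner" keeps only the columns present in every input,
# so no NaN is introduced for the missing 'returns' column.
common = pd.concat([q1, q2], join="inner", ignore_index=True)

print(stacked)
print(common)
```

Selecting stacked.loc["q2"] then returns just the rows that came from the second input, which is often the reason to pass keys in the first place.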

DataFrame.merge for SQL-Style Relational Joins

Implemented in pandas/core/reshape/merge.py, the merge function (accessible as pd.merge or the DataFrame.merge method) performs relational joins. Internally, the _MergeOperation class validates the join keys, applies suffixes to overlapping columns, and executes inner, left, right, outer, or cross joins.
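To make the key handling concrete, here is a small sketch using the documented left_on, right_on, indicator, and how parameters. The table and column names are hypothetical, and how="cross" assumes pandas 1.2 or later.

```python
import pandas as pd

# Hypothetical tables; column names are illustrative only.
orders = pd.DataFrame({"order_id": [1, 2, 3], "cust": ["a", "b", "d"]})
customers = pd.DataFrame({"cust_id": ["a", "b", "c"], "name": ["Ann", "Bob", "Cy"]})

# left_on/right_on join on differently named key columns;
# indicator=True adds a '_merge' column recording where each row came from.
audit = pd.merge(
    orders,
    customers,
    left_on="cust",
    right_on="cust_id",
    how="outer",
    indicator=True,
)
print(audit[["order_id", "cust", "cust_id", "_merge"]])

# how="cross" pairs every left row with every right row and takes no key arguments.
pairs = pd.merge(orders, customers, how="cross")
print(len(pairs))
```

The _merge column is a quick way to audit which keys failed to match before deciding between an inner and an outer join.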

DataFrame.join for Index-Based Convenience

The join method, defined in pandas/core/frame.py, provides a simplified interface for joining on index values. For a single other frame it delegates to merge with right_index=True, matching against the caller's index by default or against the column(s) named in on. Unlike merge, which defaults to how='inner', join defaults to how='left'.
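The relationship between the two calls can be checked directly. This is a sketch with made-up frames, not an excerpt from frame.py; it only compares the public results of join and merge.

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2]}, index=["a", "b"])
right = pd.DataFrame({"B": [3, 4]}, index=["b", "c"])

# join defaults to how="left" and matches on the index of both frames.
via_join = left.join(right)

# The same result spelled out with merge; merge defaults to how="inner",
# so how="left" has to be passed explicitly.
via_merge = pd.merge(left, right, left_index=True, right_index=True, how="left")

print(via_join.equals(via_merge))
```

The differing how defaults are the usual source of surprise when switching between the two methods.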

Practical Code Examples

Vertical Concatenation with pd.concat

Stack DataFrames vertically while resetting the index:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})
df2 = pd.DataFrame({'A': [3, 4], 'B': ['z', 'w']})

result = pd.concat([df1, df2], ignore_index=True)
print(result)

Output:

   A  B
0  1  x
1  2  y
2  3  z
3  4  w

Implementation path: pandas/core/reshape/concat.py → _Concatenator (pandas 1.x and 2.x) → block manager concat.

Horizontal Concatenation with Different Indexes

Combine columns while aligning on indexes, generating NaN for mismatched keys:

df1 = pd.DataFrame({'A': [1, 2]}, index=['a', 'b'])
df2 = pd.DataFrame({'B': [10, 20]}, index=['b', 'c'])

result = pd.concat([df1, df2], axis=1)
print(result)

Result:

     A     B
a  1.0   NaN
b  2.0  10.0
c  NaN  20.0

Both columns are upcast to float64 because NaN is a floating-point value.
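If integer columns matter downstream, one option is to convert the result to the nullable Int64 dtype. The sketch below repeats the frames from this example; converting with astype("Int64") is a suggestion for illustration, not something the original example requires.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2]}, index=["a", "b"])
df2 = pd.DataFrame({"B": [10, 20]}, index=["b", "c"])

result = pd.concat([df1, df2], axis=1)
print(result.dtypes)  # inspect the dtypes after outer alignment

# Optional: the nullable integer dtype stores missing values as <NA>
# instead of forcing the whole column to float.
as_int = result.astype("Int64")
print(as_int)
```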

Inner Join Using pd.merge

Perform a SQL-style inner join on a common key column:

left = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                     'A': [1, 2, 3]})
right = pd.DataFrame({'key': ['K0', 'K2', 'K3'],
                      'B': [4, 5, 6]})

merged = pd.merge(left, right, on='key', how='inner')
print(merged)

Output:

  key  A  B
0  K0  1  4
1  K2  3  5

Implementation path: pandas/core/reshape/merge.py → _MergeOperation → block manager alignment.

Left Join with Suffixes for Overlapping Columns

Handle duplicate column names using the suffixes parameter:

left = pd.DataFrame({'key': ['K0', 'K1'],
                     'value': [1, 2]})
right = pd.DataFrame({'key': ['K0', 'K0'],
                      'value': [3, 4]})

joined = pd.merge(left, right, on='key', how='left', suffixes=('_L', '_R'))
print(joined)

Result:

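  key  value_L  value_R
0  K0        1      3.0
1  K0        1      4.0
2  K1        2      NaN

K0 appears twice in right, so the single K0 row in left is duplicated. value_R becomes float64 because K1 has no match.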
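When duplicated keys like K0 are unexpected, the documented validate parameter can turn the silent row multiplication into an error. This is a hedged sketch reusing the frames above; the duplicate_keys_detected flag is a name introduced here for illustration.

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({"key": ["K0", "K1"], "value": [1, 2]})
right = pd.DataFrame({"key": ["K0", "K0"], "value": [3, 4]})

# validate="one_to_one" asks merge to check that keys are unique on both sides.
try:
    pd.merge(left, right, on="key", how="left", validate="one_to_one")
    duplicate_keys_detected = False
except MergeError as exc:
    # 'right' holds K0 twice, so the check fails before any rows are produced.
    duplicate_keys_detected = True
    print(f"merge rejected: {exc}")
```

Other accepted values include "one_to_many" and "many_to_one", which relax the uniqueness check on one side.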

Index-Based Join Using DataFrame.join

Merge on index values using the convenient join method:

df_left = pd.DataFrame({'A': [1, 2]}, index=['a', 'b'])
df_right = pd.DataFrame({'B': [3, 4]}, index=['b', 'c'])

joined = df_left.join(df_right, how='outer')
print(joined)

Output:

     A    B
a  1.0  NaN
b  2.0  3.0
c  NaN  4.0

join is implemented in pandas/core/frame.py as a thin wrapper around merge: with no on argument it passes left_index=True and right_index=True.

Key Implementation Files

Understanding the source architecture helps optimize performance and debug edge cases:

File | Purpose
pandas/core/reshape/concat.py | Core logic for pd.concat; handles axis alignment and hierarchical indexing via the internal _Concatenator class (pandas 1.x and 2.x).
pandas/core/reshape/merge.py | Implements relational join algorithms, parsing the how, on, and suffixes parameters.
pandas/core/frame.py | Defines DataFrame.join and method dispatchers for combining operations.
pandas/core/internals/concat.py | Low-level block manager utilities that concatenate the underlying blocks.

All three methods align data on indexes or keys automatically and return a new DataFrame, leaving the inputs unmodified.

Summary

  • Use pd.concat when stacking DataFrames vertically (axis=0) or horizontally (axis=1) along compatible axes, implemented in pandas/core/reshape/concat.py.
  • Use DataFrame.merge (or pd.merge) for SQL-style joins on specific columns, supporting inner, left, right, and outer joins via pandas/core/reshape/merge.py.
  • Use DataFrame.join for fast index-to-index joins, which wraps merge logic inside pandas/core/frame.py.
  • Outer-style alignment fills non-matching keys with NaN; pass how='inner' to merge or join, or join='inner' to concat, to keep only the keys present in both inputs.

Frequently Asked Questions

What is the difference between merge and join in pandas?

merge is the general-purpose function for joining DataFrames on arbitrary columns or indexes, while join is a convenience method for index-based alignment. In pandas/core/frame.py, join delegates to merge with right_index=True, and with left_index=True when no on argument is given, making it syntactic sugar for the common case of merging on indexes. The defaults differ: join uses how='left', while merge uses how='inner'.
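join can also match a column on the left against the index on the right through its on parameter. The sketch below shows that case next to the equivalent merge call; the employees and departments frames are hypothetical.

```python
import pandas as pd

# Hypothetical data: 'dept' is a column on the left and the index on the right.
employees = pd.DataFrame({"name": ["Ann", "Bob"], "dept": ["eng", "ops"]})
departments = pd.DataFrame({"floor": [3, 1]}, index=["eng", "ops"])

# join(on=...) matches a left column against the right frame's index.
with_join = employees.join(departments, on="dept")

# Equivalent merge call: left column key, right index key, left join.
with_merge = employees.merge(departments, left_on="dept", right_index=True, how="left")

print(with_join)
print(with_join.equals(with_merge))
```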

When should I use concat instead of merge to combine two DataFrames in pandas?

Use concat when you need to append DataFrames along an axis without matching keys, which amounts to stacking tables. Use merge when you need to align rows based on common key values. concat in pandas/core/reshape/concat.py handles vertical and horizontal stacking, while merge in pandas/core/reshape/merge.py performs relational joins.

How do I handle overlapping column names when merging DataFrames?

Pass a tuple to the suffixes parameter in merge, such as suffixes=('_L', '_R'); the default is ('_x', '_y'). This disambiguates columns that exist in both DataFrames but are not used as join keys. The merge implementation in pandas/core/reshape/merge.py applies these suffixes automatically during the join operation.

Why does combining DataFrames produce NaN values?

Missing values appear when indexes or join keys exist in one DataFrame but not the other. This occurs during outer alignment in concat or during left, right, and outer joins in merge. Specify how='inner' in merge or join, or join='inner' in concat, to retain only matching keys. Note that ignore_index=True in concat only renumbers the labels along the concatenation axis; it does not remove NaN values.
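A short sketch of the difference between outer and inner alignment in concat, using small made-up frames and the documented join parameter:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2]}, index=["a", "b"])
df2 = pd.DataFrame({"B": [10, 20]}, index=["b", "c"])

# Outer alignment (the concat default) keeps every label and fills gaps with NaN.
outer = pd.concat([df1, df2], axis=1)

# join="inner" keeps only labels present in both frames, so nothing is missing.
inner = pd.concat([df1, df2], axis=1, join="inner")

print(int(outer.isna().sum().sum()))  # number of missing cells after outer alignment
print(inner)
```

The trade-off is that the inner result silently drops rows 'a' and 'c', so it is worth checking row counts before and after.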
