
How to Use Pandas Groupby Multiple Columns to Count Items in Each Group

February 13, 2026 pandas-dev/pandas ↗

Use df.groupby(['col1', 'col2']).size() to count rows in each group, or df.groupby(['col1', 'col2'])['col3'].count() to count non-null values in a specific column.

Grouping by multiple columns is a core feature of pandas (the pandas-dev/pandas repository) and enables aggregation across combinations of categorical values. When you pass a list of column names to the groupby method, pandas forms one group per distinct combination of key values, labels the result with a MultiIndex, and relies on optimized Cython routines to compute counts efficiently, even on large datasets.

How Pandas Groupby Multiple Columns Works Internally

The DataFrame.groupby Entry Point

The public API for grouping operations begins in pandas/core/frame.py, where the DataFrame.groupby method is defined. When you call df.groupby(['city', 'year']), this method validates the column list and delegates to the internal grouping machinery.
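As a small illustration of this entry point, the sketch below calls groupby with a list of columns and inspects what comes back. The DataFrame and the column name missing_column are made up for the example; the check relies only on the class name of the returned object and on the KeyError pandas raises for an unknown key.

```python
import pandas as pd

# Illustrative data; not taken from the pandas source.
df = pd.DataFrame({
    "city": ["NY", "LA", "NY"],
    "year": [2020, 2020, 2021],
})

# groupby returns a grouping object; no counting has happened yet.
grouped = df.groupby(["city", "year"])
print(type(grouped).__name__)

# A column name that does not exist is rejected with a KeyError.
rejected = False
try:
    df.groupby(["city", "missing_column"])  # hypothetical bad column name
except KeyError as exc:
    rejected = True
    print("KeyError:", exc)
```

The aggregation itself only runs once a method such as .size() is called on the returned object.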

GroupBy Object Construction

The DataFrameGroupBy class is implemented in pandas/core/groupby/generic.py. It inherits from the base GroupBy class, and the object returned by df.groupby(...) exposes methods such as .size() and .count() that you call after grouping.
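The class relationship can be checked from user code without importing anything from pandas' internal modules. The sketch below, using made-up data, looks at the class names in the method resolution order of a grouped object and at what selecting a single column returns.

```python
import pandas as pd

# Illustrative data; not taken from the pandas source.
df = pd.DataFrame({
    "city": ["NY", "LA", "NY"],
    "year": [2020, 2020, 2021],
    "sales": [100, 200, 300],
})

grouped = df.groupby(["city", "year"])

# Names of the classes that the grouped object inherits from.
class_names = [cls.__name__ for cls in type(grouped).__mro__]
print(class_names)

# Selecting one column narrows the object to a Series-level grouping.
sales_grouped = grouped["sales"]
print(type(sales_grouped).__name__)
```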

MultiIndex Key Generation

In pandas/core/groupby/groupby.py, the base GroupBy class handles the mechanics of grouping, including MultiIndex construction for multiple columns. When you pass a list of column names, pandas creates a hierarchical index where each level represents one grouping column, enabling efficient lookup and aggregation.
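The hierarchical index is easiest to see on the result of an aggregation. The following sketch reuses the city/year data from the examples in this article and inspects the index of the .size() result; the variable names are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["NY", "NY", "LA", "LA", "NY", "LA"],
    "year": [2020, 2020, 2021, 2021, 2021, 2020],
})

counts = df.groupby(["city", "year"]).size()

# One index level per grouping column, in the order the columns were listed.
print(isinstance(counts.index, pd.MultiIndex))
print(list(counts.index.names))

# A full key selects a single group; a partial key selects a whole level.
ny_2020 = counts.loc[("NY", 2020)]
ny_all_years = counts.loc["NY"]
print(ny_2020)
print(ny_all_years)
```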

Optimized Counting Operations

Much of the counting performance comes from compiled code such as pandas/_libs/hashtable.pyx, a Cython module that implements the hash tables pandas uses to map key values to integer group codes. Once every row has a group code, .size() and .count() tally rows in compiled or vectorized code rather than in a Python-level loop, which keeps them fast even on millions of rows.
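To make the idea of hash-based grouping concrete, here is a rough sketch that imitates it with public functions: pd.factorize assigns an integer code to each distinct key, and np.bincount tallies rows per code. This is an illustration of the general technique, not a copy of what pandas does internally, and the way the two codes are combined is an assumption made for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["NY", "NY", "LA", "LA", "NY", "LA"],
    "year": [2020, 2020, 2021, 2021, 2021, 2020],
})

# Step 1: map each key column to integer codes.
city_codes, city_uniques = pd.factorize(df["city"])
year_codes, year_uniques = pd.factorize(df["year"])
n_years = len(year_uniques)

# Step 2: combine the per-column codes into one code per (city, year) pair.
combined = city_codes * n_years + year_codes

# Step 3: tally rows per combined code in one vectorized pass.
tallies = np.bincount(combined, minlength=len(city_uniques) * n_years)

# Decode the codes back to labels, skipping combinations that never occur.
manual_counts = {
    (city_uniques[code // n_years], int(year_uniques[code % n_years])): int(n)
    for code, n in enumerate(tallies)
    if n > 0
}
print(manual_counts)
```

The resulting dictionary holds the same counts that df.groupby(["city", "year"]).size() produces for this data.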

Practical Examples: Counting with Pandas Groupby Multiple Columns

Count Rows in Each Group with .size()

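The .size() method returns the number of rows for each combination of grouping columns. Every row in a group is counted, even when its other columns contain NaN. By default, rows whose grouping keys are NaN are left out of the result; pass dropna=False to groupby to keep them.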

import pandas as pd

df = pd.DataFrame({
    "city":    ["NY", "NY", "LA", "LA", "NY", "LA"],
    "year":    [2020, 2020, 2021, 2021, 2021, 2020],
    "sales":   [100, 150, 200, 250, 300, 400],
    "product": ["A", "B", "A", "B", "A", "B"]
})

# Group by two columns and count rows per group
counts = df.groupby(["city", "year"]).size()
print(counts)

Output:

city  year
LA    2020    1
      2021    2
NY    2020    2
      2021    1
dtype: int64
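Missing values in the grouping keys behave differently from missing values in other columns. The sketch below uses a small made-up DataFrame with one missing city to show the default behaviour next to dropna=False, a groupby parameter available in pandas 1.1 and later.

```python
import pandas as pd

# Illustrative data with one missing grouping key.
df_nan = pd.DataFrame({
    "city": ["NY", "NY", None, "LA"],
    "year": [2020, 2020, 2021, 2021],
})

# Default: the row whose key is missing does not belong to any group.
default_counts = df_nan.groupby(["city", "year"]).size()

# dropna=False keeps missing keys as a group of their own.
with_nan_keys = df_nan.groupby(["city", "year"], dropna=False).size()

print(default_counts)
print(with_nan_keys)
```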

Count Non-Null Values in a Specific Column

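Use .count() on a specific column to count only that column's non-null values. The sample data has no missing sales values, so here the result matches the .size() output above.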


# Count non-null sales entries per city-year group
sales_counts = df.groupby(["city", "year"])["sales"].count()
print(sales_counts)
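The difference from .size() only shows up when the counted column has gaps. As a hedged illustration, the sketch below uses a made-up variant of the data with one missing sales value and puts both counts side by side; the column names n_rows and n_sales are chosen for the example.

```python
import numpy as np
import pandas as pd

# Illustrative data: one NY/2020 row has no sales figure.
df_missing = pd.DataFrame({
    "city": ["NY", "NY", "LA", "LA"],
    "year": [2020, 2020, 2021, 2021],
    "sales": [100, np.nan, 200, 250],
})

grouped = df_missing.groupby(["city", "year"])

# size counts rows; count skips the missing sales value.
comparison = pd.DataFrame({
    "n_rows": grouped.size(),
    "n_sales": grouped["sales"].count(),
})
print(comparison)
```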

Convert to Flat DataFrame with reset_index()

The .reset_index() method converts the MultiIndex result into a regular DataFrame with named columns.


# Convert Series to DataFrame with custom column name
result = counts.reset_index(name="group_count")
print(result)

Output:

   city  year  group_count
0    LA  2020            1
1    LA  2021            2
2    NY  2020            2
3    NY  2021            1
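An alternative that avoids the separate reset_index step is to pass as_index=False to groupby. The sketch below assumes pandas 1.1 or later, where .size() on such a grouping returns a DataFrame whose count column is named size.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["NY", "NY", "LA", "LA", "NY", "LA"],
    "year": [2020, 2020, 2021, 2021, 2021, 2020],
})

# as_index=False keeps the grouping columns as ordinary columns.
flat = df.groupby(["city", "year"], as_index=False).size()
print(flat)

# Rename the count column if a more descriptive name is wanted.
flat_named = flat.rename(columns={"size": "group_count"})
print(list(flat_named.columns))
```

reset_index(name=...) remains convenient when the counts already exist as a Series; as_index=False is a matter of preference when the flat form is wanted from the start.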

Group by Three Columns and Aggregate

Extend the pattern to three or more columns by adding elements to the list.


# Group by three columns and sum sales
sum_sales = df.groupby(["city", "year", "product"])["sales"].sum()
print(sum_sales)
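Counts and other aggregates can also be computed in one pass with named aggregation. The sketch below groups the same data by city and year and returns the row count next to the sales total; the output column names n_rows and total_sales are chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "city":    ["NY", "NY", "LA", "LA", "NY", "LA"],
    "year":    [2020, 2020, 2021, 2021, 2021, 2020],
    "sales":   [100, 150, 200, 250, 300, 400],
    "product": ["A", "B", "A", "B", "A", "B"],
})

# Each keyword names an output column: (source column, aggregation).
summary = df.groupby(["city", "year"]).agg(
    n_rows=("sales", "size"),
    total_sales=("sales", "sum"),
)
print(summary)
```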

Summary

Pandas groupby multiple columns accepts a list of column names: df.groupby(['col1', 'col2']).
The result is indexed by a MultiIndex with one level per grouping column.
Use .size() to count all rows per group, including rows with NaN in non-key columns.
Use .count() on a specific column to count only non-null values.
Core implementation files include pandas/core/frame.py, pandas/core/groupby/groupby.py, and the Cython-optimized pandas/_libs/hashtable.pyx.

Frequently Asked Questions

What is the difference between .size() and .count() in pandas groupby?

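.size() returns the total number of rows in each group, counting a row even when its non-key columns hold NaN, and returns a Series with the group labels as the index. .count() returns the number of non-null values for each column (or for a specific column if one is selected), so NaN values are excluded from the tally. In both cases, rows whose grouping keys are NaN are dropped by default unless you pass dropna=False to groupby.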

How do I convert the MultiIndex result from groupby back to a regular DataFrame?

Call .reset_index() on the resulting Series or DataFrame. This converts the MultiIndex levels into regular columns. You can also pass name='count' to reset_index when working with a Series to name the values column appropriately.

Can I group by more than two columns using the same method?

Yes, you can group by any number of columns by extending the list passed to groupby. For example, df.groupby(['col1', 'col2', 'col3', 'col4']).size() works identically to the two-column case, creating a MultiIndex with four levels.

Why is pandas groupby with multiple columns fast even on large datasets?

Pandas implements the core of its grouping operations in Cython, including the hash tables in pandas/_libs/hashtable.pyx, which map key values to integer group codes without Python loop overhead. The grouping machinery in pandas/core/groupby/groupby.py then aggregates over those codes with vectorized operations instead of per-row Python calls.
