
How to Use pandas groupby count and mean to Compute Group Statistics

February 12, 2026 pandas-dev/pandas ↗

Use the agg() method with ['count', 'mean'] to calculate both non-null counts and arithmetic averages for each group in a single vectorized operation.

The pandas-dev/pandas repository implements grouping logic through the GroupBy family of classes located in pandas/core/groupby/. When you call DataFrame.groupby(), you receive a DataFrameGroupBy object that provides efficient, Cython-backed aggregation methods. Understanding how to combine pandas groupby count operations with statistical reductions like mean allows you to generate comprehensive group summaries without iterative Python loops.

Computing Count and Mean with agg()

The most direct way to obtain both statistics at once is to pass a list of function names to the agg() method. pandas computes the group assignments once, applies each named reduction to every non-key column, and returns the results aligned under a two-level column index.

import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'A', 'B', 'B', 'B', 'C'],
    'value1': [10, 15, 10, None, 30, 25],
    'value2': [1, 2, 3, 4, 5, 6]
})

# Calculate count of non-null values and mean for each column per group
result = df.groupby('category').agg(['count', 'mean'])
print(result)

          value1               value2          
           count  mean        count  mean
category                                  
A              2  12.5            2   1.5
B              2  20.0            3   4.0
C              1  25.0            1   6.0

As shown above, count excludes NaN values (category B shows 2 for value1 despite having 3 rows), while mean computes only on non-missing entries.
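To make the NaN handling concrete, the sketch below recomputes category B's statistics by hand and compares them with the grouped result. It rebuilds the sample DataFrame from above; the names b_values, manual_count, and manual_mean are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'A', 'B', 'B', 'B', 'C'],
    'value1': [10, 15, 10, None, 30, 25],
    'value2': [1, 2, 3, 4, 5, 6]
})

# value1 for category B: three rows, one of which is NaN
b_values = df.loc[df['category'] == 'B', 'value1']

manual_count = b_values.notna().sum()         # counts non-null entries only
manual_mean = b_values.sum() / manual_count   # Series.sum() skips NaN by default

grouped = df.groupby('category')['value1'].agg(['count', 'mean'])

print(manual_count, manual_mean)  # 2 20.0
print(grouped.loc['B'])
```

The mean is 40 / 2 rather than 40 / 3, because the missing row contributes to neither the numerator nor the denominator.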

count() vs size(): Choosing the Right Row Counter

pandas provides two distinct GroupBy methods for counting rows. In recent pandas versions both are defined on the shared GroupBy base class in pandas/core/groupby/groupby.py, while generic.py holds the DataFrameGroupBy and SeriesGroupBy subclasses:

count(): Returns the number of non-null values per column for each group.
size(): Returns the total number of rows per group, including rows that contain NaN.

When you need the raw group size rather than valid observations, use size():


# Total rows per group (includes NaN rows)
group_sizes = df.groupby('category').size()
print(group_sizes)

category
A    2
B    3
C    1
dtype: int64
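Because size() counts every row and count() counts only non-null values, their difference gives the number of missing entries per column in each group. The following is a minimal sketch of that idea using the same sample data; the variable name missing is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'A', 'B', 'B', 'B', 'C'],
    'value1': [10, 15, 10, None, 30, 25],
    'value2': [1, 2, 3, 4, 5, 6]
})

grouped = df.groupby('category')

# size() is a Series of row counts; count() is a DataFrame of non-null counts.
# rsub(..., axis=0) computes size - count, aligning the Series on the group index.
missing = grouped.count().rsub(grouped.size(), axis=0)
print(missing)
```

Only category B has a nonzero entry, in the value1 column, which matches the single None in the input.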

Named Aggregations for Custom Output Columns

For cleaner column names, use named aggregation (available since pandas 0.25) within agg(). Each keyword becomes an output column name, and its value is a (source column, function) tuple, so you choose the source column, the function, and the result name in one place.

custom_stats = df.groupby('category').agg(
    row_count=('value1', 'size'),     # All rows in the group, like size()
    valid_count=('value1', 'count'),  # Non-null count
    avg_value=('value1', 'mean')      # Arithmetic mean
)
print(custom_stats)

          row_count  valid_count  avg_value
category                                   
A                2            2       12.5
B                3            2       20.0
C                1            1       25.0
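The tuple form is shorthand for pd.NamedAgg, which spells out the column and aggfunc fields by name. The sketch below, using the same sample data, builds the summary both ways and checks that they agree; named_stats and tuple_stats are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'A', 'B', 'B', 'B', 'C'],
    'value1': [10, 15, 10, None, 30, 25],
    'value2': [1, 2, 3, 4, 5, 6]
})

# Explicit form: pd.NamedAgg names the source column and the function
named_stats = df.groupby('category').agg(
    row_count=pd.NamedAgg(column='value1', aggfunc='size'),
    valid_count=pd.NamedAgg(column='value1', aggfunc='count'),
    avg_value=pd.NamedAgg(column='value1', aggfunc='mean'),
)

# Tuple shorthand: (column, aggfunc)
tuple_stats = df.groupby('category').agg(
    row_count=('value1', 'size'),
    valid_count=('value1', 'count'),
    avg_value=('value1', 'mean'),
)

# Raises AssertionError if the two frames differ
pd.testing.assert_frame_equal(named_stats, tuple_stats)
print(named_stats)
```

The explicit form can be easier to read when a call lists many output columns.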

Implementation Details in pandas Source Code

In the pandas source, the grouping machinery behind both count() and mean() lives in pandas/core/groupby/ops.py. The grouper factorizes the key column into integer group codes once and caches them; reductions such as mean() then dispatch through ops.py to Cython kernels that walk those codes, and count() tallies non-null values against the same codes. Private helper names and line positions shift between pandas releases, so confirm them against the version you have installed.

The agg() method loops over your supplied function list, runs each reduction against the same grouper, and concatenates the results along a hierarchical column index. Because the group codes are cached on the grouper, computing several statistics from one GroupBy object avoids refactorizing the keys for each one.
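One way to see the concatenation behavior is to assemble the same table from separate count() and mean() calls on a single GroupBy object. This is a sketch of the observable result only, not of pandas internals; combined and separate are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'A', 'B', 'B', 'B', 'C'],
    'value1': [10, 15, 10, None, 30, 25],
    'value2': [1, 2, 3, 4, 5, 6]
})

grouped = df.groupby('category')   # one GroupBy object, reused below
combined = grouped.agg(['count', 'mean'])

# Build the same table from two separate reductions.
# concat with a dict puts the statistic name on the outer column level.
separate = pd.concat({'count': grouped.count(), 'mean': grouped.mean()}, axis=1)

# Swap levels to (column, statistic) and reorder to match agg()'s layout
separate = separate.swaplevel(axis=1).reindex(columns=combined.columns)

# Raises AssertionError if the two frames differ
pd.testing.assert_frame_equal(combined, separate)
print(separate)
```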

Summary

Use .agg(['count', 'mean']) to compute multiple statistics in one call, returning a DataFrame with hierarchical columns.
count() tallies only non-null values per column, while size() counts all rows per group including those with missing data.
Named aggregations (agg(name=('column', 'func'))) produce clean, flat column headers instead of MultiIndex columns.
The underlying implementation in pandas/core/groupby/ dispatches through ops.py to Cython-optimized kernels for efficient group-wise calculations.

Frequently Asked Questions

What is the difference between count and size in pandas groupby?

count() returns the number of non-null values for each column within the group, excluding NaN entries. size() returns the total number of rows in each group regardless of missing values. Use count() when analyzing data completeness and size() when you need the raw group cardinality.

How do I calculate different statistics for different columns?

Pass a dictionary to agg() where keys are column names and values are functions or lists of functions. For example: df.groupby('key').agg({'col1': 'mean', 'col2': ['count', 'sum']}). This computes the mean for col1 and both count and sum for col2.
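As a runnable version of the dictionary form, the sketch below applies it to the sample data from this article, with value1 and value2 standing in for col1 and col2. The name per_column is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'A', 'B', 'B', 'B', 'C'],
    'value1': [10, 15, 10, None, 30, 25],
    'value2': [1, 2, 3, 4, 5, 6]
})

# Mean of value1; count and sum of value2
per_column = df.groupby('category').agg({
    'value1': 'mean',
    'value2': ['count', 'sum'],
})

print(per_column.columns.tolist())
# [('value1', 'mean'), ('value2', 'count'), ('value2', 'sum')]
print(per_column)
```

Because one of the dictionary values is a list, the result carries two-level columns even for value1, which has a single statistic.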

Why does my groupby count return fewer rows than expected?

The count() method ignores NaN values by design. If your data contains missing values, the count will reflect only valid observations. Switch to size() if you need to count all rows including those with null data.

How can I flatten the MultiIndex columns after using agg?

After calling .agg(['count', 'mean']), the resulting columns are a MultiIndex. Flatten them by joining the levels: result.columns = ['_'.join(col).strip() for col in result.columns.values] or by using named aggregations to avoid creating a MultiIndex initially.
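The join-based flattening can be sketched end to end on the sample data as follows. The copy into flat is an illustrative choice that keeps the original MultiIndex result available for comparison.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'A', 'B', 'B', 'B', 'C'],
    'value1': [10, 15, 10, None, 30, 25],
    'value2': [1, 2, 3, 4, 5, 6]
})

result = df.groupby('category').agg(['count', 'mean'])

# Each column label is a (source column, statistic) tuple; join it into one string
flat = result.copy()
flat.columns = ['_'.join(col) for col in flat.columns]

print(flat.columns.tolist())
# ['value1_count', 'value1_mean', 'value2_count', 'value2_mean']
```

Calling flat.reset_index() afterwards turns the category index into an ordinary column, if a fully flat table is needed.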
