How to Use Pandas Count to Get the Number of Records in a DataFrame
Use len(df) to get the total number of rows in a DataFrame; df.count() instead returns the number of non-null values in each column, as the pandas-dev/pandas source code shows.
The pandas count method is a fundamental tool for data validation, but it behaves differently than many expect when you need the number of records or rows. In the pandas-dev/pandas repository, DataFrame and Series both build on the shared NDFrame base class, and count is computed with vectorized operations over whole columns, which keeps it fast on large datasets.
Understanding Pandas Count vs. DataFrame Length
DataFrame.count is defined on DataFrame in pandas/core/frame.py and returns the count of non-null values for each column by default. When you need the total number of rows, including rows that contain only null values, the idiomatic approach is Python's built-in len() function.
The count() method accepts two arguments, axis and numeric_only. With the default axis=0 it reduces down each column, which explains why it returns a Series of per-column counts rather than a single scalar representing the row count. Passing axis=1 returns per-row counts instead.
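The effect of the two arguments can be seen on a small frame. This is a minimal sketch; the sample data and the "label" column are made up for illustration and are not taken from the pandas source.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame, chosen only for illustration.
df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, 5, 6, 7],
    "label": ["x", None, "y", "z"],
})

per_column = df.count()                      # default axis=0: non-null values per column
per_row = df.count(axis=1)                   # axis=1: non-null values per row
numeric_cols = df.count(numeric_only=True)   # ignores the non-numeric "label" column

print(per_column.tolist())       # [3, 3, 3]
print(per_row.tolist())          # [2, 2, 2, 3]
print(list(numeric_cols.index))  # ['A', 'B']
```

None of these three results is the row count; only len(df) gives 4 here.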
How Pandas Count Works Under the Hood
The NDFrame Architecture
The counting logic lives in pandas/core/frame.py. In recent pandas releases, DataFrame.count optionally restricts the frame to numeric columns, builds a boolean mask of non-null values with notna(), and sums that mask along the requested axis. These internals have changed between versions, so treat them as an implementation detail rather than a stable contract.
Series.count, in pandas/core/series.py, applies the same idea to a single column. The NDFrame base class is defined in pandas/core/generic.py, which provides infrastructure shared by both DataFrames and Series.
The Cython Optimization Layer
For performance, pandas keeps the heavy computation out of pure Python. Null detection and the summation of the resulting mask run as vectorized operations in compiled code (NumPy routines and pandas' own Cython extensions) over the underlying arrays. This allows pandas to handle large datasets without the overhead of element-by-element Python iteration.
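Whatever the internal code path in a given pandas version, the result of count() can be reproduced from the public API by summing a boolean non-null mask. The sketch below assumes only the documented notna(), sum(), and count() methods and makes no claim about which private helpers pandas calls.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, 5, 6, 7],
    "C": [np.nan, np.nan, np.nan, np.nan],
})

mask = df.notna()              # boolean DataFrame: True where a value is present
counts_from_mask = mask.sum()  # True counts as 1, so this totals non-null values per column

print(counts_from_mask.tolist())                # [3, 3, 0]
print((counts_from_mask == df.count()).all())   # True
```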
Practical Methods to Count Rows in Pandas
Total Rows with len()
To get the total number of records regardless of null values, use Python's built-in len() function:
import pandas as pd
import numpy as np
df = pd.DataFrame({
"A": [1, 2, np.nan, 4],
"B": [np.nan, 5, 6, 7],
"C": [np.nan, np.nan, np.nan, np.nan],
})
total_rows = len(df)
print(total_rows) # Output: 4
This approach returns 4 because len() queries the DataFrame's index length directly, bypassing the null-checking logic entirely.
Non-Null Value Counts with count()
When you need to understand data completeness per column, use count():
col_counts = df.count()
print(col_counts)
# Output:
# A 3
# B 3
# C 0
# dtype: int64
The result matches df.notna().sum(): each column is counted independently, and only non-null entries contribute to its total.
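Because count() and len() measure different things, dividing one by the other gives a quick completeness ratio per column. This is an illustrative sketch, not something the pandas source prescribes; the variable name completeness is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, 5, 6, 7],
    "C": [np.nan, np.nan, np.nan, np.nan],
})

# Fraction of rows that hold a non-null value, per column.
completeness = df.count() / len(df)

print(completeness.tolist())  # [0.75, 0.75, 0.0]
```

A column at 0.0, like C here, is a candidate for dropping before further analysis.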
Rows with Any Data using notna()
To count rows that contain at least one non-null value—effectively excluding completely empty rows—combine notna() with any():
rows_with_data = df.notna().any(axis=1).sum()
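print(rows_with_data) # Output: 4
Every row in this DataFrame has a value in A or B, so the result is 4, the same as len(df). The two numbers differ only when some row is null in every column. The pattern relies on the same kind of boolean mask that count() is built on, so it stays fast on large frames.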
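To see where this count diverges from len(), the frame needs a row that is null in every column. The sketch below uses a hypothetical frame (df2) built for that purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose middle row is null in every column.
df2 = pd.DataFrame({
    "A": [1, np.nan, 3],
    "B": [np.nan, np.nan, 6],
})

total_rows = len(df2)
rows_with_data = int(df2.notna().any(axis=1).sum())  # rows with at least one value
empty_rows = int(df2.isna().all(axis=1).sum())       # rows with no values at all

print(total_rows, rows_with_data, empty_rows)  # 3 2 1
```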
Maximum Column Count Approach
If you know that at least one column contains no null values, you can derive the row count from count() results:
max_col_count = df.count().max()
print(max_col_count) # Output: 3
Note that this returns 3, not 4, because no column in this DataFrame is fully populated: A and B each contain one null, and C is entirely null. This is why len(df) remains the reliable choice for total row counts.
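For contrast, here is a case where the maximum does match the row count. The frame and its column names (id, score) are hypothetical; the point is only that one fully populated column is enough.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "id" has no nulls, "score" has two.
df_ids = pd.DataFrame({
    "id": [10, 11, 12, 13],
    "score": [0.5, np.nan, 0.7, np.nan],
})

rows_from_count = int(df_ids.count().max())

print(rows_from_count)                 # 4
print(rows_from_count == len(df_ids))  # True
```

The approach still depends on knowledge about the data that len() does not need.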
Summary
- len(df) returns the total number of rows in the DataFrame index, including rows with all null values.
- df.count() returns a Series of non-null value counts for each column and is defined in pandas/core/frame.py.
- df.notna().any(axis=1).sum() counts rows containing at least one non-null value by leveraging boolean masks.
- The counting runs as vectorized operations over the underlying arrays rather than Python-level loops, which keeps it fast on large datasets.
Frequently Asked Questions
What's the difference between len(df) and df.count()?
len(df) returns the total number of rows based on the DataFrame's index length, while df.count() returns the number of non-null values for each column. According to the pandas source code in pandas/core/frame.py, count() is designed to ignore null values and operates column-wise, making len() the correct choice for simple row counting.
How do I count only rows with no missing values?
To count rows where all columns contain non-null values, use df.dropna().shape[0] or len(df.dropna()). dropna() returns a new DataFrame without any row that has at least one null value, and the length of that result is the number of complete rows. For the example DataFrame in this article the answer is 0, because column C is null in every row.
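The same number can be obtained without building the filtered DataFrame, by checking that every column in a row is non-null. A minimal sketch on a hypothetical two-column frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two complete rows (the second and the fourth).
df_ab = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, 5, 6, 7],
})

complete_via_dropna = len(df_ab.dropna())                    # builds a new DataFrame
complete_via_mask = int(df_ab.notna().all(axis=1).sum())     # only builds a boolean mask

print(complete_via_dropna, complete_via_mask)  # 2 2
```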
Why does df.count() return different values for each column?
df.count() returns different values because it counts non-null entries in each column independently. Each column has its own pattern of missing values, so the counts vary with how nulls are distributed across the columns.
Is df.shape[0] better than len(df) for counting rows?
Both df.shape[0] and len(df) return the same value for the number of rows, but len(df) is slightly more idiomatic and readable. The shape attribute returns a (rows, columns) tuple, while len() calls the DataFrame's __len__ method, which returns the length of the index. The performance difference between the two is negligible.
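A quick check of that equivalence, using only public attributes:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, 5, 6, 7],
    "C": [np.nan, np.nan, np.nan, np.nan],
})

n_len = len(df)           # DataFrame.__len__
n_shape = df.shape[0]     # first element of the (rows, columns) tuple
n_index = len(df.index)   # length of the row index

print(n_len, n_shape, n_index)  # 4 4 4
print(df.shape)                 # (4, 3)
```

df.shape is the better fit when the column count is needed as well.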