How to Find Percentile Stats of a Column Using pandas Percentile

Use the quantile() method on any pandas Series or DataFrame, passing the desired percentile as a decimal between 0 and 1 (e.g., 0.75 for the 75th percentile) to compute percentile statistics while automatically excluding missing values.

In the pandas-dev/pandas repository, percentile calculations are performed through the quantile API rather than a dedicated percentile method. Series.quantile is defined in pandas/core/series.py and DataFrame.quantile in pandas/core/frame.py. In recent pandas versions the array-level work is done in pandas/core/array_algos/quantile.py, which masks out missing values and then calls NumPy's percentile function for the numeric computation.

How the pandas Percentile Architecture Works

The public entry points are Series.quantile() in pandas/core/series.py and DataFrame.quantile() in pandas/core/frame.py. These methods validate q, apply axis selection and numeric_only filtering, and then hand the underlying arrays to lower-level helpers.

For ordinary NumPy-backed columns, the computation happens in pandas/core/array_algos/quantile.py, which masks missing values, rescales q from the 0 to 1 range to 0 to 100, and calls numpy.percentile for the final result. File layout can change between releases, so treat these paths as a description of recent pandas versions. This layering lets pandas handle axis selection, interpolation options, and missing values itself while leaving the numeric work to NumPy.
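The relationship to NumPy can be checked from the outside. The sketch below compares Series.quantile with a direct numpy.nanpercentile call on the same data. It assumes the default linear interpolation and a plain float column; the sample values are illustrative, and this is an equivalent calculation rather than pandas' internal code path.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value
s = pd.Series([50000, 80000, np.nan, 72000, 54000, 77000, 48000], dtype="float64")

q = 0.75

# pandas: q is a fraction in [0, 1]; NaN is dropped before computing
via_pandas = s.quantile(q)

# NumPy: the same request on the 0-100 scale, ignoring NaN explicitly
via_numpy = np.nanpercentile(s.to_numpy(), q * 100)

print(via_pandas, via_numpy)
```

Both calls should agree up to floating-point rounding, which is a quick way to confirm that quantile(0.75) and "the 75th percentile" mean the same thing.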

Computing Single Column Percentiles

To calculate the percentile for a specific column, access the column as a Series and call quantile() with the desired quantile as a float between 0 and 1.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "age": [23, 45, 31, 35, 28, 40, 22],
    "salary": [50000, 80000, 62000, 72000, 54000, 77000, 48000],
    "dept": ["HR", "IT", "IT", "Finance", "HR", "Finance", "IT"]
})

# 75th percentile of a single column
p75_salary = df["salary"].quantile(0.75)
print(p75_salary)
# 74500.0

You can calculate multiple percentiles simultaneously by passing a list to the q parameter. This returns a Series mapping each quantile to its corresponding value:


# Calculate 25th, 50th (median), and 75th percentiles
percentiles = [0.25, 0.5, 0.75]
salary_quants = df["salary"].quantile(percentiles)
print(salary_quants)
# 0.25    52000.0
# 0.50    62000.0
# 0.75    74500.0
# Name: salary, dtype: float64
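To see where a value such as the 75th percentile comes from, the default linear interpolation can be reproduced by hand. The sketch below uses a hypothetical helper, linear_quantile, which is not part of pandas and assumes the input has no missing values. It locates the fractional position q * (n - 1) in the sorted data and interpolates between the two neighbouring values.

```python
import math
import pandas as pd

salary = pd.Series([50000, 80000, 62000, 72000, 54000, 77000, 48000])

def linear_quantile(values, q):
    """Hypothetical helper: quantile by linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)   # fractional index into the sorted data
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

# With 7 values, q=0.75 lands at position 4.5: halfway between 72000 and 77000
by_hand = linear_quantile(salary.tolist(), 0.75)
print(by_hand, salary.quantile(0.75))
```

With seven values the percentile rarely falls exactly on a data point, which is why the result need not be one of the original salaries.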

Calculating Percentiles Across DataFrame Columns

When called on a DataFrame, quantile() computes percentiles for each column and returns a DataFrame whose rows are the requested percentiles. Pass numeric_only=True when the frame contains non-numeric columns such as dept.


# Percentiles across all numeric columns
df_quants = df.quantile([0.25, 0.5, 0.75], numeric_only=True)
print(df_quants)
#        age   salary
# 0.25  25.5  52000.0
# 0.50  31.0  62000.0
# 0.75  37.5  74500.0

By default, axis=0 calculates percentiles column-wise. Set axis=1 to compute percentiles across rows instead. The numeric_only parameter controls whether non-numeric columns are excluded (True) or included (False), in which case a string column such as dept typically raises a TypeError. The default is False since pandas 2.0; earlier releases defaulted to True.
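As a small illustration of these two parameters, the sketch below computes column-wise medians with numeric_only=True and row-wise medians with axis=1. Selecting the numeric columns by name before the row-wise call is a choice made here for clarity, not a requirement of the API.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [23, 45, 31, 35, 28, 40, 22],
    "salary": [50000, 80000, 62000, 72000, 54000, 77000, 48000],
    "dept": ["HR", "IT", "IT", "Finance", "HR", "Finance", "IT"],
})

# Column-wise medians; numeric_only=True leaves out the string column "dept"
col_medians = df.quantile(0.5, numeric_only=True)

# Row-wise medians over the two numeric columns
row_medians = df[["age", "salary"]].quantile(0.5, axis=1)

print(col_medians)
print(row_medians.head())
```

Row-wise percentiles are only meaningful when the columns share a unit; mixing age and salary here just demonstrates the mechanics.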

Interpolation Methods and Missing Data Handling

pandas supports five interpolation methods for cases where the desired percentile falls between two data points: linear (default), lower, higher, midpoint, and nearest. They correspond to the identically named options of numpy.percentile.


# Use 'nearest' interpolation instead of linear
p90_nearest = df["salary"].quantile(0.90, interpolation="nearest")
print(p90_nearest)
# 77000
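To compare the options side by side, the following sketch evaluates the 90th percentile of the salary data with each of the five interpolation names. The dictionary and loop are illustrative scaffolding; only the interpolation argument itself is pandas API.

```python
import pandas as pd

salary = pd.Series([50000, 80000, 62000, 72000, 54000, 77000, 48000])

# The 90th percentile sits at position 0.9 * 6 = 5.4 in the sorted data,
# between 77000 (index 5) and 80000 (index 6).
methods = ["linear", "lower", "higher", "midpoint", "nearest"]
results = {m: float(salary.quantile(0.90, interpolation=m)) for m in methods}

for name, value in results.items():
    print(f"{name:>8}: {value}")
```

Only linear and midpoint can return a value that is absent from the data; lower, higher, and nearest always pick an existing observation.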

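Missing values are automatically excluded from percentile calculations, so NaN values do not propagate into results:

df_with_nan = df.copy()
# Cast to float first so the column can hold NaN without a dtype change
df_with_nan["salary"] = df_with_nan["salary"].astype("float64")
df_with_nan.loc[2, "salary"] = np.nan

# Automatically ignores NaN values
p50_salary = df_with_nan["salary"].quantile(0.5)
print(p50_salary)
# 63000.0

quantile() has no skipna parameter, so there is no built-in switch that makes missing values propagate. If you need NaN in the result whenever data is missing, check for missing values explicitly, for example with isna().any().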
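Because the documented signature of Series.quantile takes only q and interpolation, a result that turns into NaN when data is missing needs an explicit check. The following sketch wraps that check in a hypothetical helper, quantile_strict, which is not a pandas function; the sample Series are made up for illustration.

```python
import numpy as np
import pandas as pd

def quantile_strict(s: pd.Series, q: float) -> float:
    """Hypothetical helper: return NaN if the Series has any missing value."""
    if s.isna().any():
        return float("nan")
    return float(s.quantile(q))

clean = pd.Series([48000.0, 54000.0, 62000.0])
gappy = pd.Series([48000.0, np.nan, 62000.0])

print(quantile_strict(clean, 0.5))   # median of the three values
print(quantile_strict(gappy, 0.5))   # NaN, because one value is missing
print(gappy.quantile(0.5))           # built-in behaviour: NaN is dropped first
```

Whether strict or lenient handling is appropriate depends on whether missing rows should invalidate the statistic or simply be ignored.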

Summary

  • pandas uses quantile() not percentile: Pass values between 0 and 1 to the q parameter to specify percentiles (e.g., 0.95 for the 95th percentile).
  • Single column calculations: Use df["column"].quantile(q) to return scalar values or Series for multiple quantiles.
  • Architecture: The public methods live in pandas/core/series.py (Series.quantile) and pandas/core/frame.py (DataFrame.quantile); in recent versions the array-level computation is in pandas/core/array_algos/quantile.py, which calls NumPy's percentile function.
  • Missing data: NaN values are always excluded; quantile() has no skipna parameter.
  • Interpolation: Control how fractional percentiles are calculated using the interpolation parameter with options like linear, nearest, or midpoint.

Frequently Asked Questions

What is the difference between pandas percentile and quantile?

In pandas, the terms are functionally equivalent but use different scales. The quantile() method accepts values between 0 and 1 (where 0.5 represents the 50th percentile), whereas traditional percentile notation uses 0 to 100. There is no percentile() method in pandas; you simply multiply your desired percentile by 0.01 when calling quantile().
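If you prefer to think on the 0 to 100 scale, a thin wrapper can do the conversion. The sketch below defines a hypothetical percentile function (the name and the range check are choices made here, not pandas API) that divides by 100 and delegates to quantile().

```python
import pandas as pd

def percentile(s: pd.Series, p: float) -> float:
    """Hypothetical wrapper: p is on the 0-100 scale, as in 'the 95th percentile'."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    return float(s.quantile(p / 100))

scores = pd.Series([10, 20, 30, 40, 50])
print(percentile(scores, 50))   # same as scores.quantile(0.5)
print(percentile(scores, 25))   # same as scores.quantile(0.25)
```

The explicit range check gives a clearer error than passing an out-of-range value straight through to quantile().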

How do I calculate multiple percentiles at once?

Pass a list of floats to the q parameter, such as df["column"].quantile([0.1, 0.5, 0.9]). For a Series, this returns a Series indexed by the quantile values. For a DataFrame, this returns a DataFrame with quantiles as row labels and columns as the original numeric columns.

How does pandas handle missing values in percentile calculations?

quantile() always excludes NaN values before computing the result. It does not accept a skipna parameter, so if missing values should invalidate the result, test for them first (for example with isna().any()) and return NaN yourself.

What interpolation methods does pandas support for percentiles?

pandas supports five interpolation methods: linear (default), lower, higher, midpoint, and nearest. They share their names with the corresponding options of numpy.percentile and determine how to select or calculate the value when the desired quantile falls between two data points.
