How to Rank Rows in Pandas Based on Multiple Columns: 3 Proven Methods

You can rank rows in a Pandas DataFrame based on multiple columns by either creating a composite score column and ranking that, averaging individual column ranks, or sorting by multiple columns and ranking the resulting order.

Ranking rows by multiple criteria is a common requirement in data analysis, and the pandas-dev/pandas repository provides robust infrastructure for computing ranks efficiently. The implementation delegates from the high-level DataFrame.rank API in pandas/core/generic.py to optimized algorithms in pandas/core/algorithms.py, enabling flexible tie-breaking and NaN handling. This guide demonstrates three methods for ranking on multiple columns and notes where each one touches that source code.

Understanding the Pandas Rank Architecture

Before applying multi-column ranking strategies, it helps to understand how pandas processes rank calculations under the hood. When you call df.rank(), the method defined in pandas/core/generic.py (lines 9443-9450; exact line numbers shift between releases) validates its parameters and dispatches to the low-level algos.rank function exposed in pandas/core/algorithms.py (lines 40-96).

This architecture means that the two rank-based approaches in this guide, composite scoring and averaging ranks, ultimately invoke the same optimized, compiled ranking engine. The sort-based approach in Method 3 bypasses it and relies on pandas' sorting routines instead. The algorithms.py implementation handles tie-breaking methods (average, min, max, first, dense), ascending/descending logic, and NaN handling (na_option) consistently for both Series.rank and DataFrame.rank.
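
As a minimal sketch of that consistency (the small DataFrame below is made-up illustration data, not taken from the pandas source), ranking a column through DataFrame.rank and through Series.rank should give identical results when the parameters match:

```python
import pandas as pd

# Hypothetical illustration data with one tie in each column
df = pd.DataFrame({"wins": [10, 8, 12, 8], "points": [30, 25, 35, 25]})

# Rank every column at once, then rank a single column on its own
frame_ranks = df.rank(ascending=False, method="min")
series_ranks = df["wins"].rank(ascending=False, method="min")

# The per-column result of DataFrame.rank matches Series.rank
same = frame_ranks["wins"].equals(series_ranks)
print(same)
```

Because both entry points share the same parameters, a tie-breaking choice made for one column carries over unchanged when several columns are ranked together.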

Method 1: Rank on a Composite Score Column

The most straightforward way to rank on multiple columns is to aggregate the columns into a single composite score, then rank that derived column. This approach respects business logic (e.g., weighting wins more heavily than points) while leveraging the standard Series.rank machinery.

import pandas as pd

# Sample DataFrame

df = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E"],
    "wins": [10, 8, 12, 7, 9],
    "points": [30, 25, 35, 20, 28],
    "turnover": [5, 8, 3, 7, 6],
})

# Create a composite score (e.g., weighted sum)

df["score"] = df["wins"] * 2 + df["points"] - df["turnover"] * 1.5

# Rank by the composite score (the highest score gets rank 1)

df["rank"] = df["score"].rank(ascending=False, method="dense")
print(df[["team", "score", "rank"]])

Underlying execution: df["score"].rank(...) → pandas/core/generic.py → pandas/core/algorithms.rank.

Method 2: Average Individual Column Ranks

When no single formula adequately captures the relationship between columns, compute per-column ranks first, then aggregate those ranks. This method normalizes each column to the same scale before combining, preventing high-magnitude columns from dominating the composite.


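# Rank each column individually so that rank 1 is always the best value

# wins and points: higher is better, so rank in descending order
col_ranks = df[["wins", "points"]].rank(ascending=False, method="average")

# turnover: lower is better, so rank in ascending order
col_ranks["turnover"] = df["turnover"].rank(ascending=True, method="average")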

# Compute the mean rank across the selected columns

df["mean_rank"] = col_ranks.mean(axis=1)

# Convert the mean rank to a dense ranking (lowest mean rank gets 1; values are floats)

df["final_rank"] = df["mean_rank"].rank(method="dense")
print(df[["team", "mean_rank", "final_rank"]])

This approach calls DataFrame.rank in pandas/core/generic.py, which ranks each column independently through the same algorithms.rank kernel used by Series.rank. The resulting ranks are then aggregated via standard pandas arithmetic.

Method 3: Rank by Multi-Column Sort Order

For deterministic tie-breaking that respects hierarchical column priority, sort by multiple columns and use the positional index as the rank. This method is equivalent to SQL's ROW_NUMBER() OVER (ORDER BY ...) and guarantees unique ranks, even for rows that tie on every sort column.


# Sort by wins (desc), then points (desc), then turnover (asc)

df_sorted = df.sort_values(
    by=["wins", "points", "turnover"],
    ascending=[False, False, True],
).reset_index(drop=True)

# The index after sorting reflects the desired order; use as 1-based rank

df_sorted["order_rank"] = df_sorted.index + 1
print(df_sorted[["team", "wins", "points", "turnover", "order_rank"]])

While this bypasses the rank algorithm in pandas/core/algorithms.py, it leverages pandas' high-performance sorting (implemented in pandas/core/sorting.py) and produces equivalent ordinal rankings with explicit precedence rules.
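
If rows that are identical on every sort column should share a rank instead of receiving consecutive ones, one possible variation is to number the groups of the sorted frame. The sketch below is an assumption-laden illustration rather than part of the method above: the data is hypothetical, and it relies on groupby(..., sort=False).ngroup() numbering groups in order of first appearance.

```python
import pandas as pd

# Hypothetical data: teams A and B tie on both sort columns
df = pd.DataFrame({
    "team": ["A", "B", "C", "D"],
    "wins": [10, 10, 8, 10],
    "points": [30, 30, 25, 28],
})

keys = ["wins", "points"]

# Establish the desired order first (both columns descending)
ordered = df.sort_values(by=keys, ascending=[False, False])

# With sort=False, groups are numbered in the order they appear in the
# sorted frame, so identical key tuples share one dense rank.
# Assignment aligns on the original index, so no reset_index is needed.
df["dense_rank"] = ordered.groupby(keys, sort=False).ngroup() + 1
print(df)
```

This behaves like SQL's DENSE_RANK() rather than ROW_NUMBER(), which may be preferable when ties carry meaning.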

Summary

  • Composite Score Ranking: Aggregate multiple columns into a single derived metric using business logic, then call rank() on that column. Best when column relationships are linear and weightable.
  • Averaged Ranks: Rank each column individually using DataFrame.rank(), then average the results. Ideal when columns have different scales and no clear formula relates them.
  • Sort-Based Ranking: Use sort_values() with multiple columns to establish deterministic order, then use the positional index as the rank. Perfect for hierarchical tie-breaking without mathematical aggregation.

The first two approaches rely on the same high-performance ranking engine implemented in pandas/core/algorithms.py, ensuring consistent behavior for tie-breaking methods and NaN handling. The sort-based approach relies on pandas' sorting routines instead and always assigns unique ranks.

Frequently Asked Questions

How does pandas handle ties when ranking multiple columns?

Pandas delegates tie-breaking to the method parameter in DataFrame.rank(), which is processed in pandas/core/algorithms.py. Available methods include average (mean of tied ranks), min (lowest rank for ties), max (highest rank for ties), first (appearance order), and dense (like min but no gaps). When ranking composite scores or averaged ranks, these methods apply to the final aggregated values.
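
To make the differences concrete, here is a small sketch that applies each tie-breaking method to the same values. The scores are hypothetical illustration data; only the method names come from the pandas API.

```python
import pandas as pd

# Hypothetical final scores: B and C are tied
scores = pd.Series([50, 40, 40, 30], index=["A", "B", "C", "D"])

tie_methods = ["average", "min", "max", "first", "dense"]

# One column per tie-breaking method, ranking the highest score first
comparison = pd.DataFrame(
    {m: scores.rank(ascending=False, method=m) for m in tie_methods}
)
print(comparison)
```

B and C occupy positions 2 and 3, so they share 2.5 under average, 2 under min, and 3 under max. Under first they are separated by order of appearance, and dense is the only method that leaves no gap before D.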

Can I rank within groups while using multiple columns?

Yes. DataFrameGroupBy.rank(), implemented in pandas/core/groupby/groupby.py (lines 4881-4890), ranks values independently within each group. Either build a composite score and rank it per group, for example df.groupby('category')['score'].rank(ascending=False), or rank several columns per group and average the results: df.groupby('category')[['col1', 'col2']].rank().mean(axis=1). Both return a result aligned with the original index, so it can be assigned straight back as a new column.
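
A minimal sketch of grouped multi-column ranking follows. The category, col1, and col2 columns and their values are placeholders chosen for illustration, and averaging the per-column ranks is just one possible aggregation.

```python
import pandas as pd

# Hypothetical data with two groups
df = pd.DataFrame({
    "category": ["x", "x", "x", "y", "y"],
    "col1": [3, 1, 2, 5, 4],
    "col2": [30, 10, 20, 40, 50],
})

# Rank col1 and col2 within each category (higher is better),
# then average the per-column ranks row by row
group_ranks = df.groupby("category")[["col1", "col2"]].rank(ascending=False)
df["mean_rank"] = group_ranks.mean(axis=1)

# Rank the averaged ranks again, still within each category
df["group_rank"] = df.groupby("category")["mean_rank"].rank(method="dense")
print(df)
```

In group y the two rows swap places between col1 and col2, so their mean ranks tie and both receive a dense group rank of 1.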

What is the performance difference between composite scores and averaging ranks?

Creating a composite score and ranking once requires a single pass through the algorithms.rank engine in pandas/core/algorithms.py. Because ranking sorts the values, that pass costs O(n log n) rather than O(n). Averaging ranks means ranking each of the k columns and then ranking the aggregate, for roughly O(k × n log n). For large DataFrames with many columns, the composite score approach is therefore usually faster and more memory-efficient, though the exact ratio depends on k, the dtypes, and the pandas version, so measure on your own data.
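
A rough way to check this on your own machine is to time both approaches on synthetic data. The sketch below assumes a random 100,000 × 5 frame and an unweighted sum as the composite score; both are arbitrary choices for illustration, and the timings it prints will vary by hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 100,000 rows, 5 numeric columns (arbitrary sizes)
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.random((100_000, 5)), columns=list("abcde"))

def composite_rank():
    # One aggregation, one ranking pass
    return frame.sum(axis=1).rank(ascending=False, method="dense")

def averaged_rank():
    # One ranking pass per column, then a final ranking pass
    return frame.rank(ascending=False).mean(axis=1).rank(method="dense")

t_composite = timeit.timeit(composite_rank, number=3)
t_averaged = timeit.timeit(averaged_rank, number=3)
print(f"composite: {t_composite:.3f}s, averaged: {t_averaged:.3f}s")
```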

How are NaN values handled in multi-column ranking?

NaN handling is controlled by the na_option parameter in DataFrame.rank(), processed in pandas/core/algorithms.py. The options are keep (NaN values receive a NaN rank; this is the default), top (NaN values receive the lowest ranks, so they are placed first), and bottom (NaN values receive the highest ranks). When creating composite scores from multiple columns, handle NaNs during the aggregation phase before calling rank(): arithmetic with + propagates NaN, whereas DataFrame.sum(axis=1) skips NaN by default (skipna=True), so choose whichever matches your intent or fill missing values explicitly.
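
The following sketch illustrates both points on hypothetical data: how each na_option value ranks a missing entry, and how the choice of aggregation decides whether a missing input produces a missing composite score.

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, np.nan, 1.0, 2.0])

keep = s.rank(na_option="keep")      # the NaN entry keeps a NaN rank
top = s.rank(na_option="top")        # the NaN entry gets rank 1
bottom = s.rank(na_option="bottom")  # the NaN entry gets the last rank

# Composite scores: "+" propagates NaN, sum(axis=1) skips it by default
df = pd.DataFrame({"wins": [10, np.nan], "points": [30, 25]})
strict = df["wins"] + df["points"]
lenient = df[["wins", "points"]].sum(axis=1)

print(pd.DataFrame({"keep": keep, "top": top, "bottom": bottom}))
print(pd.DataFrame({"strict": strict, "lenient": lenient}))
```

Whether a row with a missing input should be scored on its remaining columns or excluded from the ranking is a modeling decision; the sketch only shows how each choice is expressed.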
