How to Sort a pandas DataFrame by Two or More Columns Using sort_values()

To sort a pandas DataFrame by multiple columns, pass a list of column names to the by parameter of DataFrame.sort_values(), optionally specifying individual sort directions via the ascending parameter.

The sort_values() method in the pandas-dev/pandas repository provides a flexible interface for ordering DataFrame rows lexicographically by one or more columns. Located in pandas/core/frame.py, this implementation handles complex multi-column sorting scenarios including mixed ascending/descending orders, missing value placement, and custom sorting keys.

Understanding DataFrame.sort_values() Syntax and Parameters

The DataFrame.sort_values() method signature in pandas/core/frame.py accepts several parameters that control multi-column sorting behavior:

def sort_values(
    self,
    by: IndexLabel,
    *,
    axis: Axis = 0,
    ascending: bool | list[bool] | tuple[bool, ...] = True,
    inplace: bool = False,
    kind: SortKind = "quicksort",
    na_position: str = "last",
    ignore_index: bool = False,
    key: ValueKeyFunc | None = None,
) -> DataFrame | None:

The by Parameter and Lexicographic Sorting

The by parameter accepts either a single column label or a list-like of labels (IndexLabel). When you pass a list such as ["col1", "col2"], pandas performs a lexicographic sort: it first orders rows by the first column, then uses the second column to break ties, continuing through subsequent columns as needed.

This behavior is implemented in the sort_values method body within pandas/core/frame.py, which delegates to the sorting engine in pandas/core/sorting.py for the actual key-based reordering.
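As a minimal sketch of what "lexicographic" means here, the example below compares a two-column sort_values() call with Python's built-in sorted() applied to (team, score) tuples. The DataFrame df_lex and its column names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate tie-breaking.
df_lex = pd.DataFrame({
    "team": ["b", "a", "b", "a"],
    "score": [2, 9, 1, 3],
})

# pandas: order by "team" first, then use "score" to break ties.
sorted_df = df_lex.sort_values(by=["team", "score"])

# Plain-Python reference: tuples compare element by element, left to right.
expected = sorted(zip(df_lex["team"], df_lex["score"]))

result_pairs = list(zip(sorted_df["team"], sorted_df["score"]))
print(result_pairs)  # [('a', 3), ('a', 9), ('b', 1), ('b', 2)]
```

Both approaches produce the same row order, which is the sense in which the multi-column sort is lexicographic.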

Controlling Sort Direction with ascending

The ascending parameter supports boolean values or sequences. When sorting by multiple columns, you can pass a list or tuple of booleans where each element corresponds to the respective column in by. For example, ascending=[True, False] sorts the first column in ascending order and the second in descending order.

The implementation validates that the length of ascending matches the length of by when a sequence is provided, raising a ValueError if the lengths differ.
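The length check can be observed directly. The sketch below uses a hypothetical two-row DataFrame, df_len, and passes a single-element ascending list for two sort columns; the exact wording of the error message may vary between pandas versions, so the example only captures it.

```python
import pandas as pd

# Hypothetical two-row frame, used only to trigger the length check.
df_len = pd.DataFrame({"col1": ["A", "B"], "col2": [2, 1]})

try:
    # Two sort columns but only one direction flag.
    df_len.sort_values(by=["col1", "col2"], ascending=[True])
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```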

Handling Missing Values and Stability

The na_position parameter controls where NaN or None values appear, accepting either "first" or "last" (default). When sorting multiple columns, missing values in any of the sort keys follow this positioning rule.

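The kind parameter selects the sorting algorithm ("quicksort", "mergesort", "heapsort", or "stable"), but according to the pandas documentation it is only applied when sorting on a single column or label. Of these options, only "mergesort" and "stable" are stable, so the default kind="quicksort" does not guarantee that rows with identical sort keys keep their original relative order. When by lists several columns, kind is ignored and the keys are sorted lexicographically. If tie order matters in a single-column sort, pass kind="stable" explicitly.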
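To make tie-preservation concrete for a single-column sort, here is a small sketch that requests a stable algorithm explicitly. The frame df_stable and its order_seen column are hypothetical; order_seen simply records each row's original position.

```python
import pandas as pd

# Hypothetical data: "order_seen" records each row's original position.
df_stable = pd.DataFrame({
    "group": ["x", "y", "x", "y", "x"],
    "order_seen": [0, 1, 2, 3, 4],
})

# kind="stable" keeps rows with equal "group" values in their input order.
stable_sorted = df_stable.sort_values(by="group", kind="stable")

print(stable_sorted["order_seen"].tolist())  # [0, 2, 4, 1, 3]
```

Within each group the original positions stay in increasing order, which is what a stable sort guarantees.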

Practical Examples for Multi-Column Sorting

The following examples demonstrate common multi-column sorting patterns using the sort_values() implementation in pandas/core/frame.py.

First, create a sample DataFrame with mixed data types and missing values:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "col1": ["A", "A", "B", np.nan, "D", "C"],
    "col2": [2, 1, 9, 8, 7, 4],
    "col3": [0, 1, 9, 4, 2, 3],
    "col4": ["a", "B", "c", "D", "e", "F"],
})

Sort by a Single Column

To sort by one column in ascending order (default):

df.sort_values(by="col1")

Sort by Multiple Columns Lexicographically

To sort by col1 first, then use col2 to break ties:

df.sort_values(by=["col1", "col2"])

Rows are ordered by col1, and col2 only decides the order among rows that share the same col1 value.

Sort with Mixed Ascending and Descending Orders

To sort col1 in ascending order and col2 in descending order:

df.sort_values(by=["col1", "col2"], ascending=[True, False])
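A self-contained sketch with its expected result may help here. The frame df_mixed and the column names dept and salary are hypothetical; the call groups rows by department alphabetically and lists the highest salary first within each department.

```python
import pandas as pd

# Hypothetical data, used only to show per-column sort directions.
df_mixed = pd.DataFrame({
    "dept": ["ops", "dev", "ops", "dev"],
    "salary": [50, 70, 65, 60],
})

# "dept" ascending, "salary" descending within each department.
ranked = df_mixed.sort_values(by=["dept", "salary"], ascending=[True, False])

print(ranked["dept"].tolist())    # ['dev', 'dev', 'ops', 'ops']
print(ranked["salary"].tolist())  # [70, 60, 65, 50]
```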

If the ascending list and the by list differ in length, pandas raises a ValueError.

Place Missing Values First

To display rows with NaN values at the beginning of the result:

df.sort_values(by="col1", na_position="first")
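One detail worth checking is that na_position works independently of ascending: reversing the sort direction does not move missing values to the other end. The sketch below uses a hypothetical one-column frame, df_na, to show both placements with a descending sort.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value at index 1.
df_na = pd.DataFrame({"col1": ["A", np.nan, "C", "B"]})

# Descending sort: NaN still goes last by default.
desc_last = df_na.sort_values(by="col1", ascending=False)

# Descending sort with missing values moved to the front.
desc_first = df_na.sort_values(by="col1", ascending=False, na_position="first")

print(list(desc_last.index))   # [2, 3, 0, 1]
print(list(desc_first.index))  # [1, 2, 3, 0]
```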

Advanced Sorting with Custom Keys

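The key parameter in DataFrame.sort_values() accepts a vectorized callable that is applied independently to each column in by. The callable receives a Series and should return a Series of the same shape, which pandas then uses for the comparison. This enables complex sorting logic without modifying the original data.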

Case-Insensitive String Sorting

To sort strings ignoring case:

df.sort_values(by="col4", key=lambda s: s.str.lower())

This applies the lowercase transformation only for the sort comparison, leaving the original casing intact in the result.
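Because the key callable is applied to every column listed in by, a multi-column sort that mixes text and numbers needs a key that handles both. The following is one possible sketch: lower_if_text is a hypothetical helper (not part of pandas) that lowercases string columns and passes other columns through unchanged.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical data mixing a text column and a numeric column.
df_key = pd.DataFrame({
    "name": ["b", "A", "a", "B"],
    "rank": [1, 2, 1, 2],
})

def lower_if_text(column: pd.Series) -> pd.Series:
    """Hypothetical helper: lowercase string columns, leave others as-is."""
    if is_string_dtype(column):
        return column.str.lower()
    return column

# The key is called once for "name" and once for "rank".
result = df_key.sort_values(by=["name", "rank"], key=lower_if_text)

print(result["name"].tolist())  # ['a', 'A', 'b', 'B']
print(result["rank"].tolist())  # [1, 2, 1, 2]
```

Applying s.str.lower() unconditionally would fail on the integer column, which is why the helper checks the dtype first.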

Natural Sorting with External Libraries

For "human-friendly" sorting of alphanumeric strings (e.g., "item2" before "item10"), use the natsort package with the key parameter:


# pip install natsort

from natsort import natsort_keygen

df_nat = pd.DataFrame({
    "hours": ["0hr", "128hr", "0hr", "64hr", "64hr", "128hr"],
    "mins": [5, 10, 2, 15, 1, 20]
})

df_nat.sort_values(by="hours", key=natsort_keygen())
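If adding a dependency is not an option, a dependency-free alternative for this particular data shape is to extract the leading number and sort on it. This is only a sketch under the assumption that every value contains exactly one run of digits (as in "64hr"); it is not a general natural-sort implementation, and the name df_hours is hypothetical.

```python
import pandas as pd

# Same shape of data as the natsort example, rebuilt so the block is self-contained.
df_hours = pd.DataFrame({
    "hours": ["0hr", "128hr", "0hr", "64hr", "64hr", "128hr"],
    "mins": [5, 10, 2, 15, 1, 20],
})

# Assumption: each value holds one run of digits, so compare on that integer.
by_number = df_hours.sort_values(
    by="hours",
    key=lambda s: s.str.extract(r"(\d+)", expand=False).astype(int),
    kind="stable",  # keep equal-hour rows in their original order
)

print(by_number["hours"].tolist())
# ['0hr', '0hr', '64hr', '64hr', '128hr', '128hr']
```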

Summary

  • DataFrame.sort_values() in pandas/core/frame.py is the primary method for sorting by multiple columns.
  • Pass a list of column names to the by parameter to execute a lexicographic sort that breaks ties using subsequent columns.
  • Use a list of booleans for ascending to control sort direction individually per column.
  • The kind parameter applies only to single-column sorts; pass kind="stable" or kind="mergesort" when rows with equal keys must keep their original relative order.
  • Control missing value placement with na_position ("first" or "last").
  • Apply custom transformations via the key parameter for case-insensitive or natural sorting without altering the underlying data.

Frequently Asked Questions

What is the difference between sort_values and sort_index?

sort_values orders rows based on the values within columns, specified by the by parameter, while sort_index orders rows based on the DataFrame's index labels. Use sort_values when you need to sort by data content, and sort_index when you need to reorder by row labels or index levels.
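The difference is easy to see on a small frame whose index labels and values sort in different orders. The frame df_idx below is a hypothetical example.

```python
import pandas as pd

# Hypothetical frame: index labels and values sort in opposite orders.
df_idx = pd.DataFrame({"value": [10, 30, 20]}, index=["c", "a", "b"])

by_value = df_idx.sort_values(by="value")  # orders by column content
by_label = df_idx.sort_index()             # orders by index labels

print(list(by_value.index))        # ['c', 'b', 'a']
print(by_label["value"].tolist())  # [30, 20, 10]
```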

How do I sort by multiple columns with different ascending orders?

Pass a list of boolean values to the ascending parameter where each boolean corresponds to the respective column in the by list. For example, df.sort_values(by=["A", "B"], ascending=[True, False]) sorts column "A" in ascending order and column "B" in descending order. The lengths of both lists must match, or pandas raises a ValueError.

Does sort_values modify the original DataFrame?

By default, sort_values returns a new sorted DataFrame and leaves the original unchanged. Set inplace=True to modify the original DataFrame in-place, which returns None. The default behavior (inplace=False) is recommended for method chaining and functional programming patterns.
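The short sketch below contrasts the two modes on a hypothetical frame, df_orig, and also shows ignore_index=True, which renumbers the result from 0 instead of carrying over the original index labels.

```python
import pandas as pd

# Hypothetical single-column frame.
df_orig = pd.DataFrame({"col2": [2, 1, 9]})

# Default: a new DataFrame is returned; ignore_index=True resets its index.
sorted_copy = df_orig.sort_values(by="col2", ignore_index=True)
before = df_orig["col2"].tolist()  # original is still [2, 1, 9]

# inplace=True: the original is reordered and the call returns None.
returned = df_orig.sort_values(by="col2", inplace=True)
after = df_orig["col2"].tolist()

print(sorted_copy["col2"].tolist(), list(sorted_copy.index))  # [1, 2, 9] [0, 1, 2]
print(before, after, returned)  # [2, 1, 9] [1, 2, 9] None
```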

How are NaN values handled when sorting multiple columns?

Missing values (NaN, None, or NaT) are placed after all valid values by default (na_position="last"). You can change this to na_position="first" to place missing values at the beginning of the result. When sorting by multiple columns, the na_position applies to all sort keys consistently.
