How to Use pandas sort by column: A Complete Guide to DataFrame.sort_values

Use DataFrame.sort_values() to reorder rows by column values, specifying the by parameter for single or multiple columns, and control the sort order with ascending, kind, and key arguments.

When you need to organize tabular data with pandas, the standard tool for a pandas sort by column operation is the sort_values() method, whose source lives in the pandas-dev/pandas repository. It computes a row permutation with NumPy sorting routines and then reorders all columns with that single permutation. Whether you are ranking sales figures, ordering timestamps, or prioritizing categorical labels, understanding the underlying implementation helps you pick the right arguments and avoid common performance pitfalls.

Understanding the pandas sort by column Implementation

DataFrame.sort_values is more than a convenience wrapper. It resolves the sort keys, computes a row permutation with NumPy-backed helpers, and then applies that permutation to the DataFrame's internal data in one step.

The Core Architecture: From sort_values to the Sort Indexer

The sort_values method on the NDFrame base class in pandas/core/generic.py (lines 4868-4990 in the version examined) declares sort_values for both DataFrame and Series. The base-class version does not sort anything itself, because each subclass overrides it.

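The DataFrame implementation and its typing overloads reside in pandas/core/frame.py (the overloads span lines 7923-7950 in the version examined). The overloads only tell type checkers that the method returns None when inplace=True and a DataFrame otherwise. At runtime, the method validates by, axis, ascending, and inplace. It then resolves each label in by to its values with the internal helper _get_label_or_level_values, which accepts column labels and, when sorting rows, index level names.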
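Because by is resolved label by label, it is not limited to column names. According to the pandas documentation, when sorting rows (axis=0) by may also contain index level names. The following minimal sketch uses illustrative data and assumes an index named "country" that does not clash with any column label:

```python
import pandas as pd

# Illustrative data; the index level is named "country"
cities = pd.DataFrame(
    {
        "city": ["Paris", "Berlin", "Lyon", "Munich"],
        "population": [2_200_000, 3_600_000, 520_000, 1_500_000],
    },
    index=pd.Index(["FR", "DE", "FR", "DE"], name="country"),
)

# "country" resolves to the index level, "population" to a column
result = cities.sort_values(by=["country", "population"], ascending=[True, False])
print(result["city"].tolist())  # ['Berlin', 'Munich', 'Paris', 'Lyon']
```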

How the Sort Indexer Is Computed

Once the key values are in hand, DataFrame.sort_values does not call safe_sort in pandas/core/algorithms.py. That helper serves other operations, such as factorize. Instead, the permutation is computed by helpers in pandas/core/sorting.py:

  • Algorithm Selection: For a single column, nargsort uses NumPy's argsort (or an extension array's own argsort). It defaults to kind='quicksort' unless you specify kind='mergesort', 'heapsort', or 'stable'. For several columns, lexsort_indexer converts each key to ordered categorical codes and calls np.lexsort, which is stable. On this path, kind is ignored.
  • NaN Management: Missing values are masked out before sorting and then placed before or after the sorted values according to na_position. By default they sink to the bottom, whether ascending is True or False.
  • Key Function Application: If you provide a vectorized key function (e.g., lambda s: s.str.lower()), pandas wraps each sort column as a Series and applies the function to it before calculating the permutation.
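The missing-value handling described above can be pictured with plain NumPy: mask the NaNs, argsort only the valid values, then attach the NaN positions at one end. The sketch below is a simplified illustration for float data, not pandas' own code, and argsort_with_na is a hypothetical helper name. For descending order it reverses the valid values before a stable argsort and reverses the result afterwards, which keeps tied values in their original relative order.

```python
import numpy as np
import pandas as pd


def argsort_with_na(values, ascending=True, na_position="last"):
    """Hypothetical helper: stable argsort of float data with NaNs at one end."""
    values = np.asarray(values, dtype=float)
    mask = np.isnan(values)
    positions = np.arange(len(values))

    # Work only with the non-missing values and their original positions
    valid_vals = values[~mask]
    valid_pos = positions[~mask]

    # Descending: reverse, stable-sort ascending, then reverse back,
    # so that ties keep their original relative order
    if not ascending:
        valid_vals = valid_vals[::-1]
        valid_pos = valid_pos[::-1]
    indexer = valid_pos[np.argsort(valid_vals, kind="stable")]
    if not ascending:
        indexer = indexer[::-1]

    # Place the NaN positions before or after the sorted values
    nan_pos = positions[mask]
    if na_position == "last":
        return np.concatenate([indexer, nan_pos])
    if na_position == "first":
        return np.concatenate([nan_pos, indexer])
    raise ValueError(f"invalid na_position: {na_position}")


scores = pd.Series([3.0, np.nan, 1.0, 2.0])
manual = argsort_with_na(scores.to_numpy(), ascending=False, na_position="first")
expected = scores.sort_values(ascending=False, na_position="first").index
print(manual.tolist(), expected.tolist())  # [1, 0, 3, 2] [1, 0, 3, 2]
```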

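After the helper returns a permutation (an integer indexer), DataFrame.sort_values applies it via self._mgr.take, which reorders the rows of every internal block with that one indexer. The result holds newly allocated, reordered arrays rather than a view of the original data.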
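The two phases described above, computing a permutation and then taking rows with it, can be reproduced with public APIs. The sketch below uses np.argsort and DataFrame.take on a copy of the article's data. It is a simplified model of the pipeline, not pandas' internal code path, which additionally handles missing values, extension dtypes, and the key argument.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "city": ["Paris", "Berlin", "London", "Tokyo", "New York"],
        "population": [2_200_000, 3_600_000, 8_900_000, 13_900_000, 8_300_000],
    }
)

# Phase 1: compute an integer permutation from the sort column
indexer = np.argsort(df["population"].to_numpy(), kind="stable")

# Phase 2: reorder every column with that one permutation
manual = df.take(indexer)

# The public method produces the same rows in the same order
print(manual.equals(df.sort_values(by="population", kind="stable")))  # True
print(manual["city"].tolist())  # ['Paris', 'Berlin', 'New York', 'London', 'Tokyo']
```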

Practical Examples: pandas sort by column in Action

The following examples demonstrate how to leverage sort_values for common data organization tasks.

Sort by a Single Column

To perform a simple alphabetical pandas sort by column, pass the column name as a string to the by parameter:

import pandas as pd

df = pd.DataFrame(
    {
        "city": ["Paris", "Berlin", "London", "Tokyo", "New York"],
        "population": [2_200_000, 3_600_000, 8_900_000, 13_900_000, 8_300_000],
        "area_km2": [105, 891, 1572, 2194, 783],
    }
)

# Sort alphabetically by city name
sorted_by_city = df.sort_values(by="city")
print(sorted_by_city)
        city  population  area_km2
1     Berlin     3600000       891
2     London     8900000      1572
4   New York     8300000       783
0      Paris     2200000       105
3      Tokyo    13900000      2194

Sort by Multiple Columns with Different Orders

For complex ranking, supply a list to by and a matching list to ascending. This example sorts by descending population, then by ascending area to break ties:

# Primary sort: population (high to low)
# Secondary sort: area (low to high) for ties
# kind is not passed: it only applies to single-column sorts,
# and multi-column sorts are always stable
sorted_multi = df.sort_values(
    by=["population", "area_km2"],
    ascending=[False, True],
)
print(sorted_multi)
        city  population  area_km2
3      Tokyo    13900000      2194
2     London     8900000      1572
4   New York     8300000       783
1     Berlin     3600000       891
0      Paris     2200000       105
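All five populations in df are distinct, so the secondary key above never decides anything. A small illustrative dataset with tied sales figures shows area_m2 breaking the ties. The last lines reproduce the ordering with np.lexsort, which treats its last key as the primary one. This mirrors the idea behind lexsort_indexer in pandas/core/sorting.py, although pandas works on per-column categorical codes rather than negating values as this sketch does for the descending column.

```python
import numpy as np
import pandas as pd

stores = pd.DataFrame(
    {
        "store": ["A", "B", "C", "D"],
        "sales": [500, 700, 500, 700],
        "area_m2": [120, 300, 80, 150],
    }
)

# Sales high to low; equal sales are ordered by area low to high
ranked = stores.sort_values(by=["sales", "area_m2"], ascending=[False, True])
print(ranked["store"].tolist())  # ['D', 'B', 'C', 'A']

# np.lexsort uses the LAST key as the primary key;
# negating sales turns an ascending sort into a descending one
order = np.lexsort((stores["area_m2"].to_numpy(), -stores["sales"].to_numpy()))
print(stores.take(order)["store"].tolist())  # ['D', 'B', 'C', 'A']
```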

Using a Custom Key Function

Apply a vectorized transformation to the sort values without modifying the original data. The key function receives each sort column as a Series and must return a Series of the same length. pandas sorts by the transformed values but returns the original ones. This example performs a case-insensitive sort:

# Sort ignoring case
sorted_key = df.sort_values(
    by="city",
    key=lambda s: s.str.lower(),
)
print(sorted_key)
        city  population  area_km2
1     Berlin     3600000       891
2     London     8900000      1572
4   New York     8300000       783
0      Paris     2200000       105
3      Tokyo    13900000      2194
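Every city in df starts with a capital letter, so the key above happens to produce the same order as a plain sort. The illustrative data below mixes cases to show the difference. Without a key, strings are compared by code point, so all uppercase initials sort before lowercase ones.

```python
import pandas as pd

tags = pd.DataFrame({"tag": ["banana", "Apple", "cherry", "Date"]})

# Plain sort: uppercase letters (A, D) come before lowercase (b, c)
plain = tags.sort_values(by="tag")

# key receives the whole "tag" column as a Series and lowercases it
insensitive = tags.sort_values(by="tag", key=lambda s: s.str.lower())

print(plain["tag"].tolist())        # ['Apple', 'Date', 'banana', 'cherry']
print(insensitive["tag"].tolist())  # ['Apple', 'banana', 'cherry', 'Date']
```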

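In-Place Sorting

To sort an existing DataFrame without binding the result to a new name, pass inplace=True. The call then returns None. This does not avoid copying: pandas still builds the reordered data and swaps it into the existing object.

# Modify the DataFrame directly; the call returns None
df.sort_values(by="population", inplace=True, ascending=False)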
print(df)
        city  population  area_km2
3      Tokyo    13900000      2194
2     London     8900000      1572
4   New York     8300000       783
1     Berlin     3600000       891
0      Paris     2200000       105
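The two calling styles are easy to tell apart. The default returns a new sorted DataFrame and leaves the original untouched, while inplace=True mutates the object and returns None. The following minimal check uses illustrative data:

```python
import pandas as pd

cities = pd.DataFrame(
    {"city": ["Paris", "Tokyo", "Berlin"], "population": [2_200_000, 13_900_000, 3_600_000]}
)

# Default: a new DataFrame comes back and cities keeps its original order
sorted_copy = cities.sort_values(by="population", ascending=False)
print(cities["city"].tolist())       # ['Paris', 'Tokyo', 'Berlin']
print(sorted_copy["city"].tolist())  # ['Tokyo', 'Berlin', 'Paris']

# inplace=True: cities itself is reordered and the call returns None
returned = cities.sort_values(by="population", ascending=False, inplace=True)
print(returned)                      # None
print(cities["city"].tolist())       # ['Tokyo', 'Berlin', 'Paris']
```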

Performance Considerations for Large DataFrames

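The cost of a pandas sort by column operation comes from two phases in the source code. The first computes the permutation, using NumPy's argsort for one column or np.lexsort over per-column codes for several. The second is the take that copies every column into the new order. Both phases run in compiled NumPy code rather than Python loops, which keeps sorting millions of rows practical.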

Key performance characteristics include:

  • Algorithm Selection: For single-column sorts, choose kind='stable' (or kind='mergesort') when you need to preserve the relative order of equal elements. The default kind='quicksort' is not stable but is usually fastest on numeric data. For multi-column sorts, kind is ignored and the result is always stable.
  • Memory Management: inplace=True does not save memory. The reordered data is still built with self._mgr.take and then swapped into the existing object. The main difference is that the call returns None instead of a new DataFrame.
  • Vectorized Keys: Applying a key function operates on the entire Series via vectorized string methods (e.g., .str.lower()), which is significantly faster than row-wise Python loops.
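A vectorized key can also encode a custom priority order. In this sketch, the rank dictionary is a hypothetical ranking chosen for illustration. Series.map translates the whole column in one call, and kind='stable' keeps tickets with the same priority in their original order. An ordered pd.Categorical column would achieve the same ordering without a key.

```python
import pandas as pd

tickets = pd.DataFrame(
    {"id": [101, 102, 103, 104], "priority": ["low", "high", "medium", "high"]}
)

# Hypothetical ranking: smaller numbers sort first
rank = {"high": 0, "medium": 1, "low": 2}

# The key receives the whole "priority" column and maps it in one call
by_priority = tickets.sort_values(
    by="priority", key=lambda s: s.map(rank), kind="stable"
)
print(by_priority["id"].tolist())  # [102, 104, 103, 101]
```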

Summary

  • DataFrame.sort_values is the primary method for pandas sort by column operations. It is implemented in pandas/core/frame.py and overrides the base declaration in pandas/core/generic.py.
  • The permutation is computed in pandas/core/sorting.py, by nargsort (NumPy argsort) for a single column or lexsort_indexer (np.lexsort) for several. These helpers also handle NaN positioning and the key function.
  • You can sort by single or multiple columns using the by parameter, control direction with ascending, and apply transformations with the key argument.
  • For large datasets, choose kind based on stability requirements for single-column sorts. Keep in mind that inplace=True changes the return value, not the memory usage.

Frequently Asked Questions

How do I sort a pandas DataFrame by column values in descending order?

Pass ascending=False to the sort_values method. If sorting by multiple columns, provide a list of booleans matching the length of your by parameter, such as ascending=[False, True] to sort the first column descending and the second ascending.

What is the difference between sort_values and sort_index in pandas?

sort_values rearranges rows based on the data contained in one or more columns. sort_index reorders rows by their index labels (or columns by their column labels, with axis=1). Use sort_values for value-based ranking and sort_index when you need to organize data by its labels.
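A tiny illustrative DataFrame with out-of-order string labels makes the contrast concrete:

```python
import pandas as pd

df = pd.DataFrame({"score": [30, 10, 20]}, index=["b", "c", "a"])

# Order rows by the values in the "score" column
by_values = df.sort_values(by="score")
# Order rows by their index labels
by_labels = df.sort_index()

print(by_values.index.tolist())  # ['c', 'a', 'b']
print(by_labels.index.tolist())  # ['a', 'b', 'c']
```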

How does pandas handle missing values when sorting by column?

By default, sort_values places NaN values at the end of the DataFrame regardless of the sort order. You can control this behavior using the na_position parameter, setting it to 'first' to float missing values to the top or 'last' to keep them at the bottom.
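The following short example uses illustrative data to show both behaviors. NaN stays at the bottom even in a descending sort, and na_position='first' moves it to the top.

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame({"item": ["w", "x", "y", "z"], "price": [3.5, np.nan, 1.0, 2.25]})

# Descending order: the missing price is still placed last
desc = prices.sort_values(by="price", ascending=False)
print(desc["item"].tolist())  # ['w', 'z', 'y', 'x']

# na_position="first": the missing price moves to the top
nan_first = prices.sort_values(by="price", na_position="first")
print(nan_first["item"].tolist())  # ['x', 'y', 'z', 'w']
```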

Is the pandas sort_values method stable?

Not by default for a single column. The default kind='quicksort' does not guarantee that rows with equal values keep their original relative order. Pass kind='stable' (or kind='mergesort') when you need that guarantee. Sorting by multiple columns is always stable, because lexsort_indexer in pandas/core/sorting.py builds the permutation with np.lexsort and ignores kind.
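The following minimal illustration of a stable single-column sort uses made-up order data with repeated region values:

```python
import pandas as pd

orders = pd.DataFrame(
    {
        "order": ["o1", "o2", "o3", "o4", "o5"],
        "region": ["west", "east", "west", "east", "west"],
    }
)

# kind="stable": rows sharing a region keep their original relative order
stable = orders.sort_values(by="region", kind="stable")
print(stable["order"].tolist())  # ['o2', 'o4', 'o1', 'o3', 'o5']
```

With the default kind='quicksort', NumPy makes no such promise, even if a small example happens to come out in the same order.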
