How to Sort a Pandas DataFrame by Column: The Complete Guide to sort_values()
Use DataFrame.sort_values(by='column_name') to sort a pandas DataFrame by a column in ascending order, or pass ascending=False for descending order.
Sorting tabular data by column values is a fundamental data manipulation task. In the pandas-dev/pandas repository, the sort_values() method, implemented in pandas/core/frame.py, is the primary interface for reordering DataFrame rows based on column data. This guide covers the implementation details, performance characteristics, and practical syntax of single-column sorting.
The Primary Method: sort_values() in pandas/core/frame.py
The canonical way to sort by column values is the sort_values() method of the DataFrame class. In pandas/core/frame.py, its signature exposes several parameters that control sort behavior (in pandas 2.x, every parameter after by is keyword-only):
DataFrame.sort_values(by, *, axis=0, ascending=True, inplace=False,
                      kind='quicksort', na_position='last', ignore_index=False, key=None)
When you sort by a column, sort_values() delegates to helpers in pandas/core/sorting.py: nargsort when there is a single sort key and lexsort_indexer when there are several. For a single-column sort, it extracts the column's values, computes the indexer (the row positions that would put those values in order), and then applies that indexer to every column at once through a take on the DataFrame's internal block manager.
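The argsort-then-take idea can be reproduced with public APIs. This is a simplified sketch, not the pandas code path: it assumes a numeric column with no NaN values and no ties, and it uses NumPy's argsort directly instead of nargsort (which additionally handles na_position, key, and extension arrays).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'department': ['Engineering', 'Sales', 'Engineering', 'HR'],
    'salary': [95000, 72000, 105000, 68000],
})

# Step 1: compute the row positions that would sort the 'salary' column.
indexer = np.argsort(df['salary'].to_numpy(), kind='quicksort')

# Step 2: apply those positions to every column at once (a row-wise take).
manual = df.take(indexer)

# The public method should return the same rows in the same order.
expected = df.sort_values(by='salary')
pd.testing.assert_frame_equal(manual, expected)
print(manual.index.tolist())  # [3, 1, 0, 2]
```

Because take keeps the original index labels, the result carries labels 3, 1, 0, 2, exactly as sort_values() does.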
Single Column Sorting Syntax
To sort by a single column, pass the column name as a string to the by parameter:
import pandas as pd
df = pd.DataFrame({
'department': ['Engineering', 'Sales', 'Engineering', 'HR'],
'salary': [95000, 72000, 105000, 68000]
})
# Ascending sort (default)
sorted_asc = df.sort_values(by='salary')
# Descending sort
sorted_desc = df.sort_values(by='salary', ascending=False)
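The following sketch checks what those two calls return and also exercises two parameters from the signature that the examples above do not use: ignore_index and key. The second, three-row frame is a hypothetical example chosen so that case-sensitive and case-insensitive ordering differ.

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['Engineering', 'Sales', 'Engineering', 'HR'],
    'salary': [95000, 72000, 105000, 68000],
})

sorted_asc = df.sort_values(by='salary')
sorted_desc = df.sort_values(by='salary', ascending=False)

# The original index labels travel with their rows.
print(sorted_asc.index.tolist())   # [3, 1, 0, 2]
print(sorted_desc.index.tolist())  # [2, 0, 1, 3]

# ignore_index=True relabels the result 0..n-1 instead.
renumbered = df.sort_values(by='salary', ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]

# key transforms the column before comparison (here: case-insensitive order).
names = pd.DataFrame({'name': ['apple', 'Banana', 'cherry']})
plain = names.sort_values(by='name')
by_lower = names.sort_values(by='name', key=lambda s: s.str.lower())
print(plain['name'].tolist())      # ['Banana', 'apple', 'cherry']
print(by_lower['name'].tolist())   # ['apple', 'Banana', 'cherry']
```

The key function receives the column as a Series and must return a Series of the same length; only the comparison values change, not the data in the result.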
The axis parameter defaults to 0, which reorders rows. With axis=1, sort_values() instead reorders the columns according to the values in a given row (by then names a row label), but sorting rows by a single column remains the most common pattern.
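For the less common axis=1 case, here is a small sketch with a hypothetical, purely numeric frame; by refers to the row label 'alice', and the columns are reordered by that row's values.

```python
import pandas as pd

# Hypothetical numeric frame; with axis=1, `by` names a row label, not a column.
scores = pd.DataFrame(
    {'q1': [3, 10], 'q2': [1, 20], 'q3': [2, 30]},
    index=['alice', 'bob'],
)

# Reorder the columns by the values in row 'alice' (1, 2, 3).
cols_by_alice = scores.sort_values(by='alice', axis=1)
print(cols_by_alice.columns.tolist())  # ['q2', 'q3', 'q1']
```

The other rows simply follow the new column order, so row 'bob' becomes 20, 30, 10.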
Handling Missing Values with na_position
The na_position parameter, defined in pandas/core/frame.py, controls where NaN and None values land in the sorted result. sort_values() passes it through to the helpers in pandas/core/sorting.py, where nargsort (single key) or lexsort_indexer (multiple keys) handles placement:
# Place NaN values at the beginning
df.sort_values(by='salary', na_position='first')
# Place NaN values at the end (default)
df.sort_values(by='salary', na_position='last')
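The salary example above contains no missing values, so both calls return the same order. The sketch below uses a hypothetical frame with one NaN to make the difference visible, and also shows that na_position is independent of the sort direction.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing salary.
df = pd.DataFrame({
    'name': ['Ana', 'Ben', 'Cy', 'Dee'],
    'salary': [95000, np.nan, 68000, 72000],
})

first = df.sort_values(by='salary', na_position='first')
last = df.sort_values(by='salary')  # na_position='last' is the default
desc = df.sort_values(by='salary', ascending=False)  # NaN still goes last

print(first['name'].tolist())  # ['Ben', 'Cy', 'Dee', 'Ana']
print(last['name'].tolist())   # ['Cy', 'Dee', 'Ana', 'Ben']
print(desc['name'].tolist())   # ['Ana', 'Dee', 'Cy', 'Ben']
```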
Sorting Algorithms and Performance
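The kind parameter selects the algorithm used by the underlying argsort (NumPy's, or an extension array's own). According to the pandas documentation, it only applies when sorting on a single column or label; multi-column sorts go through lexsort_indexer instead.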
- quicksort: Default algorithm. Fastest average case but unstable (may change order of equal elements).
- mergesort: Stable sort using mergesort. Preserves the original order of duplicate keys.
- heapsort: Unstable, with an O(n log n) worst case; usually slower than quicksort in practice.
- stable: Forces a stable sort using the best available implementation.
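To see what stability means in practice, the hypothetical frame below has repeated department values. With kind='stable', rows that share a department keep their original relative order; with the default quicksort that order is not guaranteed, so only the stable result is asserted here.

```python
import pandas as pd

# Hypothetical frame with duplicate sort keys.
df = pd.DataFrame({
    'employee': ['a', 'b', 'c', 'd', 'e'],
    'department': ['Sales', 'HR', 'Sales', 'HR', 'Sales'],
})

# HR rows (b, d) and Sales rows (a, c, e) stay in their original order.
stable = df.sort_values(by='department', kind='stable')
print(stable['employee'].tolist())  # ['b', 'd', 'a', 'c', 'e']
```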
When the column holds an extension array (Categorical, timezone-aware datetimes, nullable integers, and so on), nargsort does not call NumPy directly: it defers to the array's own argsort method, so each type can apply its own ordering rules and optimizations.
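A Categorical column is an easy way to observe type-specific ordering. In this hypothetical ticket table, rows are ordered by the declared category order (low, medium, high) rather than alphabetically; kind='stable' is added only so the order of the two 'low' tickets is deterministic.

```python
import pandas as pd

# Hypothetical tickets with an ordered categorical priority.
df = pd.DataFrame({
    'ticket': [101, 102, 103, 104],
    'priority': pd.Categorical(
        ['high', 'low', 'medium', 'low'],
        categories=['low', 'medium', 'high'],
        ordered=True,
    ),
})

# Categorical values sort by category order, not by string comparison.
by_priority = df.sort_values(by='priority', kind='stable')
print(by_priority['ticket'].tolist())  # [102, 104, 103, 101]
```

An alphabetical sort of the same strings would have put 'high' first.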
In-Place vs Copy Operations
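By default, sort_values() returns a new, sorted DataFrame and leaves the original untouched. Passing inplace=True sorts the original object instead and returns None. This does not avoid memory allocation: pandas still builds the reordered data and then swaps it into the existing object.
# Sort df in place (the call returns None)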
df.sort_values(by='department', inplace=True)
The sorted result is computed the same way as in the copy case. With inplace=True, DataFrame.sort_values then passes it to _update_inplace, a helper defined on the NDFrame base class in pandas/core/generic.py, which replaces the object's internal data with the new ordering.
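A common mistake with inplace=True is assigning the return value, which is None. The sketch below shows that pitfall and the usual alternative of keeping the default and rebinding the name.

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['Engineering', 'Sales', 'Engineering', 'HR'],
    'salary': [95000, 72000, 105000, 68000],
})

# inplace=True mutates df and returns None, so do not reassign the result.
result = df.sort_values(by='department', inplace=True)
print(result)                     # None
print(df['department'].tolist())  # ['Engineering', 'Engineering', 'HR', 'Sales']

# Usual alternative: keep inplace=False (the default) and rebind the name.
df2 = pd.DataFrame({'salary': [95000, 72000, 105000, 68000]})
df2 = df2.sort_values(by='salary')
print(df2['salary'].tolist())     # [68000, 72000, 95000, 105000]
```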
Summary
- Primary method: sort_values() in pandas/core/frame.py is the standard tool for sorting a DataFrame by column.
- Basic syntax: Pass a single string to by for single-column sorting, and add ascending=False for reverse order.
- Missing data: Control NaN placement with na_position='first' or 'last'.
- Algorithm choice: Select kind='stable' when the relative order of duplicate keys must be preserved.
- In-place updates: Set inplace=True to modify the existing DataFrame (the call returns None); it does not avoid building the sorted data.
Frequently Asked Questions
How do I sort a pandas DataFrame by column in descending order?
Pass ascending=False to the sort_values() method, for example df.sort_values(by='column_name', ascending=False). This only reverses the direction; the algorithm chosen by kind and the placement of missing values set by na_position are unchanged.
What is the difference between sort_values() and sort_index()?
sort_values() sorts the DataFrame rows based on the values in one or more columns, while sort_index() sorts rows based on the DataFrame's index labels. The former operates on data values (implemented in pandas/core/frame.py), while the latter operates on the index object (implemented in pandas/core/generic.py).
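A quick way to see the difference: sort by a column with sort_values(), then undo it with sort_index(), which reorders by the index labels that the rows carried along.

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['Engineering', 'Sales', 'Engineering', 'HR'],
    'salary': [95000, 72000, 105000, 68000],
})

by_salary = df.sort_values(by='salary')  # rows ordered by column values
restored = by_salary.sort_index()        # rows ordered by index labels again

print(by_salary.index.tolist())  # [3, 1, 0, 2]
print(restored.equals(df))       # True
```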
Why does pandas sort_values() change the order of my duplicate values?
By default, sort_values() uses kind='quicksort', which is an unstable sorting algorithm. To preserve the original order of rows with equal sort keys, specify kind='mergesort' or kind='stable'; both request a stable argsort from the underlying implementation. Keep in mind that kind only takes effect for single-column sorts.
Can I sort by multiple columns simultaneously?
Yes. Pass a list of column names to the by parameter, as in df.sort_values(by=['col1', 'col2']). The sort is hierarchical: rows are ordered by the first column, and the second column breaks ties. You can also pass a list of booleans to ascending, such as [True, False], to control the direction of each column.
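Applied to the salary example, the following sketch sorts departments A to Z and, within each department, salaries from high to low.

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['Engineering', 'Sales', 'Engineering', 'HR'],
    'salary': [95000, 72000, 105000, 68000],
})

# Department ascending, then salary descending within each department.
multi = df.sort_values(by=['department', 'salary'], ascending=[True, False])
print(multi.index.tolist())     # [2, 0, 3, 1]
print(multi['salary'].tolist())  # [105000, 95000, 68000, 72000]
```

The two Engineering rows are tied on the first key, so the second key (salary, descending) decides their order.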