How to Select Multiple Columns Using pandas loc with a List of Column Names
Pass a list of column labels as the second argument to DataFrame.loc (e.g., df.loc[:, ['col1', 'col2']]) to select multiple columns while preserving their order and handling duplicates correctly.
DataFrame.loc serves as the primary label-based indexer in the pandas-dev/pandas repository. Understanding how to pass a list of column names to this indexer allows you to extract non-contiguous columns, reorder your DataFrame, and dynamically filter data based on runtime conditions.
Understanding pandas loc Syntax for Multiple Columns
The loc indexer accepts two distinct arguments: a row selector and a column selector. When you provide a list as the column argument, pandas treats this as a request for multiple specific columns rather than a single label lookup.
In pandas/core/indexing.py, the _LocIndexer class implements this behavior. When the column selector is list-like, the indexer resolves each label to its integer position(s) in the column Index and then takes those columns from the DataFrame. Slice selectors follow a separate path (_get_slice_axis); in recent pandas versions the list-like path runs through _getitem_iterable, though these private names can change between releases. In practice this means:
- Order preservation: Columns appear in the result exactly as ordered in your input list
- Duplicate handling: If duplicate column names exist, all matching columns are returned
- Label validation: Pandas verifies all requested labels exist in the DataFrame's column index
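The order-preservation and label-validation behaviors listed above can be checked directly. The following is a minimal sketch on a small illustrative DataFrame (the frame and the missing label "Z" are made up for the example); the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# Order preservation: the result follows the order of the list,
# not the order of the columns in the original frame.
ordered = df.loc[:, ["C", "A"]]
print(list(ordered.columns))  # ['C', 'A']

# Label validation: a label that is not in df.columns raises KeyError.
missing_error = None
try:
    df.loc[:, ["A", "Z"]]
except KeyError as exc:
    missing_error = str(exc)
    print("KeyError:", missing_error)
```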
Practical Examples of Selecting Multiple Columns with loc
Basic Column Selection
Select specific columns for all rows by using the slice : for the row position and a list for the columns:
import pandas as pd
df = pd.DataFrame({
"A": [1, 2, 3],
"B": [4, 5, 6],
"C": [7, 8, 9],
"D": [10, 11, 12]
})
# Select columns B and D for all rows
selected = df.loc[:, ["B", "D"]]
print(selected)
Output:
   B   D
0  4  10
1  5  11
2  6  12
Selecting Specific Rows and Columns
Combine row and column selection by providing lists for both indexers:
# Rows with index 0 and 2, columns A and C
subset = df.loc[[0, 2], ["A", "C"]]
print(subset)
Output:
   A  C
0  1  7
2  3  9
Dynamic Column Lists
You can construct column lists programmatically and pass them to loc:
cols_to_keep = ["A", "D"]
filtered = df.loc[:, cols_to_keep]
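This pattern supports configuration-driven selection, regex-based column matching, or user-defined inputs. The loc indexer, implemented in pandas/core/indexing.py, accepts list-like selectors such as a list, NumPy array, or Index of valid column labels. Sets are rejected by recent pandas versions, so convert a set to a list before passing it to loc.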
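As a rough illustration of building the column list at runtime, the sketch below derives one list from a regular expression and another from a user-supplied list of names. The column names, the pattern, and the `requested` list are all hypothetical.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "sales_q1": [100, 120],
    "sales_q2": [110, 130],
    "cost_q1": [70, 80],
    "region": ["north", "south"],
})

# Regex-based matching: keep every column whose name starts with "sales_".
pattern = re.compile(r"^sales_")
sales_cols = [col for col in df.columns if pattern.search(col)]
sales = df.loc[:, sales_cols]

# User-defined input: keep only the requested names that actually exist,
# so an unknown name ("margin") does not raise KeyError.
requested = ["region", "cost_q1", "margin"]
available = [col for col in requested if col in df.columns]
subset = df.loc[:, available]

print(sales_cols)  # ['sales_q1', 'sales_q2']
print(available)   # ['region', 'cost_q1']
```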
Reordering Columns with loc
Because loc preserves the order of your input list, you can reorder columns without additional methods:
# Original order: A, B, C, D
new_order = ["D", "B", "A"]
reordered = df.loc[:, new_order]
print(reordered)
Output:
    D  B  A
0  10  4  1
1  11  5  2
2  12  6  3
Handling Duplicate Column Names
When duplicate labels exist, loc returns all matching columns:
df_dup = pd.DataFrame([[1, 2, 3]], columns=["X", "Y", "X"])
# Selecting both "X" columns
selected_dup = df_dup.loc[:, ["X"]]
print(selected_dup)
Output:
   X  X
0  1  3
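The underlying _LocIndexer logic in pandas/core/indexing.py identifies every position matching each label in your list and includes all of them in the result. To pick out one of several duplicate columns, work with positions instead: for a duplicated label, df.columns.get_loc can return a slice or boolean mask rather than a single integer, which you can then use with iloc.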
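One way to single out a specific duplicate is sketched below. It compares the column Index against the label to get a boolean mask, converts that mask to integer positions, and selects by position with iloc. This is an assumption about a convenient approach rather than the only option; get_loc can serve the same purpose once its return type is handled.

```python
import numpy as np
import pandas as pd

df_dup = pd.DataFrame([[1, 2, 3]], columns=["X", "Y", "X"])

# Integer positions of every column labeled "X".
positions = np.flatnonzero(df_dup.columns == "X")
print(positions)  # [0 2]

# Select only the second "X" column by position; the result is a Series.
second_x = df_dup.iloc[:, positions[1]]
print(second_x.tolist())  # [3]
```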
Summary
- DataFrame.loc accepts a list of column names as its second argument to select multiple columns simultaneously
- The implementation in pandas/core/indexing.py preserves the order of your input list and handles duplicates by returning all matching columns
- Use : as the row selector to select all rows while filtering columns
- This approach supports dynamic column selection and allows you to reorder columns without additional methods
Frequently Asked Questions
Can I use loc to select columns by position instead of label?
No, loc is label-based (it also accepts boolean masks, but not integer positions). If you need to select columns by integer position (0-indexed), use iloc instead. Passing positions to loc raises a KeyError unless those integers happen to be actual column labels. In the pandas source code in pandas/core/indexing.py, _LocIndexer validates against the axis labels, while _iLocIndexer handles integer positions.
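The difference can be seen side by side in this small sketch, which uses an illustrative DataFrame with string column labels:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# Label-based and position-based selection of the same two columns.
by_label = df.loc[:, ["A", "C"]]
by_position = df.iloc[:, [0, 2]]
print(by_label.equals(by_position))  # True

# Integer positions are not labels here, so loc rejects them.
loc_rejected_positions = False
try:
    df.loc[:, [0, 2]]
except KeyError:
    loc_rejected_positions = True
print(loc_rejected_positions)  # True
```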
What happens if a column name in the list doesn't exist?
Pandas raises a KeyError specifying which label is missing. The validation occurs in pandas/core/indexing.py when _LocIndexer attempts to convert your list to an Index object and match it against the DataFrame's column index. To avoid errors when some columns might be absent, first validate your list against df.columns or use df.reindex(columns=your_list) which handles missing labels gracefully by inserting NaN columns.
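Both workarounds mentioned above are sketched here on a small example; the `wanted` list and its missing label "Z" are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
wanted = ["A", "Z"]  # "Z" does not exist in df

# Option 1: validate against df.columns and keep only existing labels.
present = [col for col in wanted if col in df.columns]
safe = df.loc[:, present]
print(list(safe.columns))  # ['A']

# Option 2: reindex keeps every requested label and fills the
# missing ones with NaN instead of raising KeyError.
padded = df.reindex(columns=wanted)
print(list(padded.columns))  # ['A', 'Z']
```

The first option silently drops unknown names, while the second keeps the requested shape, so the choice depends on whether downstream code expects every column to be present.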
How does loc handle duplicate column names when using a list?
When your DataFrame contains duplicate column labels and you include that label in your list, loc returns all columns matching that name. As implemented in pandas/core/indexing.py, the indexer identifies all positional locations for each label in your list and includes them in the result. If you need to select only specific occurrences of duplicate columns, you must use positional indexing with iloc or deduplicate the columns first using df.loc[:, ~df.columns.duplicated()].
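The deduplication step can be sketched as follows. The `keep="last"` variant is shown as an additional assumption about what you might want; it is not part of the original recipe.

```python
import pandas as pd

df_dup = pd.DataFrame([[1, 2, 3]], columns=["X", "Y", "X"])

# Keep the first occurrence of each label (the default for duplicated()).
first_only = df_dup.loc[:, ~df_dup.columns.duplicated()]
print(list(first_only.columns))  # ['X', 'Y']

# Keep the last occurrence of each label instead.
last_only = df_dup.loc[:, ~df_dup.columns.duplicated(keep="last")]
print(list(last_only.columns))  # ['Y', 'X']
```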
Is using loc with a list faster than double bracket notation?
Performance is comparable for typical DataFrames; if the difference matters for your workload, measure it on your own data. For plain column selection, df.loc[:, ['A', 'B']] and df[['A', 'B']] return the same result, and some teams prefer loc for clarity and consistency because it states the label-based intent explicitly. loc is also the recommended form when you assign to a subset, since a single df.loc[rows, cols] = value call avoids chained-assignment pitfalls. Both forms resolve the list of labels against the column Index, so missing labels raise a KeyError and duplicate labels return all matches either way.
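If you want to check this on your own data, a rough timing sketch looks like the following. The frame size and repetition count are arbitrary, and the timings will vary by machine and pandas version, so no expected numbers are shown.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"A": range(1000), "B": range(1000), "C": range(1000)})
cols = ["A", "B"]

# Both forms select the same columns in the same order.
same = df.loc[:, cols].equals(df[cols])
print(same)  # True

# Time each form over the same number of repetitions.
loc_time = timeit.timeit(lambda: df.loc[:, cols], number=1000)
bracket_time = timeit.timeit(lambda: df[cols], number=1000)
print(f"loc: {loc_time:.4f}s  brackets: {bracket_time:.4f}s")
```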