How to Get Columns from a pandas DataFrame: Extract Header Lists
Use df.columns.tolist() to convert a DataFrame's column Index into a plain Python list of header strings.
The pandas-dev/pandas library stores DataFrame metadata through an inherited axis system in which column labels are kept in a specialized Index object. When you need the columns as a standard list for validation, API calls, or iteration, you convert that Index using accessors defined in the core source files.
Accessing Column Headers via the Info Axis
In pandas source architecture, DataFrame inherits from the NDFrame base class defined in pandas/core/generic.py. This base class implements an "info axis" pattern that manages structural labels: the columns of a DataFrame and the index of a Series.
The DataFrame class in pandas/core/frame.py explicitly sets the class variable _info_axis_name = "columns" (line 18462). The generic _info_axis property in pandas/core/generic.py (lines 603-606) returns getattr(self, self._info_axis_name), which means df.columns is essentially a public alias for this internal info axis.
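The info-axis mechanism can be observed from the outside with a quick check like the one below. It relies on the private attributes _info_axis_name and _info_axis, which are internal implementation details and may change between pandas versions, so treat it as an introspection sketch rather than supported API.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
s = pd.Series([10, 20], index=["x", "y"])

# _info_axis_name is a private class attribute (may change between versions)
print(pd.DataFrame._info_axis_name)  # columns
print(pd.Series._info_axis_name)     # index

# For a DataFrame, the info axis holds the column labels...
print(df._info_axis.equals(df.columns))  # True
# ...while for a Series it holds the row index
print(s._info_axis.equals(s.index))      # True
```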
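Because df.columns returns an Index object (from pandas/core/indexes/base.py) rather than a list, it carries pandas-specific capabilities such as label lookup and set operations. It already supports basic Python operations like len(), iteration, and membership tests, but code that needs an actual list, such as json.dumps() or in-place list methods like remove(), requires an explicit conversion.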
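A small sketch of where the Index/list distinction matters in practice: the Index works with len() and the in operator, but the standard-library json module refuses to serialize it until it is converted with tolist().

```python
import json
import pandas as pd

df = pd.DataFrame({"city": ["Paris"], "population": [2.1]})
cols = df.columns

print(type(cols).__name__)     # Index
print(isinstance(cols, list))  # False

# Basic Python operations work directly on the Index
print(len(cols), "city" in cols)  # 2 True

# json.dumps cannot serialize an Index object...
try:
    json.dumps(cols)
except TypeError as exc:
    print("json.dumps(Index) failed:", exc)

# ...but the converted list serializes fine
print(json.dumps(cols.tolist()))  # ["city", "population"]
```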
Converting the Column Index to a List
While df.columns provides the header labels, it returns them as an Index object that displays as Index(['A', 'B'], dtype='object'). To extract a native Python list, call the tolist() method implemented in pandas/core/indexes/base.py.
This method converts the underlying array data into a standard Python list of labels (usually strings, but integers or other hashable values when the columns use them). The axes property in pandas/core/frame.py (lines 788-889) confirms that df.columns returns the column Index, distinct from df.index which returns the row index.
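As a quick illustration of the axes property described above, unpacking df.axes gives the row Index first and the column Index second; the labels here are arbitrary examples.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])

# DataFrame.axes is [df.index, df.columns]
row_axis, col_axis = df.axes
print(row_axis.tolist())            # ['r1', 'r2']
print(col_axis.tolist())            # ['x', 'y']
print(col_axis.equals(df.columns))  # True
```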
Practical Code Examples
Basic DataFrame Column Extraction
import pandas as pd
df = pd.DataFrame({
    "city": ["Paris", "Berlin", "Tokyo"],
    "population": [2.1, 3.6, 9.3],
    "country": ["France", "Germany", "Japan"]
})
# Get list of column headers
headers = df.columns.tolist()
print(headers)
# Output: ['city', 'population', 'country']
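Column labels are not always strings. In the sketch below (example year labels), tolist() returns plain Python int objects rather than NumPy scalars, and to_list() is an alias that returns the same list.

```python
import pandas as pd

# Integer column labels instead of strings
df = pd.DataFrame([[1, 2, 3]], columns=[2021, 2022, 2023])

years = df.columns.tolist()
print(years)           # [2021, 2022, 2023]
print(type(years[0]))  # <class 'int'>

# to_list() is an alias for tolist()
print(df.columns.to_list() == years)  # True
```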
Reading CSV Files and Listing Columns
import pandas as pd
# Load data from CSV
df = pd.read_csv("sales.csv")
# Extract column names immediately after loading
column_headers = df.columns.tolist()
print(f"Available columns: {column_headers}")
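The CSV example above assumes a sales.csv file exists on disk. For a self-contained version, the sketch below feeds hypothetical CSV text through io.StringIO, which read_csv accepts like a file.

```python
import io
import pandas as pd

# Inline CSV text stands in for a file such as sales.csv (hypothetical contents)
csv_text = """region,units,revenue
North,10,250.0
South,7,180.5
"""

df = pd.read_csv(io.StringIO(csv_text))
column_headers = df.columns.tolist()
print(column_headers)  # ['region', 'units', 'revenue']
```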
Accessing the Underlying Info Axis (Advanced)
For introspection or debugging, you can access the internal _info_axis attribute directly, though df.columns remains the public API:
import pandas as pd
df = pd.DataFrame({"x": [0], "y": [1], "z": [2]})
# Access internal info axis (returns same Index as df.columns)
info_axis = df._info_axis
print(info_axis.tolist())
# Output: ['x', 'y', 'z']
How Column Storage Works in pandas Source Code
The behavior of df.columns.tolist() is determined by three key files in the pandas repository:
- pandas/core/generic.py: Defines the NDFrame base class containing the _info_axis property (lines 603-606) that dynamically retrieves the axis specified by _info_axis_name.
- pandas/core/frame.py: The DataFrame implementation sets _info_axis_name = "columns" and exposes the column Index through the axes property (lines 788-889), which returns [self.index, self.columns].
- pandas/core/indexes/base.py: Implements the Index base class with the tolist() method that converts index labels to plain Python lists.
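The class relationships listed above can be confirmed at runtime. This sketch imports NDFrame from pandas.core.generic, an internal module whose import path is not part of the public API and could move in future releases.

```python
import pandas as pd
from pandas.core.generic import NDFrame  # internal module; path may change

df = pd.DataFrame({"a": [1]})

# Both DataFrame and Series inherit from NDFrame
print(issubclass(pd.DataFrame, NDFrame))  # True
print(issubclass(pd.Series, NDFrame))     # True

# The columns are stored as an Index instance
print(isinstance(df.columns, pd.Index))   # True
```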
Summary
- df.columns accesses the column Index object stored as the DataFrame's info axis.
- df.columns.tolist() converts the Index to a standard Python list of column labels.
- The column storage mechanism inherits from NDFrame in pandas/core/generic.py, with DataFrame-specific configuration in pandas/core/frame.py.
- The tolist() method is defined in pandas/core/indexes/base.py and works on any Index instance, including MultiIndex columns (where it returns a list of tuples).
Frequently Asked Questions
What is the difference between df.columns and df.columns.tolist()?
df.columns returns an Index object (a pandas array with metadata), while df.columns.tolist() returns a plain Python list containing only the label values. Use the former for pandas operations like label-based selection, and the latter when you need a native list for Python standard library functions or external APIs.
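A short sketch of when each form fits, using hypothetical column names: Index methods such as difference() return a new Index that can drive label-based selection directly, while a converted list supports in-place list methods that an immutable Index lacks.

```python
import pandas as pd

df = pd.DataFrame({"id": [1], "name": ["a"], "score": [0.5]})

# Index method: set-style difference returns a new Index
feature_cols = df.columns.difference(["id"])
print(feature_cols.tolist())  # ['name', 'score']

# Label-based selection accepts the Index directly
subset = df[feature_cols]
print(subset.shape)  # (1, 2)

# A plain list supports in-place edits; Index objects are immutable
headers = df.columns.tolist()
headers.remove("id")
print(headers)  # ['name', 'score']
```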
Can I get column headers as a list without using tolist()?
Yes. list(df.columns) iterates through the Index and produces the same list of labels. df.columns.tolist() (also available as df.columns.to_list()) is the more explicit, idiomatic choice and is implemented in pandas/core/indexes/base.py. Because tolist() converts the underlying array in a single call instead of iterating label by label, it can be slightly faster than the generic list() constructor, although for typical column counts the difference is negligible.
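A quick equivalence check, using integer labels so the element types are visible: both approaches return equal lists containing plain Python ints. Timing is deliberately left out, since any speed difference depends on the pandas version and column count.

```python
import pandas as pd

df = pd.DataFrame([[1, 2]], columns=[10, 20])

via_tolist = df.columns.tolist()
via_list = list(df.columns)

print(via_tolist == via_list)  # True

# Both produce Python ints rather than NumPy scalars
print(type(via_tolist[0]), type(via_list[0]))  # <class 'int'> <class 'int'>
```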
Why does df.columns return an Index instead of a list?
pandas uses Index objects because they support label-based alignment, boolean indexing, and hierarchical operations (MultiIndex) that plain lists cannot provide. This design choice, implemented in the NDFrame architecture of pandas/core/generic.py, allows column headers to participate in data alignment during joins, merges, and concatenation operations.
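The alignment behavior described above can be seen in a small arithmetic example: two frames with the same column labels in a different order are added by label, not by position. The data is illustrative only.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
right = pd.DataFrame({"b": [1, 2], "a": [100, 200]})  # same labels, different order

# Arithmetic aligns on column labels rather than column positions
total = left + right
print(total.columns.tolist())  # ['a', 'b']
print(total["a"].tolist())     # [101, 202]
print(total["b"].tolist())     # [11, 22]
```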
How do I get column names when reading from a CSV file?
After calling pd.read_csv(), immediately call df.columns.tolist() on the returned DataFrame. The CSV parser extracts headers from the first row (or specified header row) and stores them in the column Index before returning the DataFrame, making the headers available instantly without additional processing.
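When the file has no header row, the labels you get back depend on read_csv arguments. In this sketch with inline sample data, header=None makes pandas assign integer labels 0, 1, 2, and the names parameter supplies explicit labels instead.

```python
import io
import pandas as pd

raw = "1,2,3\n4,5,6\n"

# header=None: no header row, so pandas assigns integer labels
df_no_header = pd.read_csv(io.StringIO(raw), header=None)
print(df_no_header.columns.tolist())  # [0, 1, 2]

# names= supplies the column labels explicitly
df_named = pd.read_csv(io.StringIO(raw), header=None, names=["a", "b", "c"])
print(df_named.columns.tolist())  # ['a', 'b', 'c']
```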