How to Get Columns from a pandas DataFrame: Extract Header Lists

Use df.columns.tolist() to convert a DataFrame's column Index into a plain Python list of header strings.

The pandas-dev/pandas library stores DataFrame metadata through an inherited axis system in which the column labels live in a specialized Index object. When you need the columns as a standard Python list for validation, API calls, or iteration, you retrieve that Index and convert it with the accessors described below.

Accessing Column Headers via the Info Axis

In pandas source architecture, DataFrame inherits from the NDFrame base class defined in pandas/core/generic.py. This base class implements an "info axis" pattern that manages structural labels: columns for DataFrames and the index for Series.

The DataFrame class in pandas/core/frame.py explicitly sets the class variable _info_axis_name = "columns" (line 18462). The generic _info_axis property in pandas/core/generic.py (lines 603-606) returns getattr(self, self._info_axis_name). On a DataFrame, the internal _info_axis therefore resolves to the same object that is exposed publicly as df.columns.
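You can see the info-axis pattern by comparing a DataFrame with a Series. This sketch reads the private `_info_axis_name` and `_info_axis` attributes discussed above and imports `NDFrame` from the internal module `pandas.core.generic`. These are undocumented internals that may change between pandas versions, so treat the snippet as an inspection aid only.

```python
import pandas as pd
from pandas.core.generic import NDFrame  # internal base class; path may change

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
s = pd.Series([10, 20], index=["x", "y"])

# Both containers inherit from NDFrame.
print(isinstance(df, NDFrame), isinstance(s, NDFrame))  # True True

# A DataFrame's info axis is its columns; a Series' info axis is its index.
print(df._info_axis_name, s._info_axis_name)  # columns index

# The private _info_axis holds the same labels as the public accessors.
print(df._info_axis.equals(df.columns))  # True
print(s._info_axis.equals(s.index))      # True
```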

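Because df.columns returns an Index object (from pandas/core/indexes/base.py) rather than a plain list, it carries pandas-specific capabilities such as label lookup and set operations. It already supports len(), iteration, and membership tests. Code that requires an actual list, for example to append to it or to serialize it as JSON, needs a conversion first.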
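For example, an Index works with `len()` and `in` as-is. The standard library's `json.dumps` only understands built-in types, so it rejects an Index until the Index is converted. A minimal illustration:

```python
import json
import pandas as pd

df = pd.DataFrame({"city": ["Paris"], "country": ["France"]})

# Index supports common read-only Python operations directly.
print(len(df.columns), "city" in df.columns)  # 2 True

# json.dumps does not know how to encode a pandas Index.
try:
    json.dumps(df.columns)
except TypeError as exc:
    print("TypeError:", exc)

# After conversion, the labels serialize as a JSON array.
payload = json.dumps(df.columns.tolist())
print(payload)  # ["city", "country"]
```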

Converting the Column Index to a List

While df.columns provides the header labels, it returns them as an Index object that displays as Index(['A', 'B'], dtype='object'). To extract a native Python list, call the tolist() method implemented in pandas/core/indexes/base.py.

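This method converts the underlying array data into a standard Python list of the labels. For typical headers these are strings, but integer labels come back as Python ints and MultiIndex labels as tuples. The axes property in pandas/core/frame.py (lines 788-889) returns [self.index, self.columns], which makes the split explicit: df.columns is the column Index, distinct from df.index, the row index.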
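Column labels are not always strings. In this sketch the columns use integer labels and the rows use string labels, which makes the two axes easy to tell apart. Note that `tolist()` returns plain Python `int` objects rather than NumPy scalars:

```python
import pandas as pd

# Integer column labels and string row labels.
df = pd.DataFrame([[1, 2], [3, 4]], columns=[10, 20], index=["r1", "r2"])

col_labels = df.columns.tolist()
row_labels = df.index.tolist()

print(col_labels)           # [10, 20]
print(type(col_labels[0]))  # <class 'int'>, not numpy.int64
print(row_labels)           # ['r1', 'r2']
```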

Practical Code Examples

Basic DataFrame Column Extraction

import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "Berlin", "Tokyo"],
    "population": [2.1, 3.6, 9.3],
    "country": ["France", "Germany", "Japan"]
})

# Get list of column headers
headers = df.columns.tolist()
print(headers)
# Output: ['city', 'population', 'country']
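A common reason to extract the headers is to validate input before processing it. The following is a sketch. `REQUIRED` is a hypothetical schema invented for this example, not something pandas defines:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "Berlin", "Tokyo"],
    "population": [2.1, 3.6, 9.3],
    "country": ["France", "Germany", "Japan"]
})

# Hypothetical set of columns the downstream code expects.
REQUIRED = ["city", "country", "area"]

headers = df.columns.tolist()
missing = [name for name in REQUIRED if name not in headers]
print(missing)  # ['area']
```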

Reading CSV Files and Listing Columns

import pandas as pd

# Load data from CSV
df = pd.read_csv("sales.csv")

# Extract column names immediately after loading
column_headers = df.columns.tolist()
print(f"Available columns: {column_headers}")
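If you only need the header row, `read_csv` can skip the data by passing `nrows=0`. This sketch substitutes an in-memory `io.StringIO` buffer for the `sales.csv` file so that it runs on its own. The CSV contents are made up:

```python
import io
import pandas as pd

# Stand-in for a real file; the contents are invented for the example.
csv_text = "region,product,units\nnorth,widget,12\nsouth,gadget,7\n"

# nrows=0 parses the header line but no data rows.
header_only = pd.read_csv(io.StringIO(csv_text), nrows=0)
print(header_only.columns.tolist())  # ['region', 'product', 'units']
print(len(header_only))              # 0
```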

Accessing the Underlying Info Axis (Advanced)

For introspection or debugging, you can read the internal _info_axis property directly. It is a private, undocumented name that may change between pandas versions, so keep using df.columns, the public API, in application code:

import pandas as pd

df = pd.DataFrame({"x": [0], "y": [1], "z": [2]})

# Access internal info axis (returns same Index as df.columns)
info_axis = df._info_axis
print(info_axis.tolist())
# Output: ['x', 'y', 'z']

How Column Storage Works in pandas Source Code

The behavior of df.columns.tolist() is determined by three key files in the pandas repository:

  • pandas/core/generic.py: Defines the NDFrame base class containing the _info_axis property (lines 603-606) that dynamically retrieves the axis specified by _info_axis_name.
  • pandas/core/frame.py: The DataFrame implementation sets _info_axis_name = "columns" and exposes the column Index through the axes property (lines 788-889), which returns [self.index, self.columns].
  • pandas/core/indexes/base.py: Implements the Index base class with the tolist() method that converts index labels to plain Python lists.
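To see the axes property from the frame.py bullet in action, unpack df.axes and compare its second element with df.columns. This is a small sketch using only public API:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Paris"], "population": [2.1]})

# DataFrame.axes is [row Index, column Index].
row_axis, col_axis = df.axes
print(isinstance(col_axis, pd.Index))  # True
print(col_axis.equals(df.columns), row_axis.equals(df.index))  # True True
print(col_axis.tolist())  # ['city', 'population']
```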

Summary

  • df.columns accesses the column Index object stored as the DataFrame's info axis.
  • df.columns.tolist() converts the Index to a standard Python list of column labels (usually strings).
  • The column storage mechanism inherits from NDFrame in pandas/core/generic.py, with DataFrame-specific configuration in pandas/core/frame.py.
  • The tolist() method is defined in pandas/core/indexes/base.py and works on any Index instance, including MultiIndex columns, where each element of the returned list is a tuple.
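With hierarchical columns, each label is a tuple with one entry per level. To flatten a single level into plain strings, select it with `get_level_values` before converting. The level names and values below are invented for illustration:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([("sales", "q1"), ("sales", "q2"), ("costs", "q1")])
df = pd.DataFrame([[1, 2, 3]], columns=cols)

# Each MultiIndex label is a tuple.
print(df.columns.tolist())  # [('sales', 'q1'), ('sales', 'q2'), ('costs', 'q1')]

# Extract one level as a flat list of strings.
print(df.columns.get_level_values(0).tolist())  # ['sales', 'sales', 'costs']
```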

Frequently Asked Questions

What is the difference between df.columns and df.columns.tolist()?

df.columns returns an Index object (a pandas array with metadata), while df.columns.tolist() returns a plain Python list containing only the label values. Use the former for pandas operations like label-based selection, and the latter when you need a native list for Python standard library functions or external APIs.
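The two forms also differ in mutability. An Index is immutable. The list returned by tolist() is an independent copy, so you can edit it without affecting the DataFrame. A short demonstration:

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2]})

headers = df.columns.tolist()
headers.append("c")          # changes the list only
print(df.columns.tolist())   # ['a', 'b']

try:
    df.columns[0] = "z"      # Index rejects item assignment
except TypeError as exc:
    print("TypeError:", exc)

# The Index itself can be used directly for label-based selection.
print(df[df.columns[:1]].columns.tolist())  # ['a']
```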

Can I get column headers as a list without using tolist()?

Yes. list(df.columns) iterates through the Index and produces the same labels in the same order. df.columns.tolist() is the more idiomatic form. It delegates to the underlying array's own tolist() conversion instead of iterating element by element. That can be faster for very wide frames, although the difference is negligible for typical column counts.
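You can check the equivalence yourself, and time both approaches on your own machine. The 10,000-column frame below is an arbitrary test case, and the timings will vary by hardware and pandas version:

```python
import timeit
import pandas as pd

df = pd.DataFrame({"city": ["Paris"], "population": [2.1], "country": ["France"]})

via_tolist = df.columns.tolist()
via_list = list(df.columns)

# Same labels, same order, both plain Python lists.
print(via_tolist == via_list)  # True
print(type(via_list) is list)  # True

# Rough timing on a wide, empty frame; results are machine-dependent.
wide = pd.DataFrame(columns=[f"col_{i}" for i in range(10_000)])
print("tolist():", timeit.timeit(wide.columns.tolist, number=100))
print("list():  ", timeit.timeit(lambda: list(wide.columns), number=100))
```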

Why does df.columns return an Index instead of a list?

pandas uses Index objects because they support label-based alignment, boolean indexing, and hierarchical operations (MultiIndex) that plain lists cannot provide. This design choice, implemented in the NDFrame architecture of pandas/core/generic.py, allows column headers to participate in data alignment during joins, merges, and concatenation operations.
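The sketch below shows label-based alignment on column labels. The two frames share labels in a different order, so arithmetic matches them by name rather than by position. Concatenation then takes the union of the column labels:

```python
import pandas as pd

left = pd.DataFrame({"a": [1], "b": [2]})
right = pd.DataFrame({"b": [10], "a": [20]})  # same labels, different order

# Arithmetic aligns columns by label, not by position.
total = left + right
print(total["a"].tolist(), total["b"].tolist())  # [21] [12]

# concat combines the column labels of its inputs, filling gaps with NaN.
extra = pd.DataFrame({"c": [5]})
combined = pd.concat([left, extra], ignore_index=True)
print(combined.columns.tolist())  # ['a', 'b', 'c']
```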

How do I get column names when reading from a CSV file?

After calling pd.read_csv(), call df.columns.tolist() on the returned DataFrame. The CSV parser takes the labels from the first row, or from the row selected by the header parameter, and stores them in the column Index before read_csv returns, so no extra processing step is needed.
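The header parameter controls which row supplies the labels. This sketch parses two invented CSV snippets from memory. The first uses header=1 so the second line becomes the header. The second uses header=None, which makes pandas assign integer labels:

```python
import io
import pandas as pd

# Invented CSV snippets for illustration.
with_preamble = "exported,by-tool\ncity,population\nParis,2.1\n"
no_header = "Paris,2.1\nBerlin,3.6\n"

# header=1: the second line holds the labels; the line before it is skipped.
df1 = pd.read_csv(io.StringIO(with_preamble), header=1)
print(df1.columns.tolist())  # ['city', 'population']

# header=None: no header row, so pandas numbers the columns.
df2 = pd.read_csv(io.StringIO(no_header), header=None)
print(df2.columns.tolist())  # [0, 1]
```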
