# How to Drop Column by Name in pandas When Column Names Contain a Substring

> Drop pandas columns whose names contain a substring, using either a vectorized boolean mask on the column Index or a precomputed label list passed to `DataFrame.drop`.

- Repository: [pandas-dev/pandas](https://github.com/pandas-dev/pandas)
- Tags: how-to-guide
- Published: 2026-02-21

---

**Use `df.loc[:, ~df.columns.str.contains('pattern')]` for vectorized boolean indexing, or `df.drop([c for c in df.columns if 'pattern' in c], axis=1)` to hand a precomputed label list to `DataFrame.drop`. Both approaches scan every column name once to find the matches, so their cost grows with the total number of columns, and both operate on column labels only, never iterating over rows.**

When working with wide datasets, you often need to drop pandas columns by name based on partial string matches rather than exact labels. pandas (developed in the `pandas-dev/pandas` repository) stores column labels in a specialized `Index` object that supports vectorized string operations, so you can remove columns containing a specific substring without writing explicit loops and without touching the row data.

## Understanding the Internal Column Drop Architecture

The efficiency of column removal in pandas stems from how the library handles index manipulation internally. When you call `DataFrame.drop`, the method signature defined in [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py) forwards the request to `NDFrame.drop` in [`pandas/core/generic.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/generic.py). This generic implementation resolves the labels, normalizes the axis argument, and invokes the underlying `Index.drop` method found in [`pandas/core/indexes/base.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/indexes/base.py).

`Index.drop` looks up the positions of the requested labels and returns a new `Index` without them; labels that are not found raise `KeyError` unless `errors='ignore'` is passed. The label lookup scales with the number of labels being removed, *k*, but building the new `Index` and the resulting DataFrame still involves the columns that remain. In practice the cost of a substring-based drop therefore grows with the total number of columns, not only with the size of the exclusion list.
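To make the last step of that call chain concrete, here is a minimal sketch that calls `Index.drop` directly on a DataFrame's column Index. The sample frame and its column names are illustrative assumptions, not taken from the pandas source.

```python
import pandas as pd

# Illustrative frame; the column names are made up for this sketch.
df = pd.DataFrame({'a_price': [1.0], 'a_qty': [2], 'b_price': [3.0]})

# Index.drop returns a new Index; the original Index is left untouched.
remaining = df.columns.drop(['a_price', 'b_price'])
print(list(remaining))   # ['a_qty']
print(list(df.columns))  # ['a_price', 'a_qty', 'b_price']

# Labels that are not present raise KeyError unless errors='ignore' is passed.
safe = df.columns.drop(['missing'], errors='ignore')
print(list(safe))        # ['a_price', 'a_qty', 'b_price']
```

`DataFrame.drop` exposes the same `errors` parameter, which is handy when the exclusion list may contain labels that are absent from a given frame.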

## Method 1: Vectorized Boolean Masking with str.contains

The most direct way to drop columns by substring is to apply vectorized string operations to the column Index itself. The `str.contains` method builds a boolean array in a single pass over the column names, and the tilde operator (`~`) inverts the mask to select only columns that do *not* match the pattern. Note that `str.contains` treats the pattern as a regular expression by default; pass `regex=False` to match a literal substring.

```python
import pandas as pd

# Sample DataFrame with substring patterns in column names
df = pd.DataFrame({
    'apple_qty': [10, 20],
    'banana_qty': [5, 15],
    'apple_price': [1.2, 1.3],
    'banana_price': [0.8, 0.9],
    'misc': [0, 1]
})

# Drop every column containing the substring 'price'
df_filtered = df.loc[:, ~df.columns.str.contains('price')]
print(df_filtered)
```

This approach avoids creating intermediate Python lists and performs the selection in a single vectorized pass through the column Index.
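Because `str.contains` interprets its pattern as a regular expression by default, substrings that contain regex metacharacters such as `.` can match more than intended. The sketch below, using made-up column names, contrasts the default behavior with `regex=False`.

```python
import pandas as pd

# Illustrative frame: two column names that differ only in one character.
df_special = pd.DataFrame({
    'price.usd': [1.2],
    'price_usd': [1.3],
    'qty': [10],
})

# Default regex=True: '.' matches any character, so both price columns match.
regex_mask = df_special.columns.str.contains('price.usd')

# regex=False: only the literal substring 'price.usd' matches.
literal_mask = df_special.columns.str.contains('price.usd', regex=False)

print(df_special.loc[:, ~regex_mask].columns.tolist())    # ['qty']
print(df_special.loc[:, ~literal_mask].columns.tolist())  # ['price_usd', 'qty']
```

When the substring comes from user input or configuration, `regex=False` is usually the safer default.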

## Method 2: Pre-computing Labels for DataFrame.drop

Alternatively, you can compute the list of columns to remove using a list comprehension, then pass that list to `DataFrame.drop`. While the list comprehension executes in Python, it runs only once, and the subsequent `drop` call leverages the optimized `Index.drop` routine implemented in [`pandas/core/indexes/base.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/indexes/base.py).

```python
# Compute columns to drop using substring matching
cols_to_drop = [col for col in df.columns if 'price' in col]

# Use drop with axis=1 for column removal
df_dropped = df.drop(cols_to_drop, axis=1)
print(df_dropped)
```

In practice this method performs comparably to boolean masking. Both approaches scan every column name once to find the matches, and for typical column counts the difference between a list comprehension and a vectorized mask is small next to the cost of building the new DataFrame.
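If you want the dropped labels back for logging, the pattern can be wrapped in a small function. The helper below, `drop_columns_containing`, is a hypothetical name introduced for this sketch and is not part of pandas; it also converts labels with `str()` so that non-string column labels do not raise a `TypeError`.

```python
import pandas as pd

def drop_columns_containing(frame, substring):
    """Hypothetical helper: return (new_frame, dropped_labels)."""
    # str(col) guards against non-string labels such as integers.
    dropped = [col for col in frame.columns if substring in str(col)]
    return frame.drop(columns=dropped), dropped

# Illustrative data, including an integer column label.
sales = pd.DataFrame({'apple_price': [1.2], 'apple_qty': [10], 2024: [0]})
trimmed, dropped = drop_columns_containing(sales, 'price')

print(dropped)                   # ['apple_price']
print(trimmed.columns.tolist())  # ['apple_qty', 2024]
```

`drop(columns=...)` is equivalent to `drop(..., axis=1)` and reads a little more clearly in a helper like this.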

## Method 3: Regex Filtering with DataFrame.filter

For advanced pattern matching, you can use `DataFrame.filter` with a regular expression that excludes matching columns. This method returns a new DataFrame containing only columns whose names match the regex, effectively dropping everything else.

```python
# Keep only columns whose names do NOT contain 'price' (negative lookahead).
# To exclude only names that END with 'price', use '^(?!.*price$)' instead.
df_regex = df.filter(regex='^(?!.*price)')
print(df_regex)
```

For the sample DataFrame above, all three approaches produce identical output:

```
   apple_qty  banana_qty  misc
0         10           5     0
1         20          15     1
```
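The methods only agree when the regex expresses the same condition as the substring test. As a small check on made-up data, the sketch below shows how a lookahead anchored with `$` (names ending in the substring) differs from an unanchored one (names containing it) when a column such as `price_tier` is present.

```python
import pandas as pd

# Illustrative frame: 'price' appears at the end of one name
# and at the start of another.
df_demo = pd.DataFrame({
    'apple_price': [1.2],
    'price_tier': ['A'],
    'misc': [0],
})

mask_result = df_demo.loc[:, ~df_demo.columns.str.contains('price')]
ends_with = df_demo.filter(regex='^(?!.*price$)')  # excludes names ending in 'price'
contains = df_demo.filter(regex='^(?!.*price)')    # excludes names containing 'price'

print(ends_with.columns.tolist())    # ['price_tier', 'misc']
print(contains.columns.tolist())     # ['misc']
print(mask_result.equals(contains))  # True
```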

## Performance Considerations

Both the **boolean masking** and **pre-computed drop** methods avoid row-wise iteration and keep operations column-wise, which is the most performant way to reshape a DataFrame. The choice between them depends on your specific workflow:

- **Boolean masking** (`df.loc[:, ~mask]`) is slightly more concise and selects the columns directly, without going through the label resolution in `drop()`.
- **Pre-computed drop** (`df.drop(list, axis=1)`) is explicit and useful when you need the list of dropped columns for logging or further processing.

The `drop` path in [`pandas/core/generic.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/generic.py) resolves and validates the labels before building the result (for example, raising `KeyError` for labels that are not found), so the `loc` approach may be marginally faster for simple filtering tasks. Any difference is usually small, and both approaches scale with the total number of columns, so measure on your own data if it matters.
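Rather than relying on a general claim, you can time both approaches on a frame shaped like your own. The sketch below is one way to do that with the standard `timeit` module; the frame size, column naming scheme, and repetition count are arbitrary assumptions, and the numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical wide frame: 2,000 columns, every tenth one tagged '_price'.
names = [f'col{i}_price' if i % 10 == 0 else f'col{i}' for i in range(2000)]
wide = pd.DataFrame(np.zeros((5, len(names))), columns=names)

def via_mask():
    # Boolean mask on the column Index, literal substring match.
    return wide.loc[:, ~wide.columns.str.contains('price', regex=False)]

def via_drop():
    # Precomputed label list handed to DataFrame.drop.
    return wide.drop(columns=[c for c in wide.columns if 'price' in c])

# Both approaches should yield the same frame before we compare timings.
assert via_mask().equals(via_drop())

repeats = 20
for fn in (via_mask, via_drop):
    seconds = timeit.timeit(fn, number=repeats)
    print(f'{fn.__name__}: {seconds / repeats * 1e3:.3f} ms per call')
```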

## Summary

- **Boolean masking** with `df.loc[:, ~df.columns.str.contains()]` provides the most concise syntax for dropping columns by substring.
- **Pre-computing label lists** for `df.drop()` relies on the `Index.drop` implementation in [`pandas/core/indexes/base.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/indexes/base.py) and works well when you need to reference the exclusion list later.
- **Regex filtering** via `df.filter()` offers advanced pattern matching capabilities for complex naming conventions.
- All approaches maintain column-wise operations, avoiding the performance penalty of row-wise iteration.

## Frequently Asked Questions

### Which is faster: drop() or loc[] with a boolean mask?

Both methods have to examine every column name to find the matches, so their cost grows with the total number of columns. `df.loc[]` with a boolean mask may be marginally faster because it skips the label resolution and error handling in [`pandas/core/generic.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/generic.py)'s `drop` implementation. For most datasets the difference is negligible: prefer `loc` for simple filtering and `drop` when you need explicit control over the exclusion list.

### Can I use regular expressions directly with the drop() method?

No. `DataFrame.drop()`, defined in [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py), expects exact labels (a single label or a list-like of labels). To use a regex pattern, first identify the matching columns with `df.columns.str.contains()`, which treats its pattern as a regular expression by default, and pass them to `drop`. Alternatively, use `df.filter()` with a regex that selects the columns you want to keep, effectively dropping the others.
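As a minimal sketch of the first option, the example below selects labels with a regex and hands them to `drop`. The column names and the pattern are illustrative assumptions; the group is written as non-capturing (`(?:...)`) because `str.contains` warns when a pattern includes capture groups.

```python
import pandas as pd

# Illustrative frame; only names ENDING in '_price' or '_cost' should go.
df_rx = pd.DataFrame({
    'apple_price': [1.2],
    'apple_cost': [0.9],
    'price_note': ['n/a'],
    'qty': [10],
})

pattern = r'_(?:price|cost)$'

# Index the column Index with the boolean mask to get the matching labels.
to_drop = df_rx.columns[df_rx.columns.str.contains(pattern)]
result = df_rx.drop(columns=to_drop)

print(to_drop.tolist())         # ['apple_price', 'apple_cost']
print(result.columns.tolist())  # ['price_note', 'qty']
```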

### How do I keep only the columns that contain a specific substring?

Remove the tilde (`~`) operator from the boolean mask to invert the selection: `df.loc[:, df.columns.str.contains('substring')]`. Alternatively, use `df.filter(like='substring')`, which is a convenience wrapper for partial string matching that returns only matching columns.
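A short sketch on made-up data shows that the two forms select the same columns:

```python
import pandas as pd

# Illustrative frame with a single column containing 'price'.
df_keep = pd.DataFrame({'apple_qty': [10], 'apple_price': [1.2], 'misc': [0]})

# Mask without the tilde keeps only the matching columns.
by_mask = df_keep.loc[:, df_keep.columns.str.contains('price', regex=False)]

# filter(like=...) performs the same literal substring test on column names.
by_like = df_keep.filter(like='price')

print(by_mask.columns.tolist())  # ['apple_price']
print(by_like.columns.tolist())  # ['apple_price']
```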

### Does this approach work with MultiIndex column names?

Yes, but you must adjust the logic to target a specific level of the MultiIndex. Use `df.columns.get_level_values(level).str.contains()` to test one level, then apply the resulting boolean mask with `df.loc[:, ~mask]` exactly as in the single-level case.
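The sketch below applies this to a two-level column index. The level layout (fruit on level 0, measure on level 1) is an assumption made for illustration.

```python
import pandas as pd

# Illustrative two-level columns: (fruit, measure).
cols = pd.MultiIndex.from_tuples([
    ('apple', 'qty'), ('apple', 'price'),
    ('banana', 'qty'), ('banana', 'price'),
])
df_mi = pd.DataFrame([[10, 1.2, 5, 0.8]], columns=cols)

# Test only the second level (level=1) for the substring.
mask = df_mi.columns.get_level_values(1).str.contains('price', regex=False)
result = df_mi.loc[:, ~mask]

print(result.columns.tolist())  # [('apple', 'qty'), ('banana', 'qty')]
```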