How to Use pandas str.contains to Check for Multiple Expressions in a DataFrame

Use the regex alternation operator | to combine multiple patterns into a single string (e.g., r"apple|banana|cherry"), then pass this to Series.str.contains() with case=False for case-insensitive matching and na=False to handle missing values.

The pandas library provides a vectorized string accessor that enables efficient pattern matching across DataFrame columns without explicit Python loops. According to the pandas-dev/pandas source code, the str.contains method implemented in [pandas/core/strings/accessor.py](https://github.com/pandas-dev/pandas/blob/main/pandas/core/strings/accessor.py#L1363) accepts regular expressions by default, allowing you to test for multiple expressions simultaneously using standard regex syntax.

How str.contains Handles Pattern Matching

The contains method accepts a regular expression pattern as its primary argument along with several control parameters. When regex=True (the default), pandas forwards the pattern to Python's re module, enabling full regex support including the alternation operator |. This architecture allows a single method call to evaluate multiple sub-patterns across millions of rows in a vectorized operation.

Key parameters for multi-expression checks include:

  • case – Set to False for case-insensitive matching.
  • na – Controls behavior for missing values (typically False to treat NaN as non-matching).
  • regex – When True (default), interprets the pattern as a regular expression rather than a literal string.
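
The snippet below is a minimal sketch of how these parameters interact; the Series values are purely illustrative:

import pandas as pd

s = pd.Series(["Warning: disk full", "all good", None])

# regex=True (default): the pipe acts as alternation
s.str.contains("warning|error", case=False, na=False)
# 0     True
# 1    False
# 2    False   <- the missing value becomes False because of na=False

# regex=False: the pattern is treated as a literal substring, so the pipe matches nothing here
s.str.contains("warning|error", case=False, na=False, regex=False)
# 0    False
# 1    False
# 2    False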

Using Regex Alternation for Multiple Keywords

To check for any of several expressions in a single column, join the patterns with the pipe character |. This creates a regex alternation that matches any string containing at least one of the listed sub-patterns.

For example, the pattern r"error|failed|warning" returns True for any row containing "error", "failed", or "warning". This approach executes in a single pass through the data, maintaining the performance benefits of pandas' vectorized operations rather than iterating through multiple separate checks.
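
When the keywords live in a Python list, you can build the alternation programmatically; wrapping each keyword in re.escape guards against entries that contain regex metacharacters. A small sketch with illustrative keywords:

import re

keywords = ["error", "failed", "C++"]           # "C++" contains regex metacharacters
pat = "|".join(re.escape(k) for k in keywords)  # 'error|failed|C\\+\\+'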

Practical Examples for Multiple Expression Matching

Filtering a Single Column for Multiple Keywords

This example searches a text column for any of three fruit names, ignoring case and handling missing values:

import pandas as pd

df = pd.DataFrame({
    "text": [
        "I love apples",
        "Bananas are great",
        "Cherries are red",
        "No fruit here",
    ]
})

# Look for any of the three fruit names (case-insensitive)

pat = r"apple|banana|cherry"
mask = df["text"].str.contains(pat, case=False, na=False)
result = df[mask]

print(result)

Output:


                text
0      I love apples
1  Bananas are great
2    A cherry is red

Source: The contains method used here is defined in [pandas/core/strings/accessor.py](https://github.com/pandas-dev/pandas/blob/main/pandas/core/strings/accessor.py#L1363).
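
To get the rows that match none of the patterns, negate the boolean mask with the ~ operator:

# Rows that do not contain any of the fruit names
no_fruit = df[~mask]
print(no_fruit)

With the sample data above, this prints only the "No fruit here" row.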

Searching Across Multiple DataFrame Columns

To check whether any column in a row contains one of the target expressions, combine apply() with any(axis=1):

df = pd.DataFrame({
    "col1": ["error 404", "success", "warning"],
    "col2": ["failed login", "user active", "error 500"],
    "col3": ["OK", "error 403", "pending"]
})

# Any column containing 'error' or 'failed'

pat = r"error|failed"
mask = df.apply(lambda s: s.astype(str).str.contains(pat, case=False, na=False)).any(axis=1)
result = df[mask]

print(result)

Output:


        col1          col2     col3
0  error 404  failed login       OK
2    warning     error 500  pending

This approach applies the string accessor to each column individually, then uses .any(axis=1) to return rows where at least one column matches.
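
If you instead need rows where every column matches, swap .any(axis=1) for .all(axis=1):

# Rows where every column contains 'error' or 'failed'
strict_mask = df.apply(lambda s: s.astype(str).str.contains(pat, case=False, na=False)).all(axis=1)
print(df[strict_mask])

With the sample data above, no row satisfies this stricter condition, so the result is an empty DataFrame.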

Optimizing Performance with Compiled Regex

When reusing the same multi-expression pattern repeatedly, compile it first to avoid recompilation overhead:

import re
import pandas as pd

df = pd.DataFrame({
    "msg": [
        "User admin logged in",
        "User guest failed login",
        "System rebooted",
        "admin privileges escalated"
    ]
})

# Compile once, reuse many times

regex = re.compile(r"\badmin\b|\bfailed\b", flags=re.IGNORECASE)

mask = df["msg"].str.contains(regex, regex=True, na=False)
print(df[mask])

Output:


                          msg
0        User admin logged in
1     User guest failed login
3  admin privileges escalated

The contains method accepts compiled pattern objects and applies them element-wise through Python's re machinery. When passing a compiled pattern, bake flags such as re.IGNORECASE into re.compile() itself; combining a compiled pattern with case=False or a non-zero flags argument typically raises a ValueError, because pandas cannot re-apply flags to an already compiled object.

Combining with Aggregation Operations

After filtering with multiple patterns, you can chain additional string operations:


# Filter rows containing either pattern

filtered = df[df["msg"].str.contains(r"admin|failed", case=False, na=False)]

# Count occurrences of the patterns across remaining rows

counts = filtered.apply(lambda s: s.str.count(r"admin|failed", flags=re.IGNORECASE)).sum()
print(counts)
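
Output, assuming df here is still the log-message DataFrame from the compiled-regex example (each of the three matching rows contains the pattern exactly once):

msg    3
dtype: int64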

Summary

  • Combine multiple search terms using the regex alternation operator | (pipe) to create a single pattern string like r"cat|dog|mouse".
  • The str.contains method in pandas/core/strings/accessor.py evaluates these patterns element-wise via Python's re module when regex=True, keeping the check vectorized at the pandas level.
  • Set case=False for case-insensitive matching and na=False to handle missing values without introducing NaN into boolean masks.
  • For DataFrame-wide searches, use apply() with str.contains followed by .any(axis=1) or .all(axis=1) to check across columns.
  • Compile patterns with re.compile() before passing to str.contains when reusing complex multi-expression patterns in performance-critical loops.

Frequently Asked Questions

How do I perform a case-insensitive search for multiple strings?

Pass your alternation pattern to str.contains with the parameter case=False. This setting ignores uppercase and lowercase distinctions, so r"error|failed" matches "ERROR", "Failed", or "error" equally.

How should I handle NaN values when filtering with multiple patterns?

Use the na parameter to specify the fill value for missing data, typically na=False to treat NaNs as non-matches. This prevents the method from returning NaN for missing values, which would otherwise propagate into boolean mask operations and produce unexpected filtering results.
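
A minimal sketch of the difference, assuming a Series with one missing value:

import numpy as np
import pandas as pd

s = pd.Series(["error: disk full", np.nan, "all good"])

s.str.contains("error|failed")            # True, NaN, False  -> object dtype, awkward to filter with
s.str.contains("error|failed", na=False)  # True, False, False -> clean boolean mask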

Can I search multiple DataFrame columns at once with str.contains?

Apply the string accessor to each column using df.apply(lambda s: s.astype(str).str.contains(pattern)), then chain .any(axis=1) to return rows where any column matches. Because str.contains operates on Series objects, you must apply it column-wise when working with entire DataFrames.

Is compiling the regex pattern beneficial for performance?

Yes, when reusing the same multi-expression pattern across multiple calls, compile it first with re.compile(r"pattern1|pattern2", flags=re.IGNORECASE) and pass the compiled object to str.contains. The pandas source code forwards compiled patterns directly to Python's re module, avoiding recompilation overhead on each invocation.
