How to Filter a Pandas DataFrame Using IN and NOT IN Like SQL WHERE

February 14, 2026 pandas-dev/pandas

Use the isin() method to test for membership and the bitwise NOT operator ~ for negation, enabling SQL-style WHERE col IN (...) and WHERE col NOT IN (...) filtering in pandas.

Filtering rows by whether a column's values belong to a given set is one of the most common SQL operations. In the pandas-dev/pandas repository, this is implemented by the isin() method, a vectorized membership test that lets you filter a DataFrame the way SQL's WHERE ... IN and WHERE ... NOT IN clauses do.

Understanding the isin() Method for SQL-Style Filtering

The pandas library implements SQL-equivalent IN operators through two primary entry points: Series.isin() for single-column checks and DataFrame.isin() for multi-column or element-wise comparisons.

Series.isin() for Column-Wise Membership Testing

When you need to check if values in a single column exist within a specified set, Series.isin() returns a Boolean Series that serves as a filter mask. According to the pandas-dev/pandas source code, this method is implemented in pandas/core/series.py at line 6114.

The method accepts a set or any list-like object, such as a list, a NumPy array, or another pandas Series, as the values argument, making it flexible for different data workflows. A single string is not list-like and raises a TypeError; wrap it in a list instead.
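As a rough illustration of the accepted inputs, the sketch below passes a list, a set, and another Series to Series.isin(). The data and variable names are made up for the example.

```python
import pandas as pd

cities = pd.Series(["New York", "Paris", "Tokyo", "Berlin"])  # hypothetical data

from_list = cities.isin(["Paris", "Berlin"])
from_set = cities.isin({"Paris", "Berlin"})
from_series = cities.isin(pd.Series(["Paris", "Berlin"]))

# All three calls produce the same Boolean mask.
print(from_list.tolist())

# A bare string is rejected with a TypeError; use ["Paris"] instead.
try:
    cities.isin("Paris")
    string_rejected = False
except TypeError:
    string_rejected = True
print(string_rejected)
```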

DataFrame.isin() for Multi-Column Filtering

For checking membership across multiple columns simultaneously, DataFrame.isin() creates a Boolean DataFrame of the same shape, where each cell indicates whether that specific element exists in the provided values. This method is defined in pandas/core/frame.py at line 18326.

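Unlike the Series version, DataFrame.isin() tests every cell. Given a list, it checks each element against that list. Given a dictionary, it checks each column against the values listed under that column's name. Given another Series or DataFrame, it aligns on labels before comparing, so an element matches only where the labels match as well; it does not search the other table for matching rows.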
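A minimal sketch of the element-wise behavior, using a small made-up frame: a list is applied to every cell, while a dict applies a separate set of values to each named column.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 5, 6]})  # hypothetical data

# A list is checked against every cell in the frame.
in_list = df.isin([1, 3])

# A dict maps column names to the values allowed in that column.
in_dict = df.isin({"a": [1, 3], "b": [5]})

print(in_list)
print(in_dict)
```

Both results have the same shape as df, so they are usually reduced with .all(axis=1) or .any(axis=1) before being used as a row filter.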

The Core Algorithm Behind the Scenes

Both Series.isin() and DataFrame.isin() delegate to the low-level, vectorized algorithm located in pandas/core/algorithms.py at line 493. This isin(comps, values) function performs the actual membership testing on underlying NumPy arrays or ExtensionArrays, ensuring consistent performance across data types including nullable integers, strings, and categoricals.
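From the caller's side, the shared algorithm means the same call works regardless of the column's dtype. The sketch below is only an illustration with made-up data, showing a categorical column and a datetime column; it does not call the internal function directly.

```python
import pandas as pd

# Categorical column
sizes = pd.Series(["S", "M", "L", "M"], dtype="category")
size_mask = sizes.isin(["M", "L"])

# Datetime column, matched against Timestamp values
dates = pd.Series(pd.to_datetime(["2024-01-01", "2024-06-01", "2024-12-31"]))
date_mask = dates.isin([pd.Timestamp("2024-01-01"), pd.Timestamp("2024-12-31")])

print(size_mask.tolist())
print(date_mask.tolist())
```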

How to Implement NOT IN in Pandas

SQL's NOT IN operator is expressed in pandas through logical negation of the Boolean mask generated by isin(). The bitwise NOT operator ~ inverts the True/False values, effectively converting an "in" check to a "not in" check.


# SQL equivalent: WHERE column NOT IN ('value1', 'value2')
mask = ~df['column'].isin(['value1', 'value2'])
filtered_df = df[mask]

This pattern works identically for both Series and DataFrame objects, maintaining consistency across the pandas API.
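One point that often trips people up: the negation has to be ~, not Python's not keyword. The sketch below, on a made-up status column, shows the inverted mask and the ValueError that not raises because a Series has no single truth value.

```python
import pandas as pd

df = pd.DataFrame({"status": ["open", "closed", "open", "archived"]})  # hypothetical

is_in = df["status"].isin(["closed", "archived"])
not_in = ~is_in  # element-wise inversion of the Boolean mask

print(not_in.tolist())

# `not is_in` asks for one truth value for the whole Series and raises.
try:
    not is_in
    raised = False
except ValueError:
    raised = True
print(raised)
```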

Practical Examples: SQL WHERE IN and NOT IN in Pandas

Filtering Rows with IN Condition

To replicate SELECT * FROM table WHERE city IN ('Paris', 'Berlin'), use Series.isin() to generate a filter mask:

import pandas as pd

df = pd.DataFrame({
    "city": ["New York", "Paris", "Tokyo", "Berlin"],
    "population": [8_400_000, 2_200_000, 9_300_000, 3_600_000]
})

# SQL: WHERE city IN ('Paris', 'Berlin')
mask = df["city"].isin(["Paris", "Berlin"])
result = df[mask]
print(result)

Output:

     city  population
1   Paris     2200000
3  Berlin     3600000

Excluding Rows with NOT IN Condition

To exclude specific values using SQL's NOT IN logic, apply the ~ operator to invert the Boolean mask:


# SQL: WHERE city NOT IN ('Tokyo')
mask = ~df["city"].isin(["Tokyo"])
result = df[mask]
print(result)

Output:

       city  population
0  New York     8400000
1     Paris     2200000
3    Berlin     3600000

Combining Multiple Conditions

Complex SQL queries with multiple IN conditions and logical operators translate directly to pandas using & (AND) and | (OR). Wrap each comparison in parentheses, because & and | bind more tightly than comparison operators such as >:

# SQL: WHERE city IN ('Paris', 'Berlin') AND population > 2500000
mask = df["city"].isin(["Paris", "Berlin"]) & (df["population"] > 2_500_000)
result = df[mask]
print(result)

Output:

     city  population
3  Berlin     3600000
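If a spelling closer to SQL is preferred, DataFrame.query() accepts in and not in inside its expression string. The sketch below is one possible way to write the same filters; the cities variable is a name chosen for the example and is referenced from the query string with @.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["New York", "Paris", "Tokyo", "Berlin"],
    "population": [8_400_000, 2_200_000, 9_300_000, 3_600_000],
})

cities = ["Paris", "Berlin"]  # referenced inside the query string with @

# WHERE city IN (...) AND population > 2500000
in_and_big = df.query("city in @cities and population > 2500000")

# WHERE city NOT IN (...)
not_in = df.query("city not in @cities")

print(in_and_big)
print(not_in)
```

The mask-based form is easier to build up programmatically, while query() tends to read better for short, fixed conditions.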

Using a DataFrame as the Lookup Table

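SQL's WHERE (col1, col2) IN (SELECT ...) matches whole rows against a lookup table. DataFrame.isin() is not a direct equivalent: when given another DataFrame, it aligns on index and column labels and compares cell by cell, so a lookup row matches only if it sits at the same index label. To match rows regardless of their position, compare the column tuples through a MultiIndex:

allowed = pd.DataFrame({
    "city": ["New York", "Tokyo"],
    "population": [8_400_000, 9_300_000]
})

# Keep rows whose (city, population) pair appears anywhere in `allowed`
keys = pd.MultiIndex.from_frame(df[["city", "population"]])
mask = keys.isin(pd.MultiIndex.from_frame(allowed))
result = df[mask]
print(result)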

Output:

       city  population
0  New York     8400000
2     Tokyo     9300000
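To see why label alignment matters, the sketch below uses a hypothetical lookup table whose rows are in a different order from df. Passing it to DataFrame.isin() compares row 0 with row 0 and row 1 with row 1, so nothing matches, while an inner merge on both columns finds the rows wherever they sit.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["New York", "Paris", "Tokyo", "Berlin"],
    "population": [8_400_000, 2_200_000, 9_300_000, 3_600_000],
})

# Hypothetical lookup table, deliberately in a different row order.
allowed = pd.DataFrame({
    "city": ["Tokyo", "New York"],
    "population": [9_300_000, 8_400_000],
})

# Label-aligned comparison: df row 0 vs allowed row 0, and so on.
aligned = df.isin(allowed).all(axis=1)
print(aligned.tolist())

# An inner merge keeps rows whose (city, population) pair appears
# anywhere in `allowed`; the result gets a fresh 0..n-1 index.
matched = df.merge(allowed, on=["city", "population"], how="inner")
print(matched)
```

Note that a merge repeats a row of df once per matching row in the lookup table, so it is worth dropping duplicates from the lookup first if it may contain any.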

Summary

Use Series.isin() (implemented in pandas/core/series.py) to test membership in a single column, returning a Boolean mask for filtering.
Use DataFrame.isin() (implemented in pandas/core/frame.py) to perform element-wise membership testing across multiple columns.
Apply the ~ operator to invert isin() results, achieving SQL-style NOT IN functionality.
Both methods delegate to the core isin() algorithm in pandas/core/algorithms.py, which provides vectorized membership testing across pandas data types.
Combine masks using & (AND) and | (OR) to replicate complex SQL WHERE clauses with multiple IN conditions.

Frequently Asked Questions

How do I filter a pandas DataFrame using a list of values like SQL IN?

Use the isin() method on a Series to create a Boolean mask, then pass that mask to the DataFrame indexer. For example: df[df['column'].isin(['value1', 'value2'])]. This pattern, implemented in pandas/core/series.py, is the direct equivalent of SQL's WHERE column IN (...).

What is the equivalent of SQL NOT IN in pandas?

The equivalent of SQL NOT IN is the logical negation of the isin() mask using the bitwise NOT operator ~. The syntax is df[~df['column'].isin(values)], which inverts the Boolean mask to exclude matching rows rather than include them.

Can I use isin() with multiple columns in pandas?

Yes, DataFrame.isin() operates element-wise across all columns, returning a Boolean DataFrame of the same shape. To filter rows where all columns match values in a reference set, use df[df.isin(values).all(axis=1)]. For column-specific logic, combine individual Series.isin() calls with the & operator.
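A small sketch of column-specific filtering, with made-up data and a hypothetical allowed dict. One detail to watch for: when a dict is passed, columns that are not keys in the dict come back entirely False, so the check is restricted to the dict's columns before reducing with all(axis=1).

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["New York", "Paris", "Tokyo", "Berlin"],
    "country": ["US", "FR", "JP", "DE"],
    "population": [8_400_000, 2_200_000, 9_300_000, 3_600_000],
})

# Hypothetical per-column allow-lists.
allowed = {"city": ["Paris", "Tokyo", "Berlin"], "country": ["FR", "DE"]}

# Check only the columns named in the dict, then require all of them to match.
mask = df[list(allowed)].isin(allowed).all(axis=1)
print(df[mask])

# The same filter written as separate Series.isin() calls joined with &.
same = df["city"].isin(allowed["city"]) & df["country"].isin(allowed["country"])
```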

How does pandas isin() handle null values compared to SQL?

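SQL uses three-valued logic, so any comparison with NULL evaluates to unknown and the row is filtered out. pandas isin() always returns True or False. A missing value (NaN or None) matches only when a missing value is also listed in values, so s.isin([np.nan]) is True for NaN entries. The practical difference shows up with NOT IN: SQL drops rows where the column is NULL, while ~s.isin(values) keeps them, because isin() returned False for those rows. To reproduce SQL's behavior, combine the mask with s.notna() using the & operator; to select missing values explicitly, use s.isna().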
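The sketch below illustrates the missing-value behavior on a small float Series (made-up data). It assumes the default NumPy-backed float64 dtype; nullable extension dtypes may return a different Boolean dtype.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan])  # hypothetical column with one missing value

# NaN matches only when NaN is listed explicitly.
without_nan = s.isin([1.0])
with_nan = s.isin([1.0, np.nan])
print(without_nan.tolist())
print(with_nan.tolist())

# ~isin() keeps the missing row, whereas SQL NOT IN would drop it.
not_in = ~s.isin([1.0])
print(not_in.tolist())

# Adding notna() excludes missing values the way SQL does.
sql_like = ~s.isin([1.0]) & s.notna()
print(sql_like.tolist())
```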
