How to Filter a Pandas DataFrame Using IN and NOT IN Like SQL WHERE
Use the isin() method to test for membership and the bitwise NOT operator ~ for negation, enabling SQL-style WHERE col IN (...) and WHERE col NOT IN (...) filtering in pandas.
Filtering rows based on whether column values exist in a specific set is one of the most common SQL operations. In the pandas-dev/pandas repository, this functionality is implemented through the isin() method, which provides a vectorized, high-performance way to filter a pandas DataFrame using IN and NOT IN like SQL WHERE clauses.
Understanding the isin() Method for SQL-Style Filtering
The pandas library implements SQL-equivalent IN operators through two primary entry points: Series.isin() for single-column checks and DataFrame.isin() for multi-column or element-wise comparisons.
Series.isin() for Column-Wise Membership Testing
When you need to check if values in a single column exist within a specified set, Series.isin() returns a Boolean Series that serves as a filter mask. According to the pandas-dev/pandas source code, this method is implemented in pandas/core/series.py at line 6114.
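The method accepts a set or any list-like collection as the values argument, such as a list, a NumPy array, or another pandas Series, making it flexible for different data workflows. A single string is not list-like, so wrap it in a list: isin(['Paris']), not isin('Paris').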
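As a quick illustration of the accepted value types, the sketch below builds the same mask from several containers. The `cities` Series and the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the pandas source
cities = pd.Series(["New York", "Paris", "Tokyo", "Berlin"])

# A list, a set, a NumPy array, and another Series all produce the same mask
from_list = cities.isin(["Paris", "Berlin"])
from_set = cities.isin({"Paris", "Berlin"})
from_array = cities.isin(np.array(["Paris", "Berlin"]))
from_series = cities.isin(pd.Series(["Paris", "Berlin"]))

print(from_list.tolist())  # [False, True, False, True]

# A bare string is not list-like, so pandas raises TypeError
try:
    cities.isin("Paris")
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

When a Series is passed as the value list, only its values are used; its index plays no role in the comparison.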
DataFrame.isin() for Multi-Column Filtering
For checking membership across multiple columns simultaneously, DataFrame.isin() creates a Boolean DataFrame of the same shape, where each cell indicates whether that specific element exists in the provided values. This method is defined in pandas/core/frame.py at line 18326.
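Unlike the Series version, DataFrame.isin() also accepts a dict that maps column names to the values allowed in each column, as well as a Series or another DataFrame. With a Series or DataFrame argument, the comparison is aligned on index (and column) labels, so it is an element-wise, label-aligned test rather than a row-by-row match against a reference table.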
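A minimal sketch of the dict form, assuming a small made-up DataFrame with `city` and `country` columns. Each column is tested against its own list, and the resulting Boolean DataFrame is then reduced to one value per row.

```python
import pandas as pd

# Illustrative data for this example only
df = pd.DataFrame({
    "city": ["New York", "Paris", "Tokyo", "Berlin"],
    "country": ["US", "FR", "JP", "DE"],
})

# Keys are column names; each column is checked against its own list
mask = df.isin({"city": ["Paris", "Tokyo"], "country": ["FR", "DE"]})
print(mask)

# Rows where every column matched its list
both = df[mask.all(axis=1)]
# Rows where at least one column matched its list
either = df[mask.any(axis=1)]

print(both.index.tolist())    # [1]
print(either.index.tolist())  # [1, 2, 3]
```

Note that this tests each column independently. A row such as ("Paris", "DE") would also pass the `all(axis=1)` check, so the dict form is not the same as matching (city, country) pairs.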
The Core Algorithm Behind the Scenes
Both Series.isin() and DataFrame.isin() delegate to the low-level, vectorized algorithm located in pandas/core/algorithms.py at line 493. This isin(comps, values) function performs the actual membership testing on underlying NumPy arrays or ExtensionArrays, ensuring consistent performance across data types including nullable integers, strings, and categoricals.
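The public isin() methods can be tried on several dtypes to see that the call looks the same regardless of the underlying array type. This sketch uses only the public API and made-up data; the outputs shown are what recent pandas versions should produce.

```python
import pandas as pd

# Nullable integer, categorical, and datetime columns (illustrative data)
ints = pd.Series([1, 2, None], dtype="Int64")
cats = pd.Series(["a", "b", "a"], dtype="category")
dates = pd.Series(pd.to_datetime(["2024-01-01", "2024-06-01"]))

int_mask = ints.isin([1])
cat_mask = cats.isin(["a"])
date_mask = dates.isin([pd.Timestamp("2024-06-01")])

print(int_mask.tolist())   # [True, False, False]
print(cat_mask.tolist())   # [True, False, True]
print(date_mask.tolist())  # [False, True]
```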
How to Implement NOT IN in Pandas
SQL's NOT IN operator is expressed in pandas through logical negation of the Boolean mask generated by isin(). The bitwise NOT operator ~ inverts the True/False values, effectively converting an "in" check to a "not in" check.
# SQL equivalent: WHERE column NOT IN ('value1', 'value2')
mask = ~df['column'].isin(['value1', 'value2'])
filtered_df = df[mask]
This pattern works identically for both Series and DataFrame objects, maintaining consistency across the pandas API.
Practical Examples: SQL WHERE IN and NOT IN in Pandas
Filtering Rows with IN Condition
To replicate SELECT * FROM table WHERE city IN ('Paris', 'Berlin'), use Series.isin() to generate a filter mask:
import pandas as pd
df = pd.DataFrame({
    "city": ["New York", "Paris", "Tokyo", "Berlin"],
    "population": [8_400_000, 2_200_000, 9_300_000, 3_600_000]
})
# SQL: WHERE city IN ('Paris', 'Berlin')
mask = df["city"].isin(["Paris", "Berlin"])
result = df[mask]
print(result)
Output:
     city  population
1   Paris     2200000
3  Berlin     3600000
Excluding Rows with NOT IN Condition
To exclude specific values using SQL's NOT IN logic, apply the ~ operator to invert the Boolean mask:
# SQL: WHERE city NOT IN ('Tokyo')
mask = ~df["city"].isin(["Tokyo"])
result = df[mask]
print(result)
Output:
       city  population
0  New York     8400000
1     Paris     2200000
3    Berlin     3600000
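The value list does not have to be a literal. A column from another DataFrame can play the role of a SQL subquery, as in the sketch below. The `european` table is hypothetical and exists only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["New York", "Paris", "Tokyo", "Berlin"],
    "population": [8_400_000, 2_200_000, 9_300_000, 3_600_000],
})

# Hypothetical second table, standing in for the subquery's source
european = pd.DataFrame({"city": ["Paris", "Berlin", "Madrid"]})

# SQL: WHERE city IN (SELECT city FROM european)
in_europe = df[df["city"].isin(european["city"])]

# SQL: WHERE city NOT IN (SELECT city FROM european)
not_in_europe = df[~df["city"].isin(european["city"])]

print(in_europe["city"].tolist())      # ['Paris', 'Berlin']
print(not_in_europe["city"].tolist())  # ['New York', 'Tokyo']
```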
Combining Multiple Conditions
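Complex SQL queries with multiple IN conditions and logical operators translate directly to pandas using & (AND) and | (OR). Because & and | bind more tightly than comparison operators such as >, wrap each comparison in parentheses:
# SQL: WHERE city IN ('Paris', 'Berlin') AND population > 2500000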
mask = df["city"].isin(["Paris", "Berlin"]) & (df["population"] > 2_500_000)
result = df[mask]
print(result)
Output:
     city  population
3  Berlin     3600000
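IN, NOT IN, AND, and OR can be mixed in one mask. The sketch below is one way to write such a filter; the SQL in the comment is an illustrative query, and the parentheses around the comparison and the OR group are what keep the grouping identical to the SQL.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["New York", "Paris", "Tokyo", "Berlin"],
    "population": [8_400_000, 2_200_000, 9_300_000, 3_600_000],
})

# SQL: WHERE city NOT IN ('Tokyo')
#        AND (population < 3000000 OR city IN ('Berlin'))
not_tokyo = ~df["city"].isin(["Tokyo"])
small_or_berlin = (df["population"] < 3_000_000) | df["city"].isin(["Berlin"])

result = df[not_tokyo & small_or_berlin]
print(result["city"].tolist())  # ['Paris', 'Berlin']
```

Naming the intermediate masks keeps long conditions readable and makes each part easy to inspect on its own.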
Using a DataFrame as the Lookup Table
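Passing another DataFrame to DataFrame.isin() does not perform a row-wise tuple match: the comparison is aligned on index and column labels, so a cell matches only if the same value sits at the same index label and column in both frames. To replicate SQL's WHERE (col1, col2) IN (SELECT ...), compare the column pairs as tuples instead, for example through a MultiIndex:
allowed = pd.DataFrame({
    "city": ["New York", "Tokyo"],
    "population": [8_400_000, 9_300_000]
})
# Keep rows whose (city, population) pair appears in `allowed`,
# regardless of where that pair sits in `allowed`
cols = ["city", "population"]
pairs = pd.MultiIndex.from_frame(df[cols])
mask = pairs.isin(pd.MultiIndex.from_frame(allowed[cols]))
result = df[mask]
print(result)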
Output:
       city  population
0  New York     8400000
2     Tokyo     9300000
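Another common way to express a multi-column IN or NOT IN is a merge with indicator=True, which behaves like a semi-join or anti-join. This is a sketch of an alternative technique rather than something isin() does itself; the frames mirror the example data, and the `tagged` name is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["New York", "Paris", "Tokyo", "Berlin"],
    "population": [8_400_000, 2_200_000, 9_300_000, 3_600_000],
})
allowed = pd.DataFrame({
    "city": ["New York", "Tokyo"],
    "population": [8_400_000, 9_300_000],
})

cols = ["city", "population"]

# indicator=True adds a `_merge` column: "both" when the pair exists in
# `allowed`, "left_only" when it does not. drop_duplicates() prevents
# repeated lookup rows from duplicating rows of `df`.
tagged = df.merge(allowed.drop_duplicates(), on=cols, how="left", indicator=True)

in_rows = tagged.loc[tagged["_merge"] == "both", cols]           # (city, population) IN
not_in_rows = tagged.loc[tagged["_merge"] == "left_only", cols]  # (city, population) NOT IN

print(in_rows["city"].tolist())      # ['New York', 'Tokyo']
print(not_in_rows["city"].tolist())  # ['Paris', 'Berlin']
```

Keep in mind that merge returns a new frame with a fresh index, so the original index labels of `df` are not preserved unless they are carried along as a column.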
Summary
- Use Series.isin() (implemented in pandas/core/series.py) to test membership in a single column, returning a Boolean mask for filtering.
- Use DataFrame.isin() (implemented in pandas/core/frame.py) to perform element-wise membership testing across multiple columns.
- Apply the ~ operator to invert isin() results, achieving SQL-style NOT IN functionality.
- Both methods delegate to the core algorithm in pandas/core/algorithms.py for vectorized, high-performance membership testing across all pandas data types.
- Combine masks using & (AND) and | (OR) to replicate complex SQL WHERE clauses with multiple IN conditions.
Frequently Asked Questions
How do I filter a pandas DataFrame using a list of values like SQL IN?
Use the isin() method on a Series to create a Boolean mask, then pass that mask to the DataFrame indexer. For example: df[df['column'].isin(['value1', 'value2'])]. This pattern, implemented in pandas/core/series.py, is the direct equivalent of SQL's WHERE column IN (...).
What is the equivalent of SQL NOT IN in pandas?
The equivalent of SQL NOT IN is the logical negation of the isin() mask using the bitwise NOT operator ~. The syntax is df[~df['column'].isin(values)], which inverts the Boolean mask to exclude matching rows rather than include them.
Can I use isin() with multiple columns in pandas?
Yes, DataFrame.isin() operates element-wise across all columns, returning a Boolean DataFrame of the same shape. To filter rows where all columns match values in a reference set, use df[df.isin(values).all(axis=1)]. For column-specific logic, combine individual Series.isin() calls with the & operator.
How does pandas isin() handle null values compared to SQL?
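In SQL, comparing anything with NULL yields unknown, so a row whose value is NULL is returned by neither IN nor NOT IN. pandas isin() behaves differently: a missing value (NaN or None) produces False unless the value list itself contains a missing value, in which case NaN matches NaN. As a result, ~isin() evaluates to True for missing rows and keeps them, whereas SQL's NOT IN would drop them. To mirror SQL's NOT IN, combine the negated mask with notna() using the & operator; to include missing values in an IN filter, combine the isin() mask with pd.isna() using the | operator.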
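The missing-value behavior is easy to check on a small float Series. The sketch below uses made-up data and shows the plain masks next to a NOT IN that drops missing rows the way SQL would.

```python
import numpy as np
import pandas as pd

# Illustrative float Series with one missing value
s = pd.Series([1.0, 2.0, np.nan])

in_mask = s.isin([1.0])                    # NaN row is False
not_in_mask = ~s.isin([1.0])               # NaN row becomes True
sql_like_not_in = not_in_mask & s.notna()  # drop NaN rows, as SQL NOT IN would
nan_in_list = s.isin([np.nan])             # NaN matches NaN in the value list
in_or_missing = in_mask | s.isna()         # IN filter that also keeps missing rows

print(in_mask.tolist())          # [True, False, False]
print(not_in_mask.tolist())      # [False, True, True]
print(sql_like_not_in.tolist())  # [False, True, False]
print(nan_in_list.tolist())      # [False, False, True]
print(in_or_missing.tolist())    # [True, False, True]
```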