# How to Filter for Distinct Values in a Pandas DataFrame Using the `unique` Function

> Learn to filter distinct values in a Pandas DataFrame with the unique function. Extract unique entries from a column, then filter rows with isin or boolean indexing.

- Repository: [pandas-dev/pandas](https://github.com/pandas-dev/pandas)
- Tags: how-to-guide
- Published: 2026-02-15

---

**Use `df['column'].unique()` to return an array of distinct values from a specific column, then combine the result with `isin()` or boolean indexing to filter the DataFrame.**

The `unique` function provides a fast, vectorized way to extract distinct values from pandas Series objects. According to the pandas-dev/pandas source code, the operation relies on hashtable routines written in Cython and compiled to C. It is the foundation for filtering distinct values in DataFrame workflows.

## Understanding the `unique` Function Architecture

The `unique` method is defined on **Series** (and Index) objects, not on entire DataFrames. When you call `df['column'].unique()`, pandas passes the column's underlying array to one of two places. It either calls that array type's own `unique` method or uses the shared hashtable code in `pandas.core.algorithms`.

### Series-Level Entry Point

The public API for `unique` is implemented in [`pandas/core/series.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/series.py), around line 2306 at the time of writing; line numbers on `main` drift. Conceptually, `Series.unique` extracts the underlying array data and passes it to the core algorithm. A simplified view:

```python
def unique(self) -> ArrayLike:
    # Simplified: hand the backing array (self._values) to the shared algorithm
    return algorithms.unique(self._values)
```

With this design, every Series uses the same uniqueness logic, whether it is backed by NumPy arrays, ExtensionArrays, or categorical data.
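You can confirm that the method is a thin wrapper by comparing it with the public top-level `pd.unique` function, which goes through `pandas.core.algorithms`. The sketch below assumes a NumPy-backed integer Series with made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical example values backed by a NumPy int64 array
s = pd.Series([2, 1, 3, 3, 1])

# The Series method and the top-level pd.unique function should agree
method_result = s.unique()
function_result = pd.unique(s)

print(method_result)  # [2 1 3]
print(np.array_equal(method_result, function_result))  # True
```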

### Core Algorithm Implementation

The heavy lifting occurs in [`pandas/core/algorithms.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/algorithms.py), around line 322 at the time of writing. The `unique` function checks the input array type and routes it to a specialized fast path:

- **NumPy arrays**: Delegates to hashtable-based uniqueness checks
- **ExtensionArrays**: Uses type-specific implementations while preserving order
- **Categorical data**: Leverages category codes for efficiency

The function returns a NumPy array or ExtensionArray containing each distinct value exactly once, preserving the order of first appearance.
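The sketch below shows how the return type follows the backing array. It builds three hypothetical Series that use a NumPy `int64` array, the nullable `Int64` extension dtype, and `category`. It prints only the type names and lengths. The exact array class names are what current pandas releases use.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with similar data but different backing arrays
numpy_backed = pd.Series([3, 1, 3, 2])                           # NumPy int64
nullable = pd.Series([3, 1, 3, 2], dtype="Int64")                 # nullable integer ExtensionArray
categorical = pd.Series(["b", "a", "b", "c"], dtype="category")   # Categorical

for label, s in [("int64", numpy_backed), ("Int64", nullable), ("category", categorical)]:
    result = s.unique()
    # The return type follows the Series' backing array; order is first appearance
    print(label, type(result).__name__, len(result))
```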

## How to Filter for Distinct Values in Practice

The `unique` function serves as the foundation for multiple distinct-value filtering patterns in pandas DataFrames.

### Extracting Distinct Values from a Column

To retrieve distinct values from a specific column, access the column as a Series and call `unique()`:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "Berlin", "Paris", "London", "Berlin"],
    "population": [2_200_000, 3_600_000, 2_200_000, 8_900_000, 3_600_000],
})

# Get distinct city names in order of first appearance
distinct_cities = df["city"].unique()
print(distinct_cities)
# Output for an object-dtype column: ['Paris' 'Berlin' 'London']
```

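For an object-dtype column, this returns a NumPy array containing `['Paris', 'Berlin', 'London']` in order of first appearance. If the column uses an extension dtype such as `string`, the result is an ExtensionArray with the same values in the same order.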
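`unique()` does not sort its output. If you need alphabetical order, or only the number of distinct values, the sketch below uses Python's built-in `sorted` and the pandas `Series.nunique` method on the same example data.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "Berlin", "Paris", "London", "Berlin"],
    "population": [2_200_000, 3_600_000, 2_200_000, 8_900_000, 3_600_000],
})

# unique() keeps first-appearance order; sort explicitly for alphabetical order
sorted_cities = sorted(df["city"].unique())

# nunique() returns only the count of distinct values
n_cities = df["city"].nunique()

print(sorted_cities)  # ['Berlin', 'London', 'Paris']
print(n_cities)       # 3
```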

### Filtering Rows Using Distinct Values

Combine `unique()` with `isin()` to filter DataFrame rows based on distinct value membership:

```python
# Illustrative only: every row matches, because the distinct set
# comes from the same column being filtered
distinct_cities = df["city"].unique()
filtered_df = df[df["city"].isin(distinct_cities)]

# More practical: keep rows whose city is one of the first two distinct cities
target_cities = df["city"].unique()[:2]
result = df[df["city"].isin(target_cities)]
```

This pattern is most useful when the allowed values are a subset of the column, or when they come from another dataset that you want to validate against.
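The sketch below covers the validation case. It keeps the rows of `df` whose city appears among the distinct values of a second table, and inverts the mask with `~` to find the rows that do not. It also shows plain boolean indexing with `==` that splits the frame by each distinct value. The `visits` DataFrame is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "Berlin", "Paris", "London", "Berlin"],
    "population": [2_200_000, 3_600_000, 2_200_000, 8_900_000, 3_600_000],
})

# Hypothetical second table whose distinct cities define the allowed set
visits = pd.DataFrame({"city": ["Berlin", "Berlin", "Madrid"]})
allowed = visits["city"].unique()

# Keep rows whose city appears in the other table
matching = df[df["city"].isin(allowed)]

# Negate the mask with ~ to keep rows whose city does NOT appear there
non_matching = df[~df["city"].isin(allowed)]

# Boolean indexing: one sub-DataFrame per distinct city
by_city = {city: df[df["city"] == city] for city in df["city"].unique()}

print(len(matching), len(non_matching))  # 2 3
print(list(by_city))                     # ['Paris', 'Berlin', 'London']
```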

### Alternative: Using `drop_duplicates` for Distinct Rows

When you need distinct rows rather than distinct values from a single column, use `drop_duplicates()`. It relies on the same hashtable-based approach:

```python
# Get distinct rows based on the 'city' column, keeping the first occurrence
distinct_rows = df.drop_duplicates(subset=["city"], keep="first")

# Equivalent boolean-indexing form: drop rows whose city repeats an earlier one
df_distinct = df[~df["city"].duplicated(keep="first")]
```

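In the pandas source code, `DataFrame.drop_duplicates` builds its result from `DataFrame.duplicated`. That method factorizes the selected columns and flags repeated row combinations. It uses the same hashtable infrastructure that backs `unique`, but does not call `algorithms.unique` directly.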
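The `keep` parameter also accepts `"last"` and `False`, and `DataFrame.duplicated` exposes the boolean mask directly. The sketch below reuses the example data and uses a multi-column `subset`.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "Berlin", "Paris", "London", "Berlin"],
    "population": [2_200_000, 3_600_000, 2_200_000, 8_900_000, 3_600_000],
})

# keep="last": retain the final occurrence of each city (original row order kept)
last_rows = df.drop_duplicates(subset="city", keep="last")

# keep=False: drop every city that occurs more than once
singletons = df.drop_duplicates(subset="city", keep=False)

# duplicated() returns the boolean mask; True marks a repeat of an earlier row
mask = df.duplicated(subset=["city", "population"], keep="first")

print(list(last_rows.index))   # [2, 3, 4]
print(list(singletons.index))  # [3]
print(mask.tolist())           # [False, False, True, False, True]
```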

## Summary

- **`Series.unique`** is the primary method for extracting distinct values from a DataFrame column, implemented in [`pandas/core/series.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/series.py).
- The core algorithm resides in [`pandas/core/algorithms.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/algorithms.py) and handles NumPy arrays, ExtensionArrays, and categorical data through optimized fast paths.
- Use `df['column'].unique()` to return an array of distinct values in order of first appearance.
- Combine `unique()` with `isin()` to filter DataFrame rows based on distinct value membership.
- For distinct rows rather than distinct values, use `drop_duplicates()`, which relies on the same hashtable infrastructure through `DataFrame.duplicated`.

## Frequently Asked Questions

### Does `unique` work on DataFrames directly?

No. `DataFrame` has no `unique` method, which is defined on Series (and Index) objects. To get distinct values from a DataFrame, select a specific column with `df['column_name'].unique()`. Calling `df.unique()` on an entire DataFrame raises an `AttributeError`.
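If you need distinct values across several columns, one workaround is to flatten them into a one-dimensional NumPy array and pass it to `pd.unique`. Per-column counts come from `DataFrame.nunique`. The `home`/`away` columns below are hypothetical, and the flattened result follows row-by-row order of first appearance.

```python
import pandas as pd

# Hypothetical DataFrame with the same kind of values in two columns
df = pd.DataFrame({
    "home": ["Paris", "Berlin", "Paris"],
    "away": ["Berlin", "London", "Rome"],
})

# Calling df.unique() fails because DataFrame has no such method
try:
    df.unique()
except AttributeError as exc:
    print("AttributeError:", exc)

# Flatten the selected columns row by row, then deduplicate the 1-D array
across_columns = pd.unique(df[["home", "away"]].to_numpy().ravel())

# Per-column distinct counts, returned as a Series indexed by column name
counts = df.nunique()

print(list(across_columns))  # ['Paris', 'Berlin', 'London', 'Rome']
print(counts)
```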

### How does `unique` handle NaN values?

The `unique` function includes NaN (Not a Number) values in the returned array. It treats all NaN values as equal, so they collapse into a single entry. `unique` takes no parameter to exclude them, unlike `nunique`, which drops NaN by default. This behavior follows the implementation in [`pandas/core/algorithms.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/algorithms.py). For categorical data, NaN can never be a category; it is stored internally with the code -1. `unique` still returns NaN as one of the values when it is present.
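The sketch below uses made-up float and categorical Series to show how missing values collapse and how `nunique` differs.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 2.0, np.nan])

# All NaN values collapse into a single entry in the result
values = s.unique()
print(values, len(values))

# nunique() excludes NaN unless dropna=False
print(s.nunique(), s.nunique(dropna=False))  # 2 3

# Categorical: the missing value is not a category, but unique() still returns it
cat = pd.Series(["a", None, "a"], dtype="category")
print(list(cat.cat.categories))  # ['a']
print(len(cat.unique()))         # 2
```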

### What is the difference between `unique` and `drop_duplicates`?

`unique` operates on a single Series and returns an array of distinct values. `drop_duplicates` operates on DataFrames and returns a subset of rows. (Series also has a `drop_duplicates` method, which returns a Series.) The `unique` method is located in [`pandas/core/series.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/series.py). It returns values in order of first appearance as a NumPy array or ExtensionArray, without the original index. `DataFrame.drop_duplicates` can consider multiple columns and returns a DataFrame with duplicate rows removed. Its `keep` parameter keeps the first occurrence, the last, or none (`keep=False`).
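The sketch below makes the difference visible on the example data. `unique()` drops the index, `Series.drop_duplicates` keeps the original index labels, and `DataFrame.drop_duplicates` keeps every column. The printed type name of the `unique()` result depends on the column's dtype, so the sketch does not state it as an expected output.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "Berlin", "Paris", "London", "Berlin"],
    "population": [2_200_000, 3_600_000, 2_200_000, 8_900_000, 3_600_000],
})

# unique(): bare array of values, original index discarded
values = df["city"].unique()

# Series.drop_duplicates(): a Series that keeps the original index labels
series_dedup = df["city"].drop_duplicates()

# DataFrame.drop_duplicates(): whole rows, all columns retained
frame_dedup = df.drop_duplicates(subset="city")

print(type(values).__name__, type(series_dedup).__name__, type(frame_dedup).__name__)
print(list(series_dedup.index))   # [0, 1, 3]
print(list(frame_dedup.columns))  # ['city', 'population']
```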

### Is `unique` faster than converting to a set?

Yes, `Series.unique()` is generally faster than converting to a Python `set`. It runs compiled hashtable routines through `pandas.core.algorithms`, whereas `set()` iterates over the values in Python and boxes each one into a Python object. `unique` also preserves the order of first appearance. It returns pandas-specific data types (like Categorical, datetime, and nullable integers) in their native array types. A `set`, by contrast, has no ordering guarantee and holds plain Python objects.
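The sketch below uses synthetic random integers to check that both approaches find the same distinct values. It also takes a rough timing with the standard-library `timeit` module. The sizes are arbitrary and no particular numbers are claimed, because results depend on hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 200,000 integers drawn from the range [0, 1000)
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 1_000, size=200_000))

# unique() keeps first-appearance order; set() gives no ordering guarantee
via_unique = s.unique()
via_set = set(s)

# Both approaches should find the same distinct values
assert set(via_unique.tolist()) == via_set

# Rough timing only; absolute numbers vary by machine and version
t_unique = timeit.timeit(s.unique, number=3)
t_set = timeit.timeit(lambda: set(s), number=3)
print(f"unique(): {t_unique:.3f}s  set(): {t_set:.3f}s")
```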