# How to Find Unique Values in a Pandas DataFrame Column Using `Series.unique()`

> Learn how to find unique values in a Pandas DataFrame column. The Series.unique() method efficiently returns distinct values while preserving data types and handling missing data.

- Repository: [pandas-dev/pandas](https://github.com/pandas-dev/pandas)
- Tags: how-to-guide
- Published: 2026-02-20

---

**`Series.unique()` returns the distinct values of a pandas DataFrame column in order of first appearance. It uses pandas' compiled hash-table routines instead of a sort, preserves data types, and handles missing values.**

`Series.unique()` is the standard way to extract distinct values from a single DataFrame column, and usually the fastest. As implemented in the `pandas-dev/pandas` repository, it relies on compiled hash-table routines, which keeps it fast and memory-efficient on columns with millions of rows.

## Understanding the `unique` Method in Pandas

The `unique()` method is available on pandas `Series` objects, which represent individual columns of a DataFrame. When you access a column using `df['column_name']` and call `.unique()`, you invoke a specialized routine that returns the distinct values while maintaining the original data type integrity.

Unlike Python's built-in `set()`, which requires hashable elements and does not preserve order, `Series.unique()` returns values in order of first appearance and handles pandas dtypes such as categorical, `datetime64`, and nullable integer types.
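To make the ordering difference concrete, here is a minimal sketch comparing `Series.unique()` with `np.unique` on the same data. The sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 1, 3, 2, 1])

# pandas keeps the order in which each value first appears
first_seen = s.unique()

# NumPy's unique returns the distinct values sorted
sorted_vals = np.unique(s.to_numpy())

print(first_seen.tolist())   # [3, 1, 2]
print(sorted_vals.tolist())  # [1, 2, 3]
```

If you need sorted output, sort the result of `unique()` explicitly rather than relying on either library's default.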

## How `Series.unique()` Works Internally

### Dispatch to the Underlying Array

In [`pandas/core/series.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/series.py) (line 2316 at the time of writing; line numbers shift between releases), the `Series.unique()` method acts primarily as a dispatcher. In simplified form, the call is forwarded to the underlying array's `unique()` method, with plain NumPy-backed data ending up in the shared routine in `pandas/core/algorithms.py`:

```python
# Conceptual flow only; this is not the literal pandas source
def unique(self):
    return self.array.unique()
```

This delegation pattern allows pandas to support diverse data types through the `ExtensionArray` interface while maintaining a consistent API for users.
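One visible consequence of this dispatch is that the return type depends on the column's dtype. The following sketch, using made-up data, shows a NumPy-backed column returning an `ndarray` and a nullable `Int64` column returning an extension array.

```python
import numpy as np
import pandas as pd

numpy_backed = pd.Series([1, 2, 2, 3])                 # int64, stored in a NumPy array
nullable = pd.Series([1, 2, 2, None], dtype="Int64")   # nullable integer extension dtype

u1 = numpy_backed.unique()
u2 = nullable.unique()

print(type(u1).__name__)  # ndarray
print(u2.dtype)           # Int64
print(len(u2))            # 3 (1, 2, and a single <NA>)
```

Code that consumes the result should therefore not assume it always receives a NumPy array.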

### Core Algorithm in [`pandas/core/algorithms.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/algorithms.py)

For standard NumPy-backed data, the `unique()` call reaches the core algorithm located in [`pandas/core/algorithms.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/algorithms.py) (around line 320). This routine performs several steps:

1. **Input normalization**: Coerces the input to an array and maps its dtype to one the hash tables support
2. **C-level computation**: Builds a hash table over the values using pandas' compiled routines (`pandas._libs.hashtable`) rather than calling `np.unique`, so no sort is needed and first-appearance order is kept
3. **Type restoration**: Converts the result back to the original dtype, such as `datetime64` or `timedelta64`
4. **Result construction**: Returns a new array containing only distinct elements

Because the heavy computation runs in compiled code, the operation avoids Python-level loops and scales roughly linearly with the number of rows, which keeps it practical even for millions of rows.
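The idea behind a hash-based, order-preserving unique can be sketched in pure Python. This is only an illustration of the technique, not the pandas implementation; `unique_in_order` is a hypothetical helper, and the special case for NaN mirrors the way pandas collapses all NaN values into one entry.

```python
import math

def unique_in_order(values):
    """Hypothetical helper: hash-based dedup that keeps first-appearance order."""
    seen = set()        # hash set of values already emitted
    seen_nan = False    # NaN != NaN, so track it separately
    out = []
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            if not seen_nan:
                seen_nan = True
                out.append(v)
            continue
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out

result = unique_in_order([3.0, 1.0, float("nan"), 3.0, float("nan"), 2.0])
print(result)  # [3.0, 1.0, nan, 2.0]
```

Each element is visited once and each membership check is a constant-time hash lookup on average, which is why this approach does not require sorting.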

### Extension Array Support

Specialized data types implement their own `unique()` methods to handle type-specific logic:

- **`Categorical`** in [`pandas/core/arrays/categorical.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/arrays/categorical.py) (line 2555): Returns the values present in the data in order of first appearance, keeping the categorical dtype and its category ordering
- **`SparseArray`** in [`pandas/core/arrays/sparse/array.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/arrays/sparse/array.py) (line 921): Optimizes uniqueness checks for sparse data structures

All implementations adhere to the same contract: return a one-dimensional array of distinct values without duplicates.

## Practical Examples of Using `pandas unique`

### Basic Numeric Column with Missing Values

When working with real-world data containing null values, `unique()` handles NaN appropriately:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"values": [1, 2, 2, 3, 1, np.nan, 4, np.nan]})
unique_vals = df["values"].unique()
print(unique_vals)
# Output: [ 1.  2.  3. nan  4.]
```

Notice that NaN appears only once in the result: all missing values collapse into a single entry. The column is stored as `float64` because NaN forces the integers to floats, and `unique()` keeps that dtype.

### Preserving Categories in Categorical Data

For categorical columns, `unique()` maintains the categorical dtype and returns only the values actually present:

```python
import pandas as pd

cat_series = pd.Series(pd.Categorical(["apple", "banana", "apple", "cherry"]))
print(cat_series.unique())
# Output (the dtype label on the Categories line can vary by pandas version):
# ['apple', 'banana', 'cherry']
# Categories (3, object): ['apple', 'banana', 'cherry']
```

This behavior differs from converting to a set or using NumPy directly, as it preserves the categorical metadata and ordering.
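A related detail is what happens to categories that are declared but never used. The sketch below uses made-up data and assumes pandas 1.3 or later, where the returned values contain only what is present while the dtype still lists every declared category.

```python
import pandas as pd

# "c" is a declared category that never occurs in the data
cat = pd.Categorical(["b", "a", "b"], categories=["a", "b", "c"])
uniques = pd.Series(cat).unique()

print(list(uniques))             # ['b', 'a']
print(list(uniques.categories))  # ['a', 'b', 'c'] on pandas 1.3 and later
```

If you want the dtype trimmed to the observed values as well, calling `remove_unused_categories()` on the result is one option.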

### Performance on Large Datasets

The C-level optimization becomes apparent when processing millions of rows:

```python
import numpy as np
import pandas as pd

big_df = pd.DataFrame({"id": np.random.randint(0, 1_000_000, size=10_000_000)})

# %timeit is an IPython/Jupyter magic; run this line in a notebook or IPython shell
%timeit big_df["id"].unique()
# Example output from one run on a modern CPU (timings vary by machine):
# 1 loop, best of 5: 120 ms per loop
```

In this run, `unique()` processed ten million rows in roughly 120 milliseconds. Actual timings depend on hardware, dtype, and the number of distinct values.
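Outside IPython, where the `%timeit` magic is unavailable, a similar measurement can be sketched with the standard-library `timeit` module. The array size here is deliberately smaller than in the example above so the script finishes quickly, and no expected timing is shown because results differ between machines.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the data is reproducible
ids = pd.Series(rng.integers(0, 100_000, size=1_000_000))

# Run unique() five times, once per repetition, and keep the fastest run
best = min(timeit.repeat(ids.unique, number=1, repeat=5))
print(f"best of 5: {best * 1e3:.1f} ms")
```

Taking the minimum over several repetitions reduces noise from other processes on the machine.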

## Summary

- **`Series.unique()`** is the standard, efficient way to extract distinct values from a pandas DataFrame column, as implemented in the `pandas-dev/pandas` repository.
- The method dispatches from [`pandas/core/series.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/series.py) to the underlying array, with core logic residing in [`pandas/core/algorithms.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/algorithms.py), which uses compiled hash-table routines rather than a sort.
- **Extension arrays** like `Categorical` and `SparseArray` provide specialized implementations that preserve type-specific metadata while maintaining the same public contract.
- The operation handles **missing values** (NaN) appropriately, preserves **original data types**, and maintains **memory efficiency** even for datasets containing millions of rows.

## Frequently Asked Questions

### What is the difference between `unique()` and `drop_duplicates()` in pandas?

`Series.unique()` returns a NumPy array or ExtensionArray containing only the distinct values from one column, with no index attached. `DataFrame.drop_duplicates()` works on whole rows and returns a DataFrame with duplicate rows removed, keeping the tabular structure and the original index labels. `Series.drop_duplicates()` does the same for a single column and returns a Series.
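The following sketch, using a small made-up DataFrame, puts the three calls side by side so the difference in return types is visible.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Oslo", "Rome", "Oslo"], "visits": [1, 2, 1]})

values = df["city"].unique()           # array of distinct values, no index
series = df["city"].drop_duplicates()  # Series that keeps the original index labels
rows = df.drop_duplicates()            # DataFrame with duplicate rows removed

print(list(values))           # ['Oslo', 'Rome']
print(series.index.tolist())  # [0, 1]
print(len(rows))              # 2
```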

### Does `unique()` preserve the original order of values?

Yes, `Series.unique()` preserves the order of first appearance. When the underlying algorithm in [`pandas/core/algorithms.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/algorithms.py) processes the data, it maintains the sequence in which unique values initially appear in the column. This differs from Python's `set()`, which does not guarantee any order, and from `np.unique`, which returns sorted values.

### How does `unique()` handle NaN values?

`Series.unique()` treats all NaN (Not a Number) values as a single missing value and includes it once in the returned array, at the position of its first appearance. In the implementation in [`pandas/core/algorithms.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/algorithms.py), missing values are normalized but preserved in the output, so you can detect the presence of null data alongside actual values.
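A short sketch with made-up values shows the single NaN entry, along with one way to exclude it by dropping missing values first.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 2.0, np.nan])

uniques = s.unique()
nan_count = int(np.isnan(uniques).sum())  # how many NaN entries survived

without_nan = s.dropna().unique()         # drop missing values before deduplicating

print(len(uniques))          # 3
print(nan_count)             # 1
print(without_nan.tolist())  # [1.0, 2.0]
```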

### Can I use `unique()` on multiple columns simultaneously?

No, `Series.unique()` is designed to operate on a single column (Series) only. To find unique combinations across multiple columns, use `DataFrame.drop_duplicates()` or combine columns into a single Series (e.g., using `df[['col1', 'col2']].apply(tuple, axis=1).unique()`), though the latter approach is less efficient for large datasets.
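As a minimal sketch of the `drop_duplicates()` route, assuming a made-up DataFrame with columns named `col1` and `col2`:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "b", "a"],
    "col2": [1, 1, 2, 2],
})

# Distinct (col1, col2) combinations, in order of first appearance
combos = df[["col1", "col2"]].drop_duplicates().reset_index(drop=True)

pairs = list(combos.itertuples(index=False, name=None))
print(pairs)  # [('a', 1), ('b', 2), ('a', 2)]
```

The `reset_index(drop=True)` call is optional; it only renumbers the rows of the result.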