# How to Use Pandas Unique on a Whole DataFrame Based on a Column

> Learn how to use pandas unique on a DataFrame column. Discover `drop_duplicates` for full rows or `Series.unique` for distinct values.

- Repository: [pandas-dev/pandas](https://github.com/pandas-dev/pandas)
- Tags: how-to-guide
- Published: 2026-02-13

---

**Use `DataFrame.drop_duplicates(subset='column')` to retrieve complete rows for each unique value in a specific column, or use `Series.unique()` when you only need the distinct values as a NumPy array.**

Deduplication in pandas is backed by compiled hashtable routines, so it stays fast on large tables. Getting the "unique" rows of a whole DataFrame based on one column means keeping the full row for each distinct value, which calls for a different method than extracting the values alone.

## Choosing Between `Series.unique()` and `DataFrame.drop_duplicates()`

Pandas offers two primary mechanisms for handling uniqueness, each defined in separate core modules:

**`Series.unique()`** (implemented in [`pandas/core/series.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/series.py)) returns only the distinct values from a single column, in order of first appearance. The result is a NumPy array (or an `ExtensionArray` for extension dtypes such as categorical), so all other columns and the DataFrame structure are discarded.

**`DataFrame.drop_duplicates()`** (implemented in [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py)) returns a new DataFrame containing the first (or last) occurrence of each unique value in the specified column while retaining all other columns intact. This matches the conventional interpretation of getting unique rows based on a column value.

Both methods are vectorized and rely on compiled code rather than Python-level loops, so they generally remain fast on datasets containing millions of rows.

## Implementation Details in the Source Code

The underlying functionality resides in two critical files within the pandas source tree:

- **[`pandas/core/series.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/series.py)**: Contains the `unique()` method for Series objects, which delegates to hashtable-based algorithms for distinct value extraction.
- **[`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py)**: Houses the `drop_duplicates()` method, which manages row-wise deduplication while supporting complex subset logic and ordering controls.

Additionally, `Index` objects provide their own `unique()` and `drop_duplicates()` methods, defined in [`pandas/core/indexes/base.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/indexes/base.py).

## Practical Code Examples

### Extracting Unique Values from a Single Column

When you need only the distinct values without row context, access the column as a Series and call `unique()`:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "London", "Paris", "Berlin"],
    "population": [2.1, 8.9, 2.1, 3.6]
})

# Returns the distinct city names in order of first appearance
unique_cities = df["city"].unique()
print(unique_cities)
```

```text
['Paris' 'London' 'Berlin']

```

### Retaining Unique Rows Based on a Column

To get one complete row per distinct value in a column, keeping all associated data, call `drop_duplicates()` with the `subset` parameter:

```python
# Keep the first occurrence of each unique city (reuses df from the previous example)
unique_rows = df.drop_duplicates(subset="city")
print(unique_rows)
```

```text
     city  population
0   Paris         2.1
1  London         8.9
3  Berlin         3.6

```

### Preserving the Last Occurrence Instead of the First

Control which duplicate row survives using the `keep` parameter:

```python
# Retain the last row for each unique city (reuses df from the first example).
# Surviving rows stay in their original order.
unique_rows_last = df.drop_duplicates(subset="city", keep="last")
print(unique_rows_last)
```

```text
     city  population
1  London         8.9
2   Paris         2.1
3  Berlin         3.6
```
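
Besides `"first"` and `"last"`, `keep` also accepts `False`, which drops every row whose value appears more than once. The following is a minimal sketch that rebuilds the sample data from the first example; the variable name `only_singletons` is purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "London", "Paris", "Berlin"],
    "population": [2.1, 8.9, 2.1, 3.6],
})

# keep=False removes all rows whose city occurs more than once,
# so both "Paris" rows are dropped
only_singletons = df.drop_duplicates(subset="city", keep=False)
print(only_singletons)
```

This variant is useful when a repeated value signals a problem and only the unambiguous rows should survive.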

### Sorting Results After Deduplication

Chain methods to deduplicate first, then reorder the results:

```python
# Get unique cities, then sort by population in descending order
# (reuses df from the first example)
unique_sorted = (
    df.drop_duplicates(subset="city")
      .sort_values("population", ascending=False)
)
print(unique_sorted)
```

```text
     city  population
1  London         8.9
3  Berlin         3.6
0   Paris         2.1

```

## Summary

- **`DataFrame.drop_duplicates(subset='col')`** returns complete rows for each unique value in the specified column, making it the correct choice for "unique on a whole DataFrame" operations.
- **`Series.unique()`** extracts only the distinct values as a NumPy array, discarding all other column data.
- The methods are defined in [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py) and [`pandas/core/series.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/series.py) respectively, and both rely on compiled hashtable routines for the actual deduplication.
- The `keep` parameter controls which duplicates survive: `'first'` (the default), `'last'`, or `False` to drop every row whose value is duplicated.
- By default these operations return new objects and do not modify the original DataFrame; `drop_duplicates()` only changes the DataFrame in place when called with `inplace=True`.

## Frequently Asked Questions

### What is the difference between `Series.unique()` and `DataFrame.drop_duplicates()`?

`Series.unique()` returns a NumPy array containing only the distinct values from a single column, while `DataFrame.drop_duplicates()` returns a new DataFrame containing entire rows. Use `unique()` when you need a list of values for lookup or iteration, and use `drop_duplicates()` when you need to preserve the full row context associated with each unique value.
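
As a small illustration of the lookup and iteration use case, the sketch below converts the result of `unique()` to a plain Python list and loops over it. The sample data and the names `cities` and `rows_per_city` are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "London", "Paris", "Berlin"],
    "population": [2.1, 8.9, 2.1, 3.6],
})

# Distinct values come back in order of first appearance, not sorted
cities = list(df["city"].unique())

# Iterate over the distinct values and count the rows for each one
rows_per_city = {city: int((df["city"] == city).sum()) for city in cities}
print(cities)
print(rows_per_city)
```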

### How do I keep the last duplicate row instead of the first?

Pass `keep='last'` to the `drop_duplicates()` method. By default, pandas retains the first occurrence (`keep='first'`), but setting this parameter to `'last'` ensures the final occurrence of each unique value survives the deduplication process.

### Does `drop_duplicates()` modify the original DataFrame?

Not by default. `drop_duplicates()`, defined in [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py), returns a new DataFrame with duplicate rows removed based on the specified subset columns and leaves the original unchanged. The original is modified only if you pass `inplace=True`, in which case the method returns `None`.
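
The behavior can be checked directly. This is a minimal sketch on assumed sample data; it also shows the `ignore_index=True` option, which relabels the surviving rows from 0, and the `inplace=True` variant applied to a copy so the original stays intact.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "London", "Paris", "Berlin"],
    "population": [2.1, 8.9, 2.1, 3.6],
})

# Default call: a new DataFrame is returned, the original keeps all rows
deduped = df.drop_duplicates(subset="city")
print(len(df), len(deduped))

# ignore_index=True relabels the result rows 0, 1, 2
renumbered = df.drop_duplicates(subset="city", ignore_index=True)
print(renumbered.index.tolist())

# inplace=True mutates the object it is called on and returns None
working = df.copy()
result = working.drop_duplicates(subset="city", inplace=True)
print(result, len(working))
```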

### Can I get unique rows based on multiple columns?

Yes, pass a list of column names to the `subset` parameter: `df.drop_duplicates(subset=['col1', 'col2'])`. This identifies uniqueness based on the combination of values across all specified columns, returning only the first occurrence of each distinct combination.
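
A short sketch of the multi-column case follows. The `year` column and the data values are made up for illustration; only the `subset` list syntax comes from the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "Paris", "Paris", "London"],
    "year": [2020, 2020, 2021, 2020],
    "population": [2.1, 2.2, 2.1, 8.9],
})

# Rows 0 and 1 share the same (city, year) pair, so only row 0 is kept;
# the differing population value in row 1 is ignored by the comparison
unique_pairs = df.drop_duplicates(subset=["city", "year"])
print(unique_pairs)
```

Columns outside `subset` play no part in the comparison, so the retained row simply carries whatever values the first occurrence had.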