# How to Remove Rows with Duplicate Indices in Pandas DataFrames

> Efficiently remove duplicate indices in pandas DataFrames. Learn to move the index into a column, drop duplicates on it, and optionally restore the index for clean data.

- Repository: [pandas-dev/pandas](https://github.com/pandas-dev/pandas)
- Tags: how-to-guide
- Published: 2026-02-19

---

**To remove rows with duplicate indices in pandas, reset the index to a column using `reset_index()`, apply `drop_duplicates(subset="index")` to deduplicate based on that column, and optionally restore the index with `set_index()`.**

`DataFrame.drop_duplicates()` compares column values only. Its docstring in [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py) states that indexes, including time indexes, are ignored. To remove rows with duplicate indices in a pandas DataFrame, you must therefore treat the index as a regular column during deduplication.

## Why drop_duplicates Ignores the Index by Default

In [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py) (around lines 7681‑7700 at the time of writing; line numbers on `main` shift as the file changes), `DataFrame.drop_duplicates` delegates to `DataFrame.duplicated(subset, keep=keep)`, which builds a boolean mask from the column values alone, and then keeps the rows where that mask is `False`. The index never enters the comparison. Row uniqueness is therefore determined solely by column values, which keeps the behavior consistent across index types, including time indexes.
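The behavior is easy to confirm on a small frame. The sketch below uses illustrative data (the `same_values` frame is made up for this example) to show both directions: repeated index labels survive, while rows with identical column values are dropped even when their labels differ.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 20, 30, 40], "B": [1, 2, 3, 4]},
    index=["x", "y", "x", "z"],
)

# Every row has distinct column values, so nothing is dropped,
# even though the label "x" appears twice in the index.
result = df.drop_duplicates()
print(result.index.tolist())  # ['x', 'y', 'x', 'z']

# Rows with identical column values are dropped even though
# their index labels ("p" and "q") differ.
same_values = pd.DataFrame({"A": [1, 1], "B": [2, 2]}, index=["p", "q"])
print(same_values.drop_duplicates().index.tolist())  # ['p']
```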

## The Efficient Workflow to Remove Duplicate Index Rows

To efficiently remove rows with duplicate indices, follow this three‑step pattern that leverages pandas’ optimized `drop_duplicates` algorithm while treating the index as a regular column.

### Step 1: Expose the Index as a Column

Use `reset_index()` to move the index into a regular column. By default, the new column is named `index`, or takes the index’s name if it has one. `reset_index()` returns a new DataFrame; whether the underlying column data is physically copied depends on the pandas version and on whether Copy-on-Write is enabled.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 20, 30, 40], "B": [1, 2, 3, 4]},
    index=["x", "y", "x", "z"]
)

# Move the index labels into a regular column named "index"
df_reset = df.reset_index()
```
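The column name matters for the next step. As a small illustration (the index name `key` is an assumption made for this example), a named index produces a column with that name instead of `index`, so the `subset` argument has to match it:

```python
import pandas as pd

named = pd.DataFrame(
    {"A": [10, 20, 30]},
    index=pd.Index(["x", "y", "x"], name="key"),  # "key" is an illustrative name
)

# With a named index, reset_index() reuses that name for the new column,
# so subset="index" would raise a KeyError on this frame.
reset = named.reset_index()
print(reset.columns.tolist())  # ['key', 'A']

# Deduplicate on the actual column name, then restore the index.
deduped = reset.drop_duplicates(subset="key").set_index("key")
print(deduped.index.tolist())  # ['x', 'y']
```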

### Step 2: Apply drop_duplicates on the Index Column

Call `drop_duplicates()` with the `subset` parameter set to the index column name, so that only that column is compared. Duplicate detection is hash-based rather than sort-based, which means no sort is performed and the surviving rows keep their original order.

```python
# Compare rows on the "index" column only; keep the first row per label
df_deduped = df_reset.drop_duplicates(subset="index", keep="first")
```

### Step 3: Restore the Index (Optional)

If you need the original index structure, use `set_index()` to convert the column back to the index.

```python
# Turn the "index" column back into the DataFrame index
df_clean = df_deduped.set_index("index")
```
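One side effect of the round trip is that the restored index is now named `index`, even if the original index had no name. The three steps can be wrapped in a small function that restores the original name. This is a sketch, not a pandas API: `drop_duplicate_index` and the placeholder column name `__index__` are hypothetical, and the function assumes a single-level index.

```python
import pandas as pd


def drop_duplicate_index(df: pd.DataFrame, keep="first") -> pd.DataFrame:
    """Hypothetical helper: drop rows with repeated index labels.

    Assumes a single-level index and that no column is already
    named "__index__" (an arbitrary placeholder chosen for this sketch).
    """
    original_name = df.index.name
    temp = "__index__"
    return (
        df.rename_axis(temp)             # give the index a known name
          .reset_index()                 # index -> column "__index__"
          .drop_duplicates(subset=temp, keep=keep)
          .set_index(temp)               # column -> index again
          .rename_axis(original_name)    # restore the original name (or None)
    )


df = pd.DataFrame(
    {"A": [10, 20, 30, 40], "B": [1, 2, 3, 4]},
    index=["x", "y", "x", "z"],
)
clean = drop_duplicate_index(df)
print(clean.index.name)      # None
print(clean.index.tolist())  # ['x', 'y', 'z']
```

Renaming the axis before `reset_index()` avoids having to guess whether the new column will be called `index` or something else.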

## Complete Code Examples

### Keep the First Occurrence (keep='first')

This example removes duplicate index rows while preserving the first occurrence of each index value.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 20, 30, 40], "B": [1, 2, 3, 4]},
    index=["x", "y", "x", "z"]
)

df_clean = (
    df.reset_index()
      .drop_duplicates(subset="index", keep="first")
      .set_index("index")
)

print(df_clean)

```

**Output:**

```

        A  B
index       
x      10  1
y      20  2
z      40  4

```

### Keep the Last Occurrence (keep='last')

To retain the final row for each duplicate index, change the `keep` parameter to `'last'`.

```python
# Reuses `df` from the previous example; only `keep` changes
df_last = (
    df.reset_index()
      .drop_duplicates(subset="index", keep="last")
      .set_index("index")
)

print(df_last)

```

**Output:**

```

        A  B
index       
y     20  2
x     30  3
z     40  4

```

### Remove All Rows with Duplicate Indices (keep=False)

To eliminate every row that has a duplicate index, use `keep=False`.

```python
# Reuses `df` from the first example; keep=False drops every repeated label
df_no_dups = (
    df.reset_index()
      .drop_duplicates(subset="index", keep=False)
      .set_index("index")
)

print(df_no_dups)

```

**Output:**

```

        A  B
index       
y      20  2
z      40  4

```

## Performance Characteristics

Duplicate detection in `drop_duplicates` is hash-based rather than sort-based: the values in the `subset` columns are factorized and checked against a hash table, so the cost grows roughly linearly with the number of rows and the original row order is preserved. The round trip itself is not free, because `reset_index()`, `drop_duplicates()`, and `set_index()` each return a new DataFrame. How much data is physically copied along the way depends on the pandas version and on whether Copy-on-Write is enabled, so measure on your own data if memory or speed matters.
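A rough way to measure this is to time the round trip against a boolean mask built with `Index.duplicated()`. The sketch below is not a rigorous benchmark: the frame size, the random data, and the helper names `via_reset` and `via_mask` are all assumptions made for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200_000  # arbitrary size chosen for this sketch
big = pd.DataFrame(
    {"value": rng.random(n)},
    index=rng.integers(0, n // 2, size=n),  # many repeated integer labels
)


def via_reset(frame: pd.DataFrame) -> pd.DataFrame:
    # The three-step workflow: index -> column, dedupe, column -> index
    return (
        frame.reset_index()
             .drop_duplicates(subset="index", keep="first")
             .set_index("index")
    )


def via_mask(frame: pd.DataFrame) -> pd.DataFrame:
    # Boolean mask on the index, no intermediate reset
    return frame[~frame.index.duplicated(keep="first")]


t_reset = timeit.timeit(lambda: via_reset(big), number=3)
t_mask = timeit.timeit(lambda: via_mask(big), number=3)
print(f"reset workflow: {t_reset:.3f}s, mask: {t_mask:.3f}s")
```

Both functions keep the same rows in the same order; they differ only in the intermediate objects they create and in the name left on the resulting index.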

## Key Source Files in pandas-dev/pandas

Understanding the implementation details helps clarify why the index is excluded by default and how to work around it.

| File | Role | Location |
|------|------|----------|
| [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py) | Implements `DataFrame.drop_duplicates` and `DataFrame.duplicated`, which compare column values only and ignore the index | [Lines 7681‑7700](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py#L7681-L7700) |
| [`pandas/core/indexes/base.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/indexes/base.py) | Provides `Index.drop_duplicates` and `Index.duplicated` for index objects | [Lines 2799‑2805](https://github.com/pandas-dev/pandas/blob/main/pandas/core/indexes/base.py#L2799-L2805) |
| [`pandas/core/generic.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/generic.py) | Base class for `DataFrame` and `Series`, defines common `drop_duplicates` overloads and parameter handling | [generic.py](https://github.com/pandas-dev/pandas/blob/main/pandas/core/generic.py) |
| [`pandas/core/series.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/series.py) | Implements `Series.drop_duplicates` with behavior mirroring the DataFrame method | [series.py](https://github.com/pandas-dev/pandas/blob/main/pandas/core/series.py) |

## Summary

- **`drop_duplicates` ignores the index by design**, as implemented in [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py) (lines 7681‑7700), checking only column values for duplicates.
- **To remove rows with duplicate indices**, use `reset_index()` to expose the index as a column, apply `drop_duplicates(subset="index")`, and optionally restore the index with `set_index()`.
- **The `keep` parameter** controls which duplicates to retain: `'first'` (default), `'last'`, or `False` (drop every row whose index value appears more than once).
- **Duplicate detection is hash-based**, so no sort is needed and the surviving rows keep their original order. The reset/restore round trip does create intermediate DataFrames, so measure it on large data.

## Frequently Asked Questions

### How do I remove duplicate index rows in pandas without resetting the index?

`DataFrame.drop_duplicates` cannot deduplicate on the index directly, because it compares column values only, as implemented in [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py). You do not have to reset the index, though. `Index.duplicated()` returns a boolean mask that flags repeated labels, so `df[~df.index.duplicated()]` keeps the first row for each label. `Index.duplicated()` accepts the same `keep` values as `drop_duplicates` (`'first'`, `'last'`, or `False`), so it gives the same control over which row survives, and it avoids the intermediate DataFrames created by the reset/restore round trip.
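A minimal sketch of the boolean-mask approach, checked against the reset workflow for every `keep` value. `Index.duplicated()` takes the same `keep` argument as `drop_duplicates`; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 20, 30, 40], "B": [1, 2, 3, 4]},
    index=["x", "y", "x", "z"],
)

for keep in ("first", "last", False):
    # Mask approach: True marks rows to drop, so invert it with ~
    masked = df[~df.index.duplicated(keep=keep)]

    # Reset workflow, with the leftover "index" name removed for comparison
    round_trip = (
        df.reset_index()
          .drop_duplicates(subset="index", keep=keep)
          .set_index("index")
          .rename_axis(None)
    )

    # Raises AssertionError if the two results differ
    pd.testing.assert_frame_equal(masked, round_trip)

print(df[~df.index.duplicated(keep="last")].index.tolist())  # ['y', 'x', 'z']
```

Unlike the round trip, the mask leaves the index name untouched, so no `rename_axis` step is needed afterwards.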

### What is the difference between `keep='first'` and `keep='last'` when removing duplicate indices?

When you specify `keep='first'` in `drop_duplicates`, pandas retains the first occurrence of each duplicate index value in the original order and marks subsequent duplicates for removal. Conversely, `keep='last'` preserves the final occurrence of each index value and removes all earlier duplicates. If you use `keep=False`, pandas removes every row that has a duplicate index, keeping only rows with unique index values.
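The three settings are easiest to compare as boolean masks, where `True` marks a row that would be removed. This sketch uses `Index.duplicated()` on the example labels to show which positions each `keep` value flags.

```python
import pandas as pd

labels = pd.Index(["x", "y", "x", "z"])

# keep="first": the second "x" (position 2) is flagged
first_mask = labels.duplicated(keep="first").tolist()
print(first_mask)  # [False, False, True, False]

# keep="last": the first "x" (position 0) is flagged
last_mask = labels.duplicated(keep="last").tolist()
print(last_mask)   # [True, False, False, False]

# keep=False: every occurrence of a repeated label is flagged
all_mask = labels.duplicated(keep=False).tolist()
print(all_mask)    # [True, False, True, False]
```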

### Is resetting the index to remove duplicates memory efficient?

It is usually acceptable, but it is not free. `reset_index()`, `drop_duplicates()`, and `set_index()` each return a new DataFrame, and whether the underlying data is physically copied at each step depends on the pandas version and on whether Copy-on-Write is enabled. Passing `inplace=True` does not generally reduce memory usage, because the deduplicated result is still built as a new object before it replaces the original. If memory is tight, the boolean mask `df[~df.index.duplicated()]` produces the same rows without the intermediate reset and restore steps.

### Can I use `drop_duplicates` directly on a pandas Index object?

Yes, pandas Index objects have their own `drop_duplicates` method implemented in [`pandas/core/indexes/base.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/indexes/base.py) (lines 2799‑2805). However, `df.index.drop_duplicates()` returns a new Index containing only the unique labels, not a DataFrame with the corresponding rows removed. Passing that result to `df.loc[...]` does not help either: label-based selection returns every row that matches each label, so the duplicate rows come back. To get a DataFrame with duplicate index rows removed, use the reset‑index workflow or the boolean mask `df[~df.index.duplicated()]`.
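The difference is worth checking on the example frame. This sketch shows that `Index.drop_duplicates()` yields labels only, and that selecting by those labels with `.loc` still returns both `x` rows, whereas the mask from `Index.duplicated()` removes one of them.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 20, 30, 40], "B": [1, 2, 3, 4]},
    index=["x", "y", "x", "z"],
)

# Returns an Index of unique labels, not a filtered DataFrame
unique_labels = df.index.drop_duplicates()
print(unique_labels.tolist())  # ['x', 'y', 'z']

# Label-based selection matches every row carrying each label,
# so both "x" rows are returned and nothing is deduplicated.
selected = df.loc[unique_labels]
print(len(selected))  # 4

# The boolean mask selects by position, so it drops the repeated row
deduped = df[~df.index.duplicated()]
print(len(deduped))  # 3
```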