# How to Create a New Column in a Pandas DataFrame Using value_counts

> Learn how to create a new column in a Pandas DataFrame with value_counts. Discover efficient methods like map and groupby transform for adding counts to your data.

- Repository: [pandas-dev/pandas](https://github.com/pandas-dev/pandas)
- Tags: how-to-guide
- Published: 2026-02-19

---

**To create a new column containing value counts in pandas, map the result of `Series.value_counts()` back to the original column using `map()`, or use `groupby().transform('size')` for an index-preserving alternative.**

Creating a frequency count column is a common data preprocessing task when analyzing distributions within your dataset. The pandas library provides vectorized operations that eliminate the need for explicit Python loops, leveraging optimized Cython routines under the hood. This guide demonstrates how to attach value count results to a DataFrame using methods implemented in the `pandas-dev/pandas` repository.

## Understanding value_counts in pandas

The `value_counts` functionality exists at two levels in the pandas API. `Series.value_counts` counts unique values within a single column; `Series` inherits it from `IndexOpsMixin` in [`pandas/core/base.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/base.py). `DataFrame.value_counts` (located in [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py) at lines 8383–8450) counts unique combinations of rows across multiple columns.

For the task of counting values in a single column and attaching those counts back to the original DataFrame, you need the **Series implementation**. This method returns a Series where the index contains unique values and the corresponding data contains their frequencies.
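
As a minimal sketch of that return shape (the `fruit` Series here is illustrative data, not taken from any particular dataset), the unique values end up in the index and the frequencies in the data, sorted by descending count by default:

```python
import pandas as pd

# Illustrative data: three distinct values with different frequencies
fruit = pd.Series(["apple", "banana", "apple", "orange", "banana", "banana"])

counts = fruit.value_counts()

# The index holds the unique values, the data holds their frequencies
print(counts.index.tolist())   # ['banana', 'apple', 'orange']
print(counts.tolist())         # [3, 2, 1]
print(counts["apple"])         # 2
```

Because the result is keyed by value, it behaves like a lookup table, which is what the mapping approach relies on.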

Under the hood, `Series.value_counts` delegates to `value_counts_internal` in [`pandas/core/algorithms.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/algorithms.py) (lines 839–934), which in turn relies on the Cython hash-table routines in `pandas/_libs/hashtable` for the performance-critical counting.

## Method 1: Using map() with Series.value_counts()

The most direct approach involves computing the counts and then mapping them back to the original column. This works because the Series returned by `value_counts` uses the unique values as its index, making it perfectly suited for the `map()` operation.

```python
import pandas as pd

df = pd.DataFrame({
    "fruit": ["apple", "banana", "apple", "orange", "banana", "banana"],
    "price": [1.2, 0.8, 1.3, 0.9, 0.85, 0.8],
})

# Compute value counts and map back to create new column
counts = df["fruit"].value_counts()
df["fruit_count"] = df["fruit"].map(counts)

print(df)
```

**Output:**

```
    fruit  price  fruit_count
0   apple   1.20            2
1  banana   0.80            3
2   apple   1.30            2
3  orange   0.90            1
4  banana   0.85            3
5  banana   0.80            3
```

This method is efficient because `map()` looks up each value in the index of the counts Series. Index lookups are hash-based, so each row costs roughly constant time on average and the whole mapping scales linearly with the number of rows.
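
Since the lookup is by value, the counts do not have to come from the same Series they are mapped onto. The sketch below uses two hypothetical Series, `train` and `test`, to show that values absent from the counts index come back as `NaN`, which you can replace with 0 if that suits your analysis:

```python
import pandas as pd

# Hypothetical reference data and new data (names are illustrative)
train = pd.Series(["apple", "banana", "apple"], name="fruit")
test = pd.Series(["banana", "kiwi", "apple"], name="fruit")

counts = train.value_counts()   # apple -> 2, banana -> 1

# "kiwi" never appears in train, so its lookup yields NaN
mapped = test.map(counts)

# Treat unseen values as a count of zero and restore an integer dtype
filled = mapped.fillna(0).astype("int64")
print(filled.tolist())          # [1, 0, 2]
```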

## Method 2: Using groupby().transform()

An alternative that stays entirely within the DataFrame API uses `groupby()` combined with `transform()`. This approach preserves the original DataFrame's index automatically without requiring an explicit mapping step.

```python
# Create new column using groupby transform
df["fruit_count_alt"] = df.groupby("fruit")["fruit"].transform("size")
```

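Both methods produce the same counts for this data. The `groupby` approach uses pandas' grouping machinery rather than `value_counts`, and it aligns the result with the original index for you. This can be advantageous when working with complex indices or when you need to perform additional aggregations within the same groupby operation. One difference to keep in mind: `groupby()` excludes rows whose key is `NaN` by default, so pass `dropna=False` to `groupby()` if those rows should be counted.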
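
A short sketch of that reuse, rebuilding the example DataFrame so the block runs on its own (the `mean_price` column is an illustrative addition): one `groupby` object supplies both the count column and a second per-group statistic.

```python
import pandas as pd

df = pd.DataFrame({
    "fruit": ["apple", "banana", "apple", "orange", "banana", "banana"],
    "price": [1.2, 0.8, 1.3, 0.9, 0.85, 0.8],
})

# Build the grouping once and reuse it for several broadcast columns
grouped = df.groupby("fruit")
df["fruit_count"] = grouped["fruit"].transform("size")
df["mean_price"] = grouped["price"].transform("mean")

# Cross-check against the map() approach
via_map = df["fruit"].map(df["fruit"].value_counts())
print(df["fruit_count"].tolist())   # [2, 3, 2, 1, 3, 3]
print(via_map.tolist())             # [2, 3, 2, 1, 3, 3]
```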

## Handling Missing Values and Categorical Data

### Missing Values

By default, `value_counts` excludes `NaN` values from the count (using `dropna=True`). To include missing values in your frequency counts, explicitly set `dropna=False`:

```python
counts = df["fruit"].value_counts(dropna=False)
df["fruit_with_nan"] = df["fruit"].map(counts)
```
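
The example DataFrame above has no missing values, so the effect is easier to see on data that does. The following sketch uses a small hypothetical frame, `df_nan`, and compares the default behavior with `dropna=False`:

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing entries
df_nan = pd.DataFrame({"fruit": ["apple", np.nan, "apple", np.nan, "banana"]})

default_counts = df_nan["fruit"].value_counts()            # NaN excluded
full_counts = df_nan["fruit"].value_counts(dropna=False)   # NaN counted

# Rows with NaN find no entry in default_counts and stay NaN
df_nan["count_default"] = df_nan["fruit"].map(default_counts)

# With dropna=False the NaN rows receive their own count
df_nan["count_with_nan"] = df_nan["fruit"].map(full_counts)

print(df_nan)
```

Note that `count_default` becomes a float column because it has to hold `NaN`, while `count_with_nan` can stay integer.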

### Categorical Data

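The mapping pattern also works with categorical columns without first converting them to object dtype. The `value_counts` method handles categorical data efficiently by utilizing the underlying category codes. Two details are worth knowing: unobserved categories are reported with a count of 0, and `map()` on a categorical column can return a categorical result when the mapped values are unique, so append `.astype("int64")` if you need a plain integer column:

```python
df["category"] = pd.Categorical(["A", "B", "A", "C", "B", "B"])
df["cat_counts"] = df["category"].map(df["category"].value_counts())
```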
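
As a hedged, standalone sketch of the categorical case (the Series name `cat` is illustrative), the explicit cast below gives an integer result regardless of whether `map()` hands back a categorical or a plain column in your pandas version:

```python
import pandas as pd

cat = pd.Series(pd.Categorical(["A", "B", "A", "C", "B", "B"]))

# map() on categorical data may itself return a categorical result
cat_counts = cat.map(cat.value_counts())
print(cat_counts.dtype)

# Cast explicitly when downstream code expects integers
cat_counts_int = cat_counts.astype("int64")
print(cat_counts_int.tolist())   # [2, 3, 2, 1, 3, 3]
```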

## Performance and Implementation Details

The efficiency of these operations stems from pandas' layered architecture. When you call `df["column"].value_counts()`, the execution flows through these key files:

- **[`pandas/core/base.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/base.py)**: Implements the high-level `value_counts` method that `Series` inherits from `IndexOpsMixin`
- **[`pandas/core/algorithms.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/algorithms.py)**: Contains `value_counts_internal` (lines 839–934), which handles the algorithm selection and preprocessing
- **`pandas/_libs/hashtable`**: Provides the Cython hash-table routines that count the actual data

The `map()` method leverages the fact that the counts Series is already indexed by the unique values, so no join is needed. On large datasets, this lookup-based approach is typically much faster than row-by-row Python loops and avoids the overhead of a merge.

## Summary

- **Use `map()` with `value_counts()`** to create a frequency column by mapping the counts Series back to your original DataFrame column.
- **Use `groupby().transform('size')`** as a concise alternative that preserves index alignment automatically.
- **Set `dropna=False`** in `value_counts()` to include null values in your frequency calculations.
- **Reference the source implementation** in [`pandas/core/base.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/base.py) and [`pandas/core/algorithms.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/algorithms.py) to understand the underlying optimization via the Cython hash-table routines in `pandas/_libs/hashtable`.

## Frequently Asked Questions

### Why does my new column contain NaN values after using map() with value_counts?

This occurs when your original column contains values that are missing from the index of the `value_counts` result. The usual cause is `NaN` values (since `dropna=True` by default); it also happens when the counts were computed from a different Series that lacks some of the values. To include missing values in your counts, compute the frequencies using `df["col"].value_counts(dropna=False)` before mapping.

### Is groupby().transform() faster than using map() with value_counts()?

Both methods utilize optimized pandas internals, but performance characteristics vary by dataset shape. The `map()` approach creates an intermediate Series and performs hash lookups, while `groupby().transform()` uses the grouping machinery. For most use cases, performance differences are negligible; choose based on code readability and whether you need additional groupby operations.
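
If the difference matters for your workload, measuring is more reliable than guessing. The sketch below times both approaches on synthetic data; the helper names `with_map` and `with_transform`, the row count, and the number of distinct keys are all illustrative assumptions, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 100,000 rows drawn from 1,000 distinct integer keys
rng = np.random.default_rng(0)
big = pd.DataFrame({"key": rng.integers(0, 1_000, size=100_000)})

def with_map(frame):
    # Count once, then look each row's value up in the counts index
    return frame["key"].map(frame["key"].value_counts())

def with_transform(frame):
    # Let the groupby machinery broadcast each group's size
    return frame.groupby("key")["key"].transform("size")

for fn in (with_map, with_transform):
    seconds = timeit.timeit(lambda: fn(big), number=5)
    print(f"{fn.__name__}: {seconds / 5:.4f} s per call")
```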

### Can I use value_counts on multiple columns to create a combined frequency column?

For counting combinations of multiple columns, use `DataFrame.value_counts()` (implemented in [`pandas/core/frame.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py)), but note that this returns a Series with a MultiIndex of combinations rather than a mappable result. To attach combined counts back to the original DataFrame, use `groupby()` on multiple columns with `transform('size')` instead.
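
A minimal sketch of the multi-column case, using a hypothetical `sales` frame with `fruit` and `store` columns: `DataFrame.value_counts()` summarizes the combinations, while `groupby()` with `transform("size")` attaches each row's combination count.

```python
import pandas as pd

# Hypothetical data: each row is one sale of a fruit at a store
sales = pd.DataFrame({
    "fruit": ["apple", "apple", "banana", "apple", "banana"],
    "store": ["north", "north", "south", "south", "south"],
})

# Summary keyed by a two-level MultiIndex of (fruit, store) combinations
combo_counts = sales.value_counts(subset=["fruit", "store"])
print(combo_counts)

# Per-row count of the (fruit, store) combination, aligned to sales.index
sales["pair_count"] = sales.groupby(["fruit", "store"])["fruit"].transform("size")
print(sales["pair_count"].tolist())   # [2, 2, 2, 1, 2]
```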

### Does value_counts work with all data types in pandas?

Yes, `value_counts` supports any dtype that pandas can hash, including categorical, nullable integer (Int64), string, datetime, and object types. Object columns that hold unhashable values such as lists raise a `TypeError`. The underlying Cython hash-table routines in `pandas/_libs/hashtable` are specialized per dtype, so type-specific optimizations apply automatically.