# How to Precisely Control Aggregation Levels in Pandas Resample

> Precisely control aggregation levels in pandas resample for time series data. Master the `rule` frequency plus the `origin`, `offset`, `label`, and `closed` parameters (and the legacy `base` argument) for accurate granularity and alignment.

- Repository: [pandas-dev/pandas](https://github.com/pandas-dev/pandas)
- Tags: tutorial
- Published: 2026-02-16

---

**Control the granularity and alignment of time-series aggregation in pandas by combining the `rule` frequency string with the `origin`, `offset`, `label`, and `closed` parameters in `DataFrame.resample()`. The older `base` argument did part of this job but was deprecated in pandas 1.1 and removed in pandas 2.0.**

The `resample` method in the pandas-dev/pandas repository provides powerful time-based grouping for time-series analysis. While the frequency string defines the bin width, precisely controlling the aggregation level requires understanding additional parameters that shift, align, and bound your temporal windows.

## Core Architecture of the Resampler

Understanding how pandas implements resampling helps clarify where precision controls are applied.

### The Resampler Class

In [`pandas/core/resample.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/resample.py), the **`Resampler`** class serves as the primary interface (the line ranges cited in this article, here roughly 100-200, are approximate and shift between releases). When you invoke `df.resample(rule)`, pandas instantiates a `Resampler` (for a `DatetimeIndex`, the `DatetimeIndexResampler` subclass) that stores the original object, the frequency, and all resampling options. The actual aggregation occurs only when you call methods like `.mean()`, `.sum()`, or `.agg()`.
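A minimal sketch of this deferred behavior, using a small made-up frame (the names `df`, `resampler`, and `hourly_mean` are illustrative only):

```python
import numpy as np
import pandas as pd

# Six observations, 20 minutes apart, with values 0.0 .. 5.0
idx = pd.date_range("2023-01-01", periods=6, freq="20min")
df = pd.DataFrame({"value": np.arange(6.0)}, index=idx)

# Building the resampler only records the grouping specification.
resampler = df.resample("h")
print(type(resampler).__name__)

# The aggregation runs here, producing one row per hourly bin.
hourly_mean = resampler.mean()
print(hourly_mean)
```

Because the resampler is a separate object, it can be reused for several aggregations without recomputing the specification by hand.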

### Delegation to GroupBy Machinery

The heavy lifting is delegated to the optimized **GroupBy** engine. A `TimeGrouper` (a `Grouper` subclass defined in [`pandas/core/resample.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/resample.py)) computes the bin edges and labels, and the resampler's private aggregation methods (`_downsample` and `_groupby_and_aggregate` in recent releases; private names and line positions change between versions) route your aggregation calls to [`pandas/core/groupby/ops.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/groupby/ops.py), reusing the same high-performance logic employed for ordinary categorical grouping.
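One way to see this relationship from the public API is to compare `resample` with an explicit `groupby` on a time-based `pd.Grouper`. This is a sketch on made-up data, not a view into the internals:

```python
import numpy as np
import pandas as pd

# Twelve observations, 10 minutes apart, with values 0.0 .. 11.0
idx = pd.date_range("2023-01-01", periods=12, freq="10min")
df = pd.DataFrame({"value": np.arange(12.0)}, index=idx)

# The same hourly aggregation expressed two ways
via_resample = df.resample("h").sum()
via_groupby = df.groupby(pd.Grouper(freq="h")).sum()

print(via_resample)
print(via_resample.equals(via_groupby))
```

Both calls describe the same hourly bins, which is why `resample` is often described as a convenience layer over time-based grouping.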

## Parameters for Precision Control

The **`rule`** parameter defines the bin width, but fine-grained control over where those bins start and end comes from alignment and boundary parameters.

### Frequency Parsing with `to_offset`

The frequency string is converted to a `DateOffset` object by `to_offset` (importable from `pandas.tseries.frequencies`), which the `TimeGrouper` in [`pandas/core/resample.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/resample.py) calls when the resampler is constructed. This offset drives the calculation of bin edges, converting strings like `'5min'` or `'h'` into precise temporal intervals. The older aliases `'T'` and `'H'` mean the same thing but have been deprecated since pandas 2.2.
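To inspect what a frequency string turns into, the public helper `to_offset` can be called directly. A small sketch:

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Convert frequency strings into offset objects
five_min = to_offset("5min")
one_hour = to_offset("h")

# The offset class and its multiplier
print(type(five_min).__name__, five_min.n)

# Fixed-width offsets convert cleanly to a Timedelta (the bin width)
print(pd.Timedelta(five_min), pd.Timedelta(one_hour))
```

Checking the parsed offset this way is a quick sanity test before resampling with an unfamiliar alias.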

### Bin Alignment Using `origin`, `offset`, and `base`

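To shift the entire binning grid relative to your data timestamps, use `origin` and `offset`. `base` is the legacy argument they replaced:

- **`origin`**: Sets the reference point the grid is aligned to. Accepts `'start_day'` (the default: midnight of the first day in the data), `'start'` (the first timestamp), `'epoch'` (1970-01-01), `'end'` or `'end_day'` (pandas 1.3+), a timestamp string, or a `Timestamp` object. It only affects fixed-width frequencies such as minutes, hours, or days.
- **`offset`**: Accepts a `Timedelta` or a timedelta-like string (e.g., `pd.Timedelta('2h')` or `'2h'`). It is added to the origin, so every bin edge shifts by that amount.
- **`base`**: Legacy integer shift expressed in the unit of the rule; for a 15-minute rule, `base=5` started bins at 00:05, 00:20, etc. It was deprecated in pandas 1.1 and removed in pandas 2.0, so on current versions pass `offset='5min'` instead.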
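The effect of these alignment arguments is easiest to see on the label of the first bin. The sketch below uses a made-up series whose first observation falls at 00:07 and compares three grids:

```python
import pandas as pd

# Ten observations, 7 minutes apart, starting at 00:07
idx = pd.date_range("2023-01-01 00:07", periods=10, freq="7min")
s = pd.Series(1, index=idx)

# Default origin="start_day": grid anchored at midnight of the first day
default_grid = s.resample("15min").sum()

# origin="start": grid anchored at the first timestamp in the data
start_grid = s.resample("15min", origin="start").sum()

# offset="5min": midnight-anchored grid shifted forward by 5 minutes
offset_grid = s.resample("15min", offset="5min").sum()

for name, result in [
    ("start_day", default_grid),
    ("start", start_grid),
    ("offset=5min", offset_grid),
]:
    print(name, result.index[0])
```

The first labels come out as 00:00, 00:07, and 00:05 respectively: same data and same bin width, but three different grids.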

### Interval Boundaries with `label` and `closed`

These parameters determine which observations fall into which bin and how the result is indexed:

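- **`closed`**: Controls interval inclusivity. Use `'right'` to make the right edge inclusive (upper bound included), or `'left'` for the left edge. The default is `'left'` for most frequencies; month-end, quarter-end, year-end, and weekly rules default to `'right'` (the same defaults apply to `label`). This setting decides which of two adjacent bins receives a timestamp that lies exactly on their shared edge.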
- **`label`**: Determines whether the resulting index uses the `'left'` or `'right'` edge of the interval as the timestamp label.
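A small worked example makes the boundary behavior concrete. The data here is made up so that several observations land exactly on 10-minute edges:

```python
import pandas as pd

# Values 1..5 at 00:00, 00:05, 00:10, 00:15, 00:20
idx = pd.date_range("2023-01-01 00:00", periods=5, freq="5min")
s = pd.Series([1, 2, 3, 4, 5], index=idx)

# Default for minute frequencies: [left, right) bins labeled by the left edge
left = s.resample("10min").sum()

# (left, right] bins labeled by the right edge
right = s.resample("10min", closed="right", label="right").sum()

print(left)
print(right)
```

With the defaults, the 00:10 observation opens the bin labeled 00:10 (sum 3 + 4 = 7). With `closed="right"` and `label="right"`, it closes the bin labeled 00:10 (sum 2 + 3 = 5), so the three bins sum to 1, 5, and 9 instead of 3, 7, and 5.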

## Practical Examples

The following examples demonstrate how to combine these parameters for precise temporal aggregation.

```python
import numpy as np
import pandas as pd

# Sample time series with one observation every 7 minutes
rng = pd.date_range("2023-01-01 00:00", periods=100, freq="7min")
df = pd.DataFrame({"value": np.random.randn(len(rng))}, index=rng)

# 1. Simple hourly mean (default alignment)
hourly = df.resample("h").mean()

# 2. 15-minute bins starting at 5 minutes past the hour
#    (`base=5` was removed in pandas 2.0; `offset` is its replacement)
aligned = df.resample("15min", offset="5min").sum()

# 3. Daily bins anchored to 06:00 instead of midnight
daily = df.resample("D", origin="2023-01-01 06:00").sum()

# 4. 6-hour bins shifted forward by 2 hours
shifted = df.resample("6h", offset=pd.Timedelta("2h")).median()

# 5. Right-closed intervals with right-edge labeling
right_labeled = df.resample(
    "5min", label="right", closed="right"
).agg(["min", "max"])
```

**Explanation of precision controls:**

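- Example 2 shifts the 15-minute grid by 5 minutes, creating bins covering 00:05-00:20, 00:20-00:35, etc. The legacy **`base=5`** argument did this before pandas 2.0; on current versions the same grid comes from **`offset='5min'`**.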
- Example 3 sets **`origin`** to a specific timestamp, forcing daily aggregation windows to start at 06:00 rather than the default midnight.
- Example 4 applies **`offset`** to push all 6-hour bin edges forward by 2 hours, resulting in coverage periods of 02:00-08:00, 08:00-14:00, etc.
- Example 5 demonstrates **`closed='right'`** and **`label='right'`**, ensuring that an observation exactly on a 5-minute boundary belongs to the interval that ends at that boundary and is labeled with that boundary timestamp.

## Summary

- The **`Resampler`** class in [`pandas/core/resample.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/resample.py) orchestrates time-series aggregation by delegating to the GroupBy engine in [`pandas/core/groupby/ops.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/groupby/ops.py).
- Use **`origin`** to set absolute anchor points and **`offset`** to apply relative shifts to bin edges.
- Treat **`base`** as legacy: it applied an integer shift in the unit of the rule, but it was deprecated in pandas 1.1 and removed in pandas 2.0. Use `offset` for the same alignment.
- Control which observations are included using **`closed`**, and set the resulting index position with **`label`**.
- These parameters combine to define any regular temporal grid, regardless of irregular raw timestamps.

## Frequently Asked Questions

### What is the difference between `origin` and `offset` in pandas resample?

**`origin`** establishes an absolute reference point on the timeline, such as a specific date or the string `'epoch'`, from which all bins are calculated. **`offset`** adds a relative timedelta shift to every bin edge after the origin is established. Use `origin` to anchor bins to a specific calendar time, and `offset` to fine-tune by hours or minutes relative to that anchor.

### How does the `closed` parameter affect which data points are aggregated?

The **`closed`** parameter determines interval inclusivity. When set to `'right'`, the right edge of each time bin is inclusive, meaning an observation exactly on the boundary timestamp belongs to that bin rather than the next. When set to `'left'`, the left edge is inclusive. This directly controls which aggregation group boundary cases fall into.

### Why does pandas resample use GroupBy operations internally?

In [`pandas/core/resample.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/resample.py), the `Resampler` class relies on a `TimeGrouper` to compute time-based bin keys and then hands the grouped data to the GroupBy machinery (the private method names involved vary between releases). This design reuses the highly optimized aggregation algorithms in [`pandas/core/groupby/ops.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/groupby/ops.py) and [`pandas/core/groupby/grouper.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/groupby/grouper.py), ensuring that resampling benefits from the same performance optimizations as categorical groupby operations.

### How do I align resampled bins to start at a specific time of day?

Pass the **`origin`** parameter a timestamp containing your desired start time, or use **`offset`** with a `Timedelta`. For example, `df.resample('D', origin='2023-01-01 06:00')` aligns daily bins to 06:00 (wall-clock time for a tz-naive index, otherwise the index's own time zone), while `df.resample('h', offset=pd.Timedelta('30min'))` shifts hourly bins to start at 00:30, 01:30, etc.
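The two approaches can be checked against each other. The sketch below uses made-up hourly data starting at midnight and a fixed `24h` bin width as a stand-in for `'D'`. Under those assumptions, an `origin` at 06:00 on the first day and an `offset` of six hours describe the same grid:

```python
import pandas as pd

# Three days of hourly observations starting at midnight
idx = pd.date_range("2023-01-01 00:00", periods=72, freq="h")
s = pd.Series(1, index=idx)

# Anchor the grid to an explicit 06:00 timestamp ...
by_origin = s.resample("24h", origin="2023-01-01 06:00").sum()

# ... or shift the default midnight-anchored grid by six hours
by_offset = s.resample("24h", offset="6h").sum()

print(by_origin)
print(by_origin.equals(by_offset))
```

The first bin is labeled 06:00 on the previous day because it has to cover the observations between midnight and 06:00.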