How to Precisely Control Aggregation Levels in Pandas Resample

Control the granularity and alignment of time-series aggregation in pandas by combining the rule frequency string with the origin, offset, label, and closed parameters of DataFrame.resample(). On pandas versions before 2.0, the legacy base parameter served a similar purpose.

The resample method in the pandas-dev/pandas repository provides powerful time-based grouping for time-series analysis. While the frequency string defines the bin width, precisely controlling the aggregation level requires understanding additional parameters that shift, align, and bound your temporal windows.

Core Architecture of the Resampler

Understanding how pandas implements resampling helps clarify where precision controls are applied.

The Resampler Class

In pandas/core/resample.py (roughly lines 100-200; the line numbers cited in this article shift between releases), the Resampler class serves as the primary interface. When you invoke df.resample(rule), pandas instantiates a Resampler (in practice a subclass such as DatetimeIndexResampler) that stores the original object, the frequency string, and all resampling options. The actual aggregation occurs only when you call methods like .mean(), .sum(), or .agg().
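The deferred evaluation described above can be observed directly. The following is a minimal sketch on a small made-up series; importing Resampler from pandas.core.resample reaches into a non-public module and is done here only to check the type.

```python
import numpy as np
import pandas as pd
from pandas.core.resample import Resampler  # internal module, used only for the isinstance check

# Six observations, 20 minutes apart, with values 0.0 .. 5.0 (illustrative data)
idx = pd.date_range("2023-01-01", periods=6, freq="20min")
df = pd.DataFrame({"value": np.arange(6.0)}, index=idx)

# Calling resample() only builds the Resampler; nothing is aggregated yet
resampler = df.resample("h")
print(type(resampler).__name__, isinstance(resampler, Resampler))

# The same Resampler can be reused for several aggregations
hourly_mean = resampler.mean()
hourly_sum = resampler.sum()
print(hourly_mean)
print(hourly_sum)
```

Because the object is reusable, it is usually cheaper to keep one Resampler and call several aggregations on it than to repeat the resample() call.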

Delegation to GroupBy Machinery

The heavy lifting is delegated to the optimized GroupBy engine. The internal method _groupby_resampler (lines 300-350 in pandas/core/resample.py) constructs a GroupBy object using time-based keys. Private methods _apply and _agg (lines 400-460) then route your aggregation calls to pandas/core/groupby/ops.py, reusing the same high-performance logic employed for ordinary categorical grouping.
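One way to see the relationship to GroupBy from the public API is to compare resample() with a groupby on pd.Grouper(freq=...). This sketch does not inspect the internal methods named above; it only checks that the two public entry points give the same result on a small made-up frame.

```python
import numpy as np
import pandas as pd

# Twelve observations, 25 minutes apart, with values 0.0 .. 11.0 (illustrative data)
idx = pd.date_range("2023-01-01", periods=12, freq="25min")
df = pd.DataFrame({"value": np.arange(12.0)}, index=idx)

# Time-based binning through the resample interface
via_resample = df.resample("2h").sum()

# The same binning expressed as an ordinary groupby with a time Grouper
via_groupby = df.groupby(pd.Grouper(freq="2h")).sum()

# Raises if the two results differ; index frequency metadata is ignored
pd.testing.assert_frame_equal(via_resample, via_groupby, check_freq=False)
print(via_resample)
```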

Parameters for Precision Control

The rule parameter defines the bin width, but fine-grained control over where those bins start and end comes from alignment and boundary parameters.

Frequency Parsing with _get_rule

The frequency string is parsed by _get_rule in pandas/core/resample.py (lines 70-90), which generates a DateOffset object. This offset drives the mathematical calculation of bin edges, converting strings like '5min' or 'h' into precise temporal intervals. The older spellings '5T' and 'H' are deprecated since pandas 2.2.
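The string-to-offset conversion can be tried out with the public helper pandas.tseries.frequencies.to_offset. Treating it as a stand-in for the internal parsing step is an assumption of this sketch, not something the text above confirms.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Convert frequency strings into DateOffset objects
five_min = to_offset("5min")
one_hour = to_offset("h")
print(five_min, one_hour)

# Fixed-width offsets convert cleanly to a Timedelta, which is the bin width
width_5 = pd.Timedelta(five_min)
width_h = pd.Timedelta(one_hour)
print(width_5, width_h)
```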

Bin Alignment Using origin, offset, and base

To shift the entire binning grid relative to your data timestamps, use origin and offset. A third argument, base, did the same job in older releases:

  • origin: Sets the reference point that bin edges are aligned to. Accepts 'epoch', 'start', 'start_day' (the default, midnight of the first day in the data), 'end', 'end_day', a timestamp string, or a Timestamp object.
  • offset: Accepts a Timedelta or a string that parses to one (e.g., pd.Timedelta('2h') or '2h'). This value is added to the origin, shifting every bin edge by the same amount.
  • base: Legacy integer shift, deprecated in pandas 1.1 and removed in 2.0. For a '5min' rule, base could range from 0 through 4, and base=2 started bins at 00:02, 00:07, etc. On current versions, pass offset='2min' instead.
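The effect of origin and offset on the grid is easiest to see on data that does not start on a round timestamp. The sketch below uses a made-up series starting at 00:07 and compares the first bin label under three settings.

```python
import pandas as pd

# Twelve observations, 10 minutes apart, starting at 00:07 (illustrative data)
idx = pd.date_range("2023-01-01 00:07", periods=12, freq="10min")
s = pd.Series(1, index=idx)

# Default origin="start_day": the grid is anchored at midnight of the first day
default_grid = s.resample("30min").sum()

# origin="start": the grid is anchored at the first timestamp in the data
start_grid = s.resample("30min", origin="start").sum()

# offset="10min": the midnight-anchored grid is shifted by 10 minutes
shifted_grid = s.resample("30min", offset="10min").sum()

for name, result in [
    ("start_day", default_grid),
    ("start", start_grid),
    ("offset=10min", shifted_grid),
]:
    print(f"{name:>12}: first bin {result.index[0]}, last bin {result.index[-1]}")
```

With the shifted grid, the first observation at 00:07 falls before the 00:10 edge, so the first bin label lands on the previous day at 23:40. The total count is unchanged in all three cases; only the bin edges move.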

Interval Boundaries with label and closed

These parameters determine which observations fall into which bin and how the result is indexed:

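  • closed: Controls which edge of each interval is inclusive. With 'left', a bin covers [start, end); with 'right', it covers (start, end], so a timestamp exactly on an edge moves to the earlier bin. The default is 'left' for most frequencies and 'right' for the month-end, quarter-end, year-end, and weekly frequencies.
  • label: Determines whether each bin in the result index is labeled with its 'left' or 'right' edge. It has the same frequency-dependent default as closed and changes only the label, not the bin membership.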
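A small count makes the boundary behavior concrete. This sketch uses a made-up minutely series from 00:00 to 00:10 and counts how many observations land in each 5-minute bin under the two conventions.

```python
import pandas as pd

# Eleven observations, one per minute, from 00:00 through 00:10 (illustrative data)
idx = pd.date_range("2023-01-01 00:00", periods=11, freq="1min")
s = pd.Series(1, index=idx)

# Default for '5min': closed="left", label="left"
# Bins are [00:00, 00:05), [00:05, 00:10), [00:10, 00:15)
left = s.resample("5min").sum()

# Right-closed, right-labeled
# Bins are (23:55, 00:00], (00:00, 00:05], (00:05, 00:10]
right = s.resample("5min", closed="right", label="right").sum()

print(left)   # counts 5, 5, 1
print(right)  # counts 1, 5, 5
```

Both results carry the labels 00:00, 00:05, and 00:10, yet the counts differ, because the observations at 00:00, 00:05, and 00:10 each move to the earlier interval when the right edge is inclusive.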

Practical Examples

The following examples demonstrate how to combine these parameters for precise temporal aggregation.

import pandas as pd
import numpy as np

# Sample time series: 100 observations at 7-minute intervals
rng = pd.date_range("2023-01-01 00:00", periods=100, freq="7min")
df = pd.DataFrame({"value": np.random.randn(len(rng))}, index=rng)

# 1. Simple hourly mean (default alignment)
hourly = df.resample("h").mean()

# 2. 15-minute bins starting at 5 minutes past the hour
#    (base=5 was removed in pandas 2.0; offset is the replacement)
aligned = df.resample("15min", offset="5min").sum()

# 3. Daily bins anchored to 06:00 instead of midnight
daily = df.resample("D", origin="2023-01-01 06:00").sum()

# 4. 6-hour bins shifted forward by 2 hours
shifted = df.resample("6h", offset=pd.Timedelta("2h")).median()

# 5. Right-closed intervals with right-edge labeling
right_labeled = df.resample(
    "5min", label="right", closed="right"
).agg(["min", "max"])

Explanation of precision controls:

  • Example 2 shifts the 15-minute grid by 5 minutes, creating bins covering 00:05-00:20, 00:20-00:35, etc. On pandas before 2.0 this could be written as base=5; on current versions the equivalent is offset='5min'.
  • Example 3 sets origin to a specific timestamp, forcing daily aggregation windows to start at 06:00 rather than the default midnight.
  • Example 4 applies offset to push all 6-hour bin edges forward by 2 hours, resulting in coverage periods of 02:00-08:00, 08:00-14:00, etc.
  • Example 5 demonstrates closed='right' and label='right': an observation at exactly 00:05 falls in the interval (00:00, 00:05] rather than the next one, and that interval is labeled 00:05 in the result.
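The windows described for Examples 3 and 4 can be checked on a small hourly series. This sketch uses made-up data and, as an assumption for portability, writes the daily rule as '24h' so the bins are fixed-width regardless of how a given pandas version treats 'D'.

```python
import pandas as pd

# One observation per hour for a single day, 00:00 through 23:00 (illustrative data)
idx = pd.date_range("2023-01-01 00:00", periods=24, freq="h")
s = pd.Series(1, index=idx)

# 24-hour windows anchored at 06:00 instead of midnight
anchored = s.resample("24h", origin="2023-01-01 06:00").sum()

# 6-hour windows shifted forward by 2 hours from the midnight-aligned grid
shifted = s.resample("6h", offset=pd.Timedelta("2h")).sum()

print(anchored)  # windows start at 06:00; the hours before 06:00 belong to the previous window
print(shifted)   # window starts fall on 20:00, 02:00, 08:00, 14:00, 20:00
```

Note that the first window in each result starts on the previous calendar day, because the earliest observations precede the first shifted edge.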

Summary

  • The Resampler class in pandas/core/resample.py orchestrates time-series aggregation by delegating to the GroupBy engine in pandas/core/groupby/ops.py.
  • Use origin to set absolute anchor points and offset to apply relative shifts to bin edges.
  • The legacy base argument provided integer-step shifts within the rule, but it was deprecated in pandas 1.1 and removed in 2.0; use offset or origin instead.
  • Control which observations are included using closed, and set the resulting index position with label.
  • These parameters combine to define any regular temporal grid, regardless of irregular raw timestamps.

Frequently Asked Questions

What is the difference between origin and offset in pandas resample?

origin establishes an absolute reference point on the timeline, such as a specific date or the string 'epoch', from which all bins are calculated. offset adds a relative timedelta shift to every bin edge after the origin is established. Use origin to anchor bins to a specific calendar time, and offset to fine-tune by hours or minutes relative to that anchor.

How does the closed parameter affect which data points are aggregated?

The closed parameter determines interval inclusivity. When set to 'right', the right edge of each time bin is inclusive, meaning an observation exactly on the boundary timestamp belongs to that bin rather than the next. When set to 'left', the left edge is inclusive. This directly controls which aggregation group boundary cases fall into.

Why does pandas resample use GroupBy operations internally?

According to the pandas source code in pandas/core/resample.py, the Resampler class calls _groupby_resampler to create a GroupBy object based on calculated time-based keys. This design reuses the highly optimized aggregation algorithms in pandas/core/groupby/ops.py and pandas/core/groupby/grouper.py, ensuring that resampling benefits from the same performance optimizations as categorical groupby operations.

How do I align resampled bins to start at a specific time of day?

Pass origin a timestamp containing your desired start time, or use offset with a Timedelta. For example, df.resample('D', origin='2023-01-01 06:00') aligns daily bins to 06:00 in the index's own timezone (the origin timestamp must have the same timezone as the index), while df.resample('h', offset=pd.Timedelta('30min')) shifts hourly bins to start at 00:30, 01:30, etc.
