How to Precisely Control Aggregation Levels in Pandas Resample
Control the granularity and alignment of time-series aggregation in pandas by combining the rule frequency string with the origin, offset, label, and closed parameters of DataFrame.resample(), plus the legacy base parameter on pandas versions before 2.0.
The resample method in the pandas-dev/pandas repository provides powerful time-based grouping for time-series analysis. While the frequency string defines the bin width, precisely controlling the aggregation level requires understanding additional parameters that shift, align, and bound your temporal windows.
Core Architecture of the Resampler
Understanding how pandas implements resampling helps clarify where precision controls are applied.
The Resampler Class
In pandas/core/resample.py, the Resampler class serves as the primary interface. When you invoke df.resample(rule), pandas instantiates a Resampler subclass (DatetimeIndexResampler for data with a DatetimeIndex) that stores the original object, the frequency, and all resampling options. The actual aggregation occurs only when you call methods like .mean(), .sum(), or .agg().
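The deferred behavior described above can be observed from the public API. The following is a minimal sketch on made-up data (the column name value and the 20-minute spacing are illustrative choices, not taken from the pandas source): it builds one resampler object and reuses it for two different aggregations.

```python
import numpy as np
import pandas as pd
from pandas.core.resample import Resampler

# Six observations, 20 minutes apart: 00:00 ... 01:40
idx = pd.date_range("2023-01-01", periods=6, freq="20min")
df = pd.DataFrame({"value": np.arange(6.0)}, index=idx)

# resample() only describes the grouping; nothing is aggregated yet
resampler = df.resample("h")
is_resampler = isinstance(resampler, Resampler)

# The aggregation runs when a reduction method is called,
# and the same resampler can be reused for several reductions
hourly_mean = resampler.mean()
hourly_sum = resampler.sum()
```

Holding on to the resampler object is convenient when several statistics are needed for the same bins.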
Delegation to GroupBy Machinery
The heavy lifting is delegated to the optimized GroupBy engine. Internally, pandas/core/resample.py turns the rule into time-based bin edges (in current releases this is the job of the TimeGrouper class) and hands them to the grouping machinery in pandas/core/groupby/ops.py, so your aggregation calls reuse the same high-performance logic employed for ordinary categorical grouping. Private helper names and line positions in these modules change between releases, so check the version you have installed rather than relying on fixed line ranges.
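The relationship to GroupBy is also visible through public calls. This sketch, again on illustrative data, compares resample with a groupby keyed on pd.Grouper(freq=...). It only shows that the two routes agree on the result; it does not show which internal functions pandas runs.

```python
import numpy as np
import pandas as pd

# Twelve observations, 10 minutes apart, spanning two full hours
idx = pd.date_range("2023-01-01", periods=12, freq="10min")
df = pd.DataFrame({"value": np.arange(12)}, index=idx)

# Route 1: the resample API
via_resample = df.resample("h").sum()

# Route 2: an ordinary groupby with a time-based Grouper key
via_groupby = df.groupby(pd.Grouper(freq="h")).sum()

same_result = via_resample.equals(via_groupby)
```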
Parameters for Precision Control
The rule parameter defines the bin width, but fine-grained control over where those bins start and end comes from alignment and boundary parameters.
Frequency Parsing with _get_rule
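The frequency string is converted into a DateOffset object before any bins are computed; the public helper pandas.tseries.frequencies.to_offset performs the same conversion. This offset drives the calculation of bin edges, turning strings like '5min' or 'h' into precise temporal intervals. The older aliases '5T' and 'H' still appear in legacy code but are deprecated as of pandas 2.2.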
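To see what a rule string turns into, you can call to_offset yourself. This is a small sketch using only the public helper; whether resample reaches it through the same code path internally is not something the example demonstrates.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Parse rule strings into DateOffset objects
five_min = to_offset("5min")
one_hour = to_offset("h")

# Fixed-width offsets convert cleanly to a Timedelta (the bin width)
width_5min = pd.Timedelta(five_min)
width_hour = pd.Timedelta(one_hour)

print(five_min, width_5min)
print(one_hour, width_hour)
```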
Bin Alignment Using origin, offset, and base
To shift the entire binning grid relative to your data timestamps, use origin and offset. Older pandas versions also offered base:
- origin: Sets an absolute reference point. Accepts 'epoch', 'start', 'start_day' (the default), 'end', 'end_day', a timestamp string, or a Timestamp object. All bins align relative to this anchor.
- offset: Accepts a Timedelta or a string convertible to one (e.g., pd.Timedelta('2h') or '2h'). It is added to the origin, so every bin edge shifts by that amount.
- base: Legacy integer argument, deprecated in pandas 1.1 and removed in pandas 2.0. It shifted the grid by a number of the rule's base units; for example, base=5 with a 15-minute rule started bins at 00:05, 00:20, etc. On current versions, write offset='5min' instead.
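The effect of these arguments is easiest to see on a tiny series. The sketch below uses six made-up timestamps starting at 00:07 and counts how many fall into each 15-minute bin under three alignments: the default, origin='start', and offset='5min'.

```python
import pandas as pd

# Six observations at 00:07, 00:14, 00:21, 00:28, 00:35, 00:42
idx = pd.date_range("2023-01-01 00:07", periods=6, freq="7min")
s = pd.Series(1, index=idx)

# Default (origin='start_day'): grid anchored at midnight -> 00:00, 00:15, 00:30
default_grid = s.resample("15min").count()

# origin='start': grid anchored at the first timestamp -> 00:07, 00:22, 00:37
start_grid = s.resample("15min", origin="start").count()

# offset='5min': midnight grid shifted by 5 minutes -> 00:05, 00:20, 00:35
offset_grid = s.resample("15min", offset="5min").count()

print(default_grid, start_grid, offset_grid, sep="\n\n")
```

The observations are identical in all three cases; only the bin edges move, which changes both the index labels and the per-bin counts.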
Interval Boundaries with label and closed
These parameters determine which observations fall into which bin and how the result is indexed:
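- closed: Controls interval inclusivity. Use 'right' to make the right edge inclusive (upper bound included), or 'left' for the left edge. This decides which of two adjacent bins receives a timestamp that sits exactly on their shared edge.
- label: Determines whether the resulting index uses the 'left' or 'right' edge of the interval as the timestamp label.
Both default to 'left' for most frequencies; period-end rules such as month-end, quarter-end, year-end, and weekly ('W') default to 'right'.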
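A boundary case makes the difference concrete. In this sketch the timestamps and values are invented so that each sum reveals which observations landed in the bin; note the observation at exactly 00:05.

```python
import pandas as pd

idx = pd.to_datetime([
    "2023-01-01 00:00",
    "2023-01-01 00:03",
    "2023-01-01 00:05",   # sits exactly on a 5-minute boundary
    "2023-01-01 00:08",
])
s = pd.Series([1, 10, 100, 1000], index=idx)

# Default: left-closed, left-labeled -> [00:00, 00:05), [00:05, 00:10)
left = s.resample("5min").sum()

# Right-closed, right-labeled -> (23:55, 00:00], (00:00, 00:05], (00:05, 00:10]
right = s.resample("5min", closed="right", label="right").sum()

print(left)
print(right)
```

With the defaults, the 00:05 observation opens the second bin. With closed='right' it closes the bin labeled 00:05, and the 00:00 observation ends up alone in a bin of its own.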
Practical Examples
The following examples demonstrate how to combine these parameters for precise temporal aggregation.
import pandas as pd
import numpy as np
# Sample time-series with 7-minute intervals
rng = pd.date_range("2023-01-01 00:00", periods=100, freq="7min")
df = pd.DataFrame({"value": np.random.randn(len(rng))}, index=rng)
# 1. Simple hourly mean (default alignment)
hourly = df.resample("h").mean()
# 2. 15-minute bins starting at 5 minutes past the hour
#    (offset replaces the base argument, which was removed in pandas 2.0)
aligned = df.resample("15min", offset="5min").sum()
# 3. Daily bins anchored to 06:00 instead of midnight
daily = df.resample("D", origin="2023-01-01 06:00").sum()
# 4. 6-hour bins shifted forward by 2 hours
shifted = df.resample("6h", offset=pd.Timedelta("2h")).median()
# 5. Right-closed intervals with right-edge labeling
right_labeled = df.resample(
    "5min", label="right", closed="right"
).agg(["min", "max"])
Explanation of precision controls:
- Example 2 uses offset="5min" to shift the 15-minute grid by 5 minutes, placing bin edges at 00:05, 00:20, 00:35, etc. On pandas versions before 2.0 the same grid could be written as base=5.
- Example 3 sets origin to a specific timestamp, forcing daily aggregation windows to start at 06:00 rather than the default midnight.
- Example 4 applies offset to push all 6-hour bin edges forward by 2 hours, placing them at 02:00, 08:00, 14:00, etc. Observations before 02:00 fall into a bin that starts at 20:00 on the previous day.
- Example 5 demonstrates closed='right' and label='right', ensuring that an observation exactly on a 5-minute boundary belongs to the interval ending at that boundary and carries that timestamp as its label.
Summary
- The Resampler class in pandas/core/resample.py orchestrates time-series aggregation by delegating to the GroupBy engine in pandas/core/groupby/ops.py.
- Use origin to set absolute anchor points and offset to apply relative shifts to bin edges.
- Treat base as legacy: it was deprecated in pandas 1.1 and removed in 2.0, and offset covers the same alignment needs.
- Control which observations are included using closed, and set the resulting index position with label.
- These parameters combine to define any regular temporal grid, regardless of irregular raw timestamps.
Frequently Asked Questions
What is the difference between origin and offset in pandas resample?
origin establishes an absolute reference point on the timeline, such as a specific date or the string 'epoch', from which all bins are calculated. offset adds a relative timedelta shift to every bin edge after the origin is established. Use origin to anchor bins to a specific calendar time, and offset to fine-tune by hours or minutes relative to that anchor.
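Because offset is added to the origin, a relative shift and an explicit anchor can describe the same grid. The sketch below, on illustrative 7-minute data, builds 6-hour bins both ways and checks that the results match.

```python
import numpy as np
import pandas as pd

# 100 observations, 7 minutes apart, starting at midnight
idx = pd.date_range("2023-01-01 00:00", periods=100, freq="7min")
s = pd.Series(np.arange(len(idx)), index=idx)

# Relative shift: the default midnight anchor moved forward by 2 hours
via_offset = s.resample("6h", offset="2h").sum()

# Absolute anchor: bins aligned to an explicit 02:00 timestamp
via_origin = s.resample("6h", origin=pd.Timestamp("2023-01-01 02:00")).sum()

same_grid = via_offset.equals(via_origin)
print(via_offset.index)
```

The first label is 20:00 on the previous day, because observations before 02:00 belong to the bin that ends at 02:00.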
How does the closed parameter affect which data points are aggregated?
The closed parameter determines interval inclusivity. When set to 'right', the right edge of each time bin is inclusive, meaning an observation exactly on a boundary timestamp belongs to the bin that ends there rather than the one that starts there. When set to 'left', the left edge is inclusive and the same observation opens the next bin. This directly controls which bin a boundary observation falls into.
Why does pandas resample use GroupBy operations internally?
In pandas/core/resample.py, the Resampler computes time-based bin edges and passes them to the grouping machinery rather than implementing its own aggregation loops. This design reuses the highly optimized aggregation algorithms in pandas/core/groupby/ops.py and pandas/core/groupby/grouper.py, ensuring that resampling benefits from the same performance optimizations as categorical groupby operations. Private helper names differ between releases, so consult the source for your installed version if you need the exact call chain.
How do I align resampled bins to start at a specific time of day?
Combine the origin parameter with a timestamp string containing your desired start time, or use offset with a Timedelta. For example, df.resample('D', origin='2023-01-01 06:00') aligns daily bins to 06:00 in the index's own time zone (or wall-clock time for a naive index), while df.resample('h', offset=pd.Timedelta('30min')) shifts hourly bins to start at 00:30, 01:30, etc.