How to Merge Pandas DataFrames Where a Value Falls Between Two Values
Use pd.IntervalIndex to construct half-open intervals from the boundary columns of the right DataFrame, map each value in the left DataFrame to its containing interval via get_indexer, and finish with a standard pd.merge on the resulting integer positions.
The pandas-dev/pandas repository provides powerful indexing structures that enable range-based joins without dedicated SQL-style BETWEEN operators. By leveraging the IntervalIndex class implemented in pandas/core/indexes/interval.py and the generic merge engine in pandas/core/reshape/merge.py, you can efficiently match rows where a scalar value falls inside an interval defined by two columns.
Why Standard Merge Falls Short
Standard equality-based merging via pd.merge requires exact key matches. When you need to join a DataFrame of events (with a single timestamp or value) to a DataFrame of sessions (defined by start and end boundaries), equality joins fail because the event value lies somewhere between the boundary columns rather than equaling them.
Implementing an Interval Join with IntervalIndex
The most robust approach uses IntervalIndex to represent the ranges and vectorized indexing to locate matches.
Step 1: Construct the IntervalIndex
First, convert the start and end columns of your right-hand DataFrame into an IntervalIndex. According to the pandas source in pandas/core/indexes/interval.py, the from_arrays constructor creates an index of half-open intervals suitable for fast containment checks.
import pandas as pd
# Right DataFrame: each row defines a range (start, end]
ranges = pd.DataFrame({
    "start": [0, 10, 20, 30],
    "end": [10, 20, 30, 40],
    "session": ["A", "B", "C", "D"]
})
# Build IntervalIndex (closed='right' means start < x <= end)
ranges["interval"] = pd.IntervalIndex.from_arrays(
ranges["start"], ranges["end"], closed="right"
)
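A quick way to see what "half-open" means is to probe a boundary value. The sketch below rebuilds the same four intervals (variable names are illustrative) and checks the value 10 with IntervalIndex.contains and with a single pd.Interval:

```python
import pandas as pd

# Same contiguous ranges as in Step 1
starts = [0, 10, 20, 30]
ends = [10, 20, 30, 40]
intervals = pd.IntervalIndex.from_arrays(starts, ends, closed="right")

# Elementwise check: which intervals contain the boundary value 10?
# (0, 10] includes its right edge; (10, 20] excludes its left edge.
contains_10 = intervals.contains(10)
print(contains_10.tolist())

# The same rule applied to a single Interval object
first = pd.Interval(0, 10, closed="right")
print(10 in first, 0 in first)
```

With closed="right", a value sitting on a shared boundary belongs only to the interval that ends there.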
Step 2: Map Left Values to Interval Positions
Use the get_indexer method to find the integer position of the interval that contains each value from the left DataFrame. The IntervalIndex-specific lookup lives in pandas/core/indexes/interval.py; it requires the intervals not to overlap and returns -1 for values that fall outside every interval.
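# Left DataFrame: observations with a single value
obs = pd.DataFrame({
    "value": [2, 15, 25, 35, 45],
    "event": ["e1", "e2", "e3", "e4", "e5"]
})
# Find which interval each value belongs to. Storing the IntervalIndex in a
# column turned it into a Series, which has no get_indexer method, so wrap
# the column back into an IntervalIndex first.
idx = pd.IntervalIndex(ranges["interval"]).get_indexer(obs["value"])
obs["interval_idx"] = idx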
# Filter out values that don't fall in any interval
matched = obs[obs["interval_idx"] != -1].copy()
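One detail worth guarding against: -1 is also a valid NumPy index meaning "last element", so feeding the raw indexer into a positional lookup silently assigns out-of-range values to the last interval. A minimal sketch of the pitfall and one guard, on the same toy data (sessions, naive and safe are illustrative names):

```python
import numpy as np
import pandas as pd

intervals = pd.IntervalIndex.from_arrays([0, 10, 20, 30], [10, 20, 30, 40], closed="right")
sessions = np.array(["A", "B", "C", "D"], dtype=object)
values = pd.Series([2, 15, 25, 35, 45])

idx = intervals.get_indexer(values)
print(idx)  # -1 marks the value 45, which lies outside every interval

# Pitfall: NumPy treats -1 as "last element", so 45 would be mapped to "D"
naive = sessions[idx]

# Guard: only keep lookups whose position is >= 0; the rest become None
safe = np.where(idx >= 0, sessions[idx], None)
print(naive.tolist(), safe.tolist())
```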
Step 3: Execute the Merge
Finally, perform a standard merge on the integer positions. The generic merge implementation in pandas/core/reshape/merge.py handles the join logic once the keys are aligned.
result = matched.merge(
    ranges,
    left_on="interval_idx",
    right_index=True,
    how="left",
    suffixes=("", "_range")
)
print(result[["value", "event", "session"]])
Output:
   value event session
0      2    e1       A
1     15    e2       B
2     25    e3       C
3     35    e4       D
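Because unmatched rows are filtered out first, the how="left" merge above behaves like an inner join. If you would rather keep every observation, one variant (not part of the recipe above) is to skip the filter: -1 never appears in the right-hand RangeIndex, so unmatched rows come back with missing session values. A self-contained sketch:

```python
import pandas as pd

ranges = pd.DataFrame({
    "start": [0, 10, 20, 30],
    "end": [10, 20, 30, 40],
    "session": ["A", "B", "C", "D"],
})
obs = pd.DataFrame({"value": [2, 15, 25, 35, 45], "event": ["e1", "e2", "e3", "e4", "e5"]})

intervals = pd.IntervalIndex.from_arrays(ranges["start"], ranges["end"], closed="right")
obs["interval_idx"] = intervals.get_indexer(obs["value"])

# No filtering: -1 has no partner in ranges.index (0..3), so the left join
# keeps e5 and fills the right-hand columns with NaN
everything = obs.merge(ranges, left_on="interval_idx", right_index=True, how="left")
print(everything[["value", "event", "session"]])
```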
Alternative: Binning with pd.cut
If you prefer a categorical approach, pd.cut (defined in pandas/core/reshape/tile.py) bins values and attaches labels in one call. Note that pd.cut ignores its labels argument when bins is an IntervalIndex, so to get session labels you pass the numeric bin edges instead.
# Numeric edges [0, 10, 20, 30, 40]: valid because the ranges are contiguous
edges = ranges["start"].tolist() + [ranges["end"].iloc[-1]]
# Assign each observation to a labeled bin; right=True matches closed='right'
# (start < x <= end), and values outside every bin become NaN
obs["session"] = pd.cut(
    obs["value"], bins=edges, right=True, labels=ranges["session"].tolist()
)
# Drop unmapped values and merge on the label
result = obs.dropna(subset=["session"]).merge(ranges, on="session")
This approach is concise, but it produces a categorical session column rather than integer positions, which can be easier to read in downstream analysis. It also relies on the ranges being contiguous and non-overlapping; with gaps or overlaps, a single list of edges cannot describe them.
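When the ranges have gaps, numeric edges no longer describe them, but you can still pass the IntervalIndex as bins and recover the labels yourself. The sketch below assumes hypothetical ranges with a gap between 20 and 25. Because pandas uses the bins, in order, as the categories, the category codes are positions into ranges, and Series.map turns them into session labels (the -1 code of an unbinned value becomes NaN):

```python
import pandas as pd

# Hypothetical ranges with a gap between 20 and 25, so numeric edges won't work
ranges = pd.DataFrame({"start": [0, 10, 25], "end": [10, 20, 35], "session": ["A", "B", "C"]})
values = pd.Series([5, 22, 30])

bins = pd.IntervalIndex.from_arrays(ranges["start"], ranges["end"], closed="right")

# With IntervalIndex bins, pd.cut ignores `labels`; the categories are the Intervals
binned = pd.cut(values, bins=bins)

# Category codes are positions in `bins`; -1 means "no bin" (the value 22)
codes = binned.cat.codes

# Series.map with a Series looks each code up in its index, so -1 becomes NaN
sessions = codes.map(ranges["session"])
print(pd.DataFrame({"value": values, "session": sessions}))
```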
Performance Considerations
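IntervalIndex.get_indexer does not loop in Python. For a numeric or datetime target it delegates to a Cython IntervalTree engine (in pandas/_libs) that is built once per index and cached, so the cost of each lookup grows roughly logarithmically with the number of intervals; the merge that follows is a standard hash-based join on integer keys. For very large right-hand tables, the key requirement is that the intervals do not overlap: get_indexer raises an InvalidIndexError on overlapping intervals, so check IntervalIndex.is_overlapping before relying on it.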
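Since overlap is what makes get_indexer fail, a cheap pre-flight check can save a surprise on large inputs. A minimal sketch using the is_overlapping and is_non_overlapping_monotonic properties; the helper check_intervals is hypothetical:

```python
import pandas as pd


def check_intervals(starts, ends, closed="right"):
    """Hypothetical helper: build an IntervalIndex and report whether
    get_indexer can be used on it (it requires non-overlapping intervals)."""
    intervals = pd.IntervalIndex.from_arrays(starts, ends, closed=closed)
    return {
        "overlapping": intervals.is_overlapping,
        "non_overlapping_monotonic": intervals.is_non_overlapping_monotonic,
    }


r1 = check_intervals([0, 10, 20, 30], [10, 20, 30, 40])  # safe for get_indexer
r2 = check_intervals([0, 5], [10, 15])                    # overlapping: use another strategy
print(r1, r2)
```

Shared endpoints such as 10 do not count as overlap under closed="right", because the value 10 belongs to only one of the two intervals.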
Summary
- IntervalIndex, defined in pandas/core/indexes/interval.py, provides the foundation for range-based lookups.
- get_indexer translates scalar values into interval positions without explicit Python loops.
- Standard pd.merge, in pandas/core/reshape/merge.py, completes the join once the keys are aligned.
- pd.cut offers a high-level alternative for contiguous, non-overlapping ranges.
Frequently Asked Questions
How do I handle overlapping intervals in the right DataFrame?
IntervalIndex.get_indexer does not pick a first match when intervals overlap: it raises an InvalidIndexError whose message points you to IntervalIndex.get_indexer_non_unique. That method returns an (indexer, missing) pair listing every matching position, flattened across all input values. Alternatively, collect the matching positions for each row and explode them before merging.
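get_indexer_non_unique returns its matches flattened across all input values, which takes extra bookkeeping to align with the left rows. A simpler substitute for modest sizes, and a different technique from the one above, is a cross join followed by a boundary filter. It materializes len(obs) * len(ranges) pairs, so treat it as a sketch rather than a scalable solution; the data is hypothetical:

```python
import pandas as pd

# Hypothetical overlapping sessions: (0, 10] and (5, 15] share (5, 10]
ranges = pd.DataFrame({"start": [0, 5], "end": [10, 15], "session": ["X", "Y"]})
obs = pd.DataFrame({"value": [3, 7, 12], "event": ["e1", "e2", "e3"]})

# Every observation paired with every range (how="cross" needs pandas >= 1.2)
pairs = obs.merge(ranges, how="cross")

# Keep pairs where start < value <= end, matching closed="right"
in_range = (pairs["start"] < pairs["value"]) & (pairs["value"] <= pairs["end"])
result = pairs[in_range].reset_index(drop=True)
print(result[["value", "event", "session"]])
```

The value 7 appears twice in the result, once for each session that contains it.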
Can I use merge_asof instead of IntervalIndex?
pd.merge_asof matches each row to the nearest key in one direction (by default, the last right-hand key less than or equal to the left-hand key) rather than checking containment within a range. It works well for temporal "as-of" joins but cannot by itself test whether a value falls between two arbitrary bounds. Stick with IntervalIndex for true between-style joins.
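For sorted, non-overlapping ranges, merge_asof can still serve as the first step of a two-step workaround: match each value to the nearest preceding start, then drop matches whose end is too small. A sketch under those assumptions; allow_exact_matches=False keeps the start exclusive so the result mirrors closed="right":

```python
import pandas as pd

ranges = pd.DataFrame({
    "start": [0, 10, 20, 30],
    "end": [10, 20, 30, 40],
    "session": ["A", "B", "C", "D"],
})
obs = pd.DataFrame({"value": [2, 15, 25, 35, 45], "event": ["e1", "e2", "e3", "e4", "e5"]})

# Both frames must be sorted on their join keys for merge_asof
candidates = pd.merge_asof(
    obs.sort_values("value"),
    ranges.sort_values("start"),
    left_on="value",
    right_on="start",
    direction="backward",       # last start that is below the value...
    allow_exact_matches=False,  # ...strictly below, so start < value
)

# Second step: enforce value <= end; 45 matched start 30 but exceeds end 40
result = candidates[candidates["value"] <= candidates["end"]]
print(result[["value", "event", "session"]])
```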
What is the difference between closed='left' and closed='right'?
The closed parameter in pd.IntervalIndex.from_arrays determines which interval boundaries are inclusive. closed='right' includes the right bound but excludes the left (start < x <= end), while closed='left' includes the left bound but excludes the right (start <= x < end). The parameter also accepts 'both' (start <= x <= end) and 'neither' (start < x < end). Choose the setting that matches your business logic for boundary conditions.
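The difference only shows up at the boundaries. A short sketch that runs get_indexer on the boundary values 0, 10 and 40 under both settings, using the example ranges:

```python
import pandas as pd

starts, ends = [0, 10, 20, 30], [10, 20, 30, 40]
boundary_values = [0, 10, 40]

results = {}
for closed in ("right", "left"):
    intervals = pd.IntervalIndex.from_arrays(starts, ends, closed=closed)
    # Position of the interval containing each boundary value, -1 if none
    results[closed] = intervals.get_indexer(boundary_values).tolist()

print(results)
```

Under closed='right' the value 0 matches nothing and 40 belongs to the last range; under closed='left' it is the other way around, and 10 moves from the first range to the second.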
Does this approach work with datetime intervals?
Yes. IntervalIndex supports numeric, datetime64 and timedelta64 endpoints, though not strings or other object dtypes. Pass datetime arrays to from_arrays and make sure your left-hand values are datetimes as well. The interval logic in pandas/core/arrays/interval.py handles the comparisons the same way as for numbers.
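A minimal datetime sketch, assuming hypothetical shift boundaries; get_indexer accepts the datetime Series directly:

```python
import pandas as pd

# Hypothetical shifts defined by datetime boundaries
shifts = pd.DataFrame({
    "start": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 06:00"]),
    "end": pd.to_datetime(["2024-01-01 06:00", "2024-01-01 12:00"]),
    "shift": ["night", "morning"],
})
events = pd.Series(pd.to_datetime(["2024-01-01 03:00", "2024-01-01 06:00", "2024-01-01 13:00"]))

intervals = pd.IntervalIndex.from_arrays(shifts["start"], shifts["end"], closed="right")
positions = intervals.get_indexer(events)
# 06:00 falls in the first shift because closed="right" includes the end;
# 13:00 is after every shift, so it gets -1
print(positions.tolist())
```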