# How Feast Handles Point-in-Time Joins to Prevent Data Leakage During Training

> Feast prevents data leakage with point-in-time joins using TTL-aware SQL or Pandas. Learn how Feast guarantees temporal correctness for accurate model training.

- Repository: [feast-dev/feast](https://github.com/feast-dev/feast)
- Tags: deep-dive
- Published: 2026-03-01

---

**Feast prevents data leakage by executing point-in-time joins that attach only feature values known at or before each training row's event timestamp, leveraging TTL-aware SQL templates or Pandas asof-merges to guarantee temporal correctness.**

When Feast (feast-dev/feast) builds a training dataset, it joins each entity row only to feature values that were available at that row's event timestamp. This temporal correctness is enforced across several layers of the offline store, from timestamp inference to store-specific query execution.

## Understanding Point-in-Time Joins in Feast

**Point-in-time (PIT) joins** are the mechanism by which Feast attaches historical feature values to entity rows without using future information. For each row in the training dataset, Feast retrieves feature values where the feature timestamp is less than or equal to the entity's event timestamp, while respecting the Time-To-Live (TTL) window defined on the FeatureView.
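The rule above can be illustrated outside Feast with a plain Pandas asof-merge. This is a minimal sketch, not Feast's own code: the dataframes, column names, and the one-day TTL are made-up sample values.

```python
import pandas as pd

# Hypothetical training rows: one per (driver_id, event_timestamp).
entity_df = pd.DataFrame({
    "driver_id": [1001, 1001, 1002],
    "event_timestamp": pd.to_datetime(
        ["2023-01-01 12:00", "2023-01-03 12:00", "2023-01-01 12:00"]
    ),
})

# Hypothetical feature rows, stamped with the time each value became known.
feature_df = pd.DataFrame({
    "driver_id": [1001, 1001, 1002],
    "feature_timestamp": pd.to_datetime(
        ["2023-01-01 10:00", "2023-01-01 13:00", "2022-12-25 09:00"]
    ),
    "trips_today": [5, 9, 2],
})

ttl = pd.Timedelta(days=1)  # assumed TTL for this illustration

# merge_asof requires both frames to be sorted by their time keys.
joined = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    feature_df.sort_values("feature_timestamp"),
    left_on="event_timestamp",
    right_on="feature_timestamp",
    by="driver_id",
    direction="backward",  # only feature rows at or before the event time
    tolerance=ttl,         # and no older than the TTL window
)
print(joined[["driver_id", "event_timestamp", "trips_today"]])
```

Driver 1001's noon row on 2023-01-01 picks up the 10:00 value rather than the 13:00 one, which lies in the future. The other two rows get no value because their most recent feature row is older than the TTL.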

## The Four-Layer Architecture of PIT Joins

The implementation spans four critical layers in the offline store infrastructure:

### 1. Entity Timestamp Inference

The `infer_event_timestamp_from_entity_df` function in [`sdk/python/feast/infra/offline_stores/offline_utils.py`](https://github.com/feast-dev/feast/blob/main/sdk/python/feast/infra/offline_stores/offline_utils.py) (lines 28-44) automatically detects which column contains the event time. If the default `event_timestamp` column is missing, Feast infers the datetime column or raises a `FeastEntityDFMissingColumnsError`.
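To make the inference step concrete, here is a rough sketch of how such a fallback could work. The function name `infer_timestamp_column` is hypothetical, the rule that exactly one datetime column must exist is an assumption, and the `ValueError` is only a placeholder for whatever exception Feast raises.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

DEFAULT_TIMESTAMP_COL = "event_timestamp"


def infer_timestamp_column(entity_df: pd.DataFrame) -> str:
    """Hypothetical stand-in for Feast's inference helper."""
    # Prefer the conventional column name when it is present.
    if DEFAULT_TIMESTAMP_COL in entity_df.columns:
        return DEFAULT_TIMESTAMP_COL

    # Otherwise look for datetime-typed columns.
    candidates = [
        col for col in entity_df.columns
        if is_datetime64_any_dtype(entity_df[col])
    ]

    # Assumption: inference is only unambiguous with exactly one candidate.
    if len(candidates) == 1:
        return candidates[0]

    # Placeholder error type; Feast uses its own exception here.
    raise ValueError(
        f"Could not infer a timestamp column; add '{DEFAULT_TIMESTAMP_COL}'."
    )


orders = pd.DataFrame({
    "driver_id": [1001],
    "order_time": pd.to_datetime(["2023-01-01 12:00"]),
})
print(infer_timestamp_column(orders))
```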

### 2. Query Context Construction

For each FeatureView, `get_feature_view_query_context` in [`offline_utils.py`](https://github.com/feast-dev/feast/blob/main/sdk/python/feast/infra/offline_stores/offline_utils.py) (lines 101-177) builds a `FeatureViewQueryContext` containing:

- Join keys mapped from entity columns
- TTL converted to seconds
- The source's `timestamp_field`
- Time window boundaries: maximum entity timestamp and minimum timestamp (entity timestamp minus TTL)
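The fields listed above can be pictured as a small record. The sketch below is illustrative only: `QueryContextSketch` and `build_context` are hypothetical names rather than Feast's classes, and treating a missing TTL as `0` seconds with no lower bound is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class QueryContextSketch:
    """Hypothetical, simplified analogue of a feature view query context."""
    name: str
    join_keys: List[str]
    ttl_seconds: int
    timestamp_field: str
    min_event_timestamp: Optional[datetime]
    max_event_timestamp: datetime


def build_context(
    name: str,
    join_keys: List[str],
    ttl: Optional[timedelta],
    timestamp_field: str,
    entity_min_ts: datetime,
    entity_max_ts: datetime,
) -> QueryContextSketch:
    # Assumption: a missing TTL is stored as 0 and means "no lower bound".
    ttl_seconds = int(ttl.total_seconds()) if ttl else 0
    # Lower bound of the scan: earliest entity timestamp minus the TTL.
    min_ts = entity_min_ts - ttl if ttl else None
    return QueryContextSketch(
        name=name,
        join_keys=join_keys,
        ttl_seconds=ttl_seconds,
        timestamp_field=timestamp_field,
        min_event_timestamp=min_ts,
        max_event_timestamp=entity_max_ts,
    )


ctx = build_context(
    name="driver_hourly_stats",
    join_keys=["driver_id"],
    ttl=timedelta(days=1),
    timestamp_field="event_timestamp",
    entity_min_ts=datetime(2023, 1, 1),
    entity_max_ts=datetime(2023, 1, 31),
)
print(ctx)
```

Precomputing the window once per feature view lets the generated query discard source rows that no entity row could ever match.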

### 3. SQL and Pandas Rendering

The `build_point_in_time_query` function (lines 84-124 in [`offline_utils.py`](https://github.com/feast-dev/feast/blob/main/sdk/python/feast/infra/offline_stores/offline_utils.py)) renders the actual join logic. For BigQuery, Redshift, Snowflake, and other SQL-based stores, it generates templated SQL using LATERAL joins or MAX-OVER windows. For local execution, Feast falls back to Pandas asof-merges. Each store's `get_historical_features` implementation calls this function, as in [`bigquery.py`](https://github.com/feast-dev/feast/blob/main/bigquery.py) (lines 235-272).
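As a rough illustration of the window-function approach, the sketch below runs a point-in-time query against an in-memory SQLite database. It is not one of Feast's templates: the table names, columns, sample rows, and the `ROW_NUMBER` formulation are assumptions chosen to show the shape of such a query.

```python
import sqlite3

# Window functions require SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_df (driver_id INTEGER, event_timestamp TEXT);
CREATE TABLE driver_stats (
    driver_id INTEGER, feature_timestamp TEXT, trips_today INTEGER
);
INSERT INTO entity_df VALUES
    (1001, '2023-01-01 12:00:00'),
    (1002, '2023-01-01 12:00:00');
INSERT INTO driver_stats VALUES
    (1001, '2023-01-01 09:00:00', 3),
    (1001, '2023-01-01 10:00:00', 5),
    (1001, '2023-01-01 13:00:00', 9),
    (1002, '2022-12-25 09:00:00', 2);
""")

TTL_SECONDS = 86400  # assumed one-day TTL

PIT_QUERY = """
WITH candidates AS (
    SELECT
        e.driver_id,
        e.event_timestamp,
        f.trips_today,
        -- Rank matching feature rows per entity row, newest first.
        ROW_NUMBER() OVER (
            PARTITION BY e.driver_id, e.event_timestamp
            ORDER BY f.feature_timestamp DESC
        ) AS row_num
    FROM entity_df AS e
    LEFT JOIN driver_stats AS f
        ON f.driver_id = e.driver_id
        -- Upper bound: nothing from the future.
        AND f.feature_timestamp <= e.event_timestamp
        -- Lower bound: nothing older than the TTL.
        AND f.feature_timestamp >= datetime(e.event_timestamp, :ttl_modifier)
)
SELECT driver_id, event_timestamp, trips_today
FROM candidates
WHERE row_num = 1
ORDER BY driver_id
"""

rows = conn.execute(
    PIT_QUERY, {"ttl_modifier": f"-{TTL_SECONDS} seconds"}
).fetchall()
for row in rows:
    print(row)
```

The `LEFT JOIN` keeps entity rows that have no valid feature value, so driver 1002 comes back with `NULL` instead of disappearing from the training set.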

### 4. Provider Facade

The public API entry point `FeatureStore.get_historical_features` in [`feature_store.py`](https://github.com/feast-dev/feast/blob/main/feature_store.py) (lines 1242-1286) delegates to `Provider.get_historical_features` in [`provider.py`](https://github.com/feast-dev/feast/blob/main/provider.py) (lines 48-60), which orchestrates the offline store execution and returns a `RetrievalJob`.

## How Feast Executes Point-in-Time Joins

The complete workflow follows these steps:

1. **Validation**: Feast validates the entity dataframe using `assert_expected_columns_in_entity_df`, ensuring required columns exist.

2. **Timestamp Detection**: If not explicitly named `event_timestamp`, the system infers the timestamp column via `infer_event_timestamp_from_entity_df`.

3. **Context Building**: For every referenced FeatureView, `get_feature_view_query_context` calculates the valid time window using the entity timestamp range and TTL.

4. **Query Generation**: `build_point_in_time_query` creates store-specific SQL or Pandas logic that filters source rows where `feature_timestamp <= entity_timestamp` and `feature_timestamp >= (entity_timestamp - TTL)`.

5. **Execution**: The offline store executes the generated query, returning a dataset where each row contains entity columns joined only with historically valid features.

This process is documented in detail in [`docs/getting-started/concepts/point-in-time-joins.md`](https://github.com/feast-dev/feast/blob/main/docs/getting-started/concepts/point-in-time-joins.md).

## Practical Implementation Examples

### Basic Historical Feature Retrieval

```python
from feast import FeatureStore
import pandas as pd

# The entity dataframe needs the entity keys plus a datetime-typed timestamp
# column. Without parse_dates, read_csv would load the timestamps as strings.
entity_df = pd.read_csv(
    "entity_df.csv",  # Columns: driver_id, event_timestamp, label
    parse_dates=["event_timestamp"],
)

store = FeatureStore(repo_path=".")

# get_historical_features performs the point-in-time join for each feature view
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:trips_today",
        "driver_hourly_stats:earnings_today",
    ],
).to_df()
```

### Handling Custom Timestamp Columns

```python
# Reuses `pd` and `store` from the previous example.
# If your timestamp column is named differently, rename it before passing it to Feast.
entity_df = pd.read_csv(
    "entity_df.csv",  # Contains 'event_time' instead of 'event_timestamp'
    parse_dates=["event_time"],
)

training_df = store.get_historical_features(
    entity_df=entity_df.rename(columns={"event_time": "event_timestamp"}),
    features=["driver_hourly_stats:trips_today"],
).to_df()
```

### SQL-Based Entity DataFrames

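```python
# Reuses `store` from the first example. A SQL string is accepted as the entity
# dataframe only by SQL-backed offline stores (for example BigQuery or Snowflake).
sql = """
SELECT driver_id, order_timestamp AS event_timestamp, label
FROM my_warehouse.orders
WHERE order_timestamp BETWEEN '2023-01-01' AND '2023-02-01'
"""

training_df = store.get_historical_features(
    entity_df=sql,
    features=["driver_hourly_stats:trips_today"],
).to_df()
```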

## Summary

- Feast prevents data leakage through **point-in-time joins** that restrict feature values to those available at each entity's event timestamp.
- The implementation relies on `infer_event_timestamp_from_entity_df` in [`offline_utils.py`](https://github.com/feast-dev/feast/blob/main/sdk/python/feast/infra/offline_stores/offline_utils.py) for timestamp detection and `get_feature_view_query_context` for window calculation.
- SQL generation via `build_point_in_time_query` creates store-specific queries using LATERAL joins or window functions.
- The `FeatureStore.get_historical_features` method in [`feature_store.py`](https://github.com/feast-dev/feast/blob/main/feature_store.py) provides the public interface, delegating to provider-specific offline stores.

## Frequently Asked Questions

### What is data leakage in feature engineering?

Data leakage occurs when information from outside the training dataset is used to create the model, resulting in overly optimistic performance metrics. In temporal data, this happens when future feature values are joined to past entity rows. Feast's point-in-time joins prevent this by strictly enforcing that only data available at or before the event timestamp is retrieved.

### How does Feast determine which timestamp column to use?

By default, Feast expects an `event_timestamp` column in the entity dataframe. If absent, the `infer_event_timestamp_from_entity_df` function in [`sdk/python/feast/infra/offline_stores/offline_utils.py`](https://github.com/feast-dev/feast/blob/main/sdk/python/feast/infra/offline_stores/offline_utils.py) (lines 28-44) attempts to infer the datetime column automatically. If inference fails, Feast raises a `FeastEntityDFMissingColumnsError`.

### What happens if no TTL is set on a FeatureView?

If no Time-To-Live (TTL) is specified, Feast does not apply a lower bound to the time window when scanning for features. This means the join will consider all historical feature values up to the entity timestamp, potentially scanning larger datasets but ensuring no future leakage occurs.
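The effect of an unbounded window is easy to see with a Pandas asof-merge, where the `tolerance` argument plays the role of the TTL. This is an illustrative sketch with made-up data, not Feast's implementation.

```python
from typing import Optional

import pandas as pd

entity_df = pd.DataFrame({
    "driver_id": [1001],
    "event_timestamp": pd.to_datetime(["2023-06-01"]),
})

# The only feature value on record is well over a year old at event time.
feature_df = pd.DataFrame({
    "driver_id": [1001],
    "feature_timestamp": pd.to_datetime(["2022-01-01"]),
    "trips_today": [7],
})


def pit_join(ttl: Optional[pd.Timedelta] = None) -> pd.DataFrame:
    # tolerance=None means no lower bound, mirroring a missing TTL.
    return pd.merge_asof(
        entity_df,
        feature_df,
        left_on="event_timestamp",
        right_on="feature_timestamp",
        by="driver_id",
        direction="backward",
        tolerance=ttl,
    )


no_ttl = pit_join()
with_ttl = pit_join(pd.Timedelta(days=30))

print(no_ttl["trips_today"].iloc[0])    # the stale value is still joined
print(with_ttl["trips_today"].iloc[0])  # missing: outside the 30-day window
```

Without a TTL the join is still leak-free, but it may attach arbitrarily old values, so a TTL also works as a freshness guarantee.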

### Does Feast support point-in-time joins for real-time predictions?

Point-in-time joins are primarily used for batch historical retrieval during training. For real-time predictions, Feast uses the online store to fetch the latest feature values via `get_online_features`, which does not perform point-in-time joins but rather retrieves the current state of features.