Feast Data Sources: How to Connect BigQuery, Redshift, Snowflake, PostgreSQL, and Parquet

Feast supports five offline data sources (BigQuery, Redshift, Snowflake, PostgreSQL, and file-based Parquet/Delta) through a pluggable DataSource abstract class that standardizes validation, SQL generation, and type mapping.

Feast data sources form the backbone of the feature store's offline storage layer, enabling data scientists to ingest batch data from cloud warehouses and on-premise databases without writing custom extraction logic. Each source implements a common interface defined in sdk/python/feast/data_source.py, ensuring consistent behavior across BigQuery, Redshift, Snowflake, PostgreSQL, and file-based formats like Parquet.

Understanding Feast's Data Source Architecture

The DataSource abstract class in sdk/python/feast/data_source.py defines the contract that every offline source must fulfill. Concrete implementations handle three critical responsibilities:

  • Validation: Confirm that tables or queries exist and are accessible before feature materialization begins.
  • SQL Fragment Generation: Produce executable SQL via get_table_query_string() for embedding in point-in-time joins.
  • Type Mapping: Convert native warehouse types to Feast's internal ValueType enum through source_datatype_to_feast_value_type().

This architecture allows Feast to treat BigQuery tables, Redshift tables, and local Parquet files interchangeably when defining feature views.
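
The contract described above can be sketched in plain Python. This is a simplified stand-in rather than Feast's actual source: SketchDataSource, InMemoryTableSource, and the tiny ValueType enum are hypothetical names invented for illustration, and the real validate() receives repository configuration that is omitted here.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Callable, Set


class ValueType(Enum):
    """Tiny stand-in for Feast's ValueType enum (illustration only)."""

    UNKNOWN = 0
    STRING = 1
    INT64 = 2


class SketchDataSource(ABC):
    """Hypothetical, simplified mirror of the three responsibilities."""

    def __init__(self, table: str, timestamp_field: str) -> None:
        self.table = table
        self.timestamp_field = timestamp_field

    @abstractmethod
    def validate(self) -> None:
        """Raise if the underlying table is missing or unreachable."""

    @abstractmethod
    def get_table_query_string(self) -> str:
        """Return a SQL fragment that can follow FROM in a larger query."""

    @staticmethod
    @abstractmethod
    def source_datatype_to_feast_value_type() -> Callable[[str], ValueType]:
        """Return a function mapping native type names to ValueType."""


class InMemoryTableSource(SketchDataSource):
    """Toy source backed by a set of known table names (assumption)."""

    KNOWN_TABLES: Set[str] = {"driver_stats"}

    def validate(self) -> None:
        if self.table not in self.KNOWN_TABLES:
            raise ValueError(f"Table not found: {self.table}")

    def get_table_query_string(self) -> str:
        return f'"{self.table}"'

    @staticmethod
    def source_datatype_to_feast_value_type() -> Callable[[str], ValueType]:
        mapping = {"text": ValueType.STRING, "bigint": ValueType.INT64}
        return lambda native: mapping.get(native.lower(), ValueType.UNKNOWN)


source = InMemoryTableSource(table="driver_stats", timestamp_field="event_timestamp")
source.validate()  # passes silently because the table is known
fragment = source.get_table_query_string()
to_value_type = InMemoryTableSource.source_datatype_to_feast_value_type()
```

Because callers only touch the three abstract methods, swapping one concrete source for another does not change the calling code.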

Supported Feast Data Sources

Feast ships with ready-to-use source implementations covering the major cloud data warehouses and file formats.

BigQuery

The BigQuerySource class in sdk/python/feast/infra/offline_stores/bigquery_source.py connects to Google Cloud BigQuery. It supports both table references (project.dataset.table) and arbitrary SQL queries, making it a natural fit for environments already running on GCP.

Redshift

Implemented in sdk/python/feast/infra/offline_stores/redshift_source.py, RedshiftSource interfaces with Amazon Redshift via the Data API. It handles schema and database resolution, allowing you to name a table and pass its schema and database as separate, optional arguments.

Snowflake

The SnowflakeSource class in sdk/python/feast/infra/offline_stores/snowflake_source.py manages connections to Snowflake warehouses. It supports database, schema, and warehouse overrides, and automatically handles Snowflake-specific type conversions through the source_datatype_to_feast_value_type implementation.

PostgreSQL

Available as a contrib module in sdk/python/feast/infra/offline_stores/contrib/postgres_offline_store/postgres_source.py, PostgreSQLSource provides offline store capabilities for PostgreSQL databases. This implementation requires importing from the contrib path rather than the main feast package.

File Sources (Parquet and Delta)

The FileSource class in sdk/python/feast/infra/offline_stores/file_source.py enables loading local or remote files. Currently, only Parquet and Delta formats are supported via ParquetFormat() and DeltaFormat() respectively, making this suitable for development or S3-based data lakes.

Configuring Feast Data Sources: Code Examples

Each data source follows a consistent configuration pattern while exposing warehouse-specific parameters.

BigQuery Configuration

from feast import BigQuerySource

# Table reference (fully qualified as project.dataset.table)
bq_table_source = BigQuerySource(
    table="gcp_project.dataset.feature_table",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)

# SQL query
bq_query_source = BigQuerySource(
    query="""
        SELECT 
            event_timestamp,
            user_id,
            feature_1,
            feature_2
        FROM `project.dataset.table`
        WHERE event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
    """,
    timestamp_field="event_timestamp",
)
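
To make the two modes concrete, the sketch below shows how a table reference and a query can each become a FROM-clause fragment. The helper names to_from_fragment and build_scan_sql are hypothetical, and the wrapping rules (backticks around a table, parentheses around a subquery) and the outer query are assumptions made for illustration, not a copy of Feast's code.

```python
from typing import Optional


def to_from_fragment(table: Optional[str] = None, query: Optional[str] = None) -> str:
    """Turn exactly one of table/query into a fragment usable after FROM."""
    if bool(table) == bool(query):
        raise ValueError("Provide exactly one of table or query")
    if table:
        return f"`{table}`"  # BigQuery-style quoted table reference
    return f"({query.strip()})"  # subquery wrapped in parentheses


def build_scan_sql(fragment: str, timestamp_field: str) -> str:
    """Embed the fragment in a larger query (hypothetical outer query)."""
    return (
        f"SELECT * FROM {fragment} AS src "
        f"WHERE src.{timestamp_field} <= CURRENT_TIMESTAMP()"
    )


table_sql = build_scan_sql(to_from_fragment(table="project.dataset.table"), "event_timestamp")
query_sql = build_scan_sql(
    to_from_fragment(query="SELECT event_timestamp, user_id FROM `project.dataset.table`"),
    "event_timestamp",
)
```

Either way, the caller receives a string it can place after FROM without knowing whether the source was defined by a table or by a query.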

Redshift Configuration

from feast import RedshiftSource

redshift_source = RedshiftSource(
    table="driver_features",  # unqualified name; the schema is passed separately
    database="analytics_db",
    schema="public",
    timestamp_field="event_timestamp",
)

Snowflake Configuration

from feast import SnowflakeSource

snowflake_source = SnowflakeSource(
    database="FEAST_PROD",
    schema="FEATURES",
    table="user_events",
    warehouse="COMPUTE_WH",
    timestamp_field="event_timestamp",
)

PostgreSQL Configuration

from feast.infra.offline_stores.contrib.postgres_offline_store.postgres_source import PostgreSQLSource

postgres_source = PostgreSQLSource(
    table="public.transactions",
    timestamp_field="event_timestamp",
    field_mapping={"old_column_name": "new_column_name"},
)
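
As an illustration of what field_mapping expresses, the sketch below renames source columns on a single row. apply_field_mapping is a hypothetical helper, not a Feast function, and it assumes the mapping runs from the source column name to the name used in the feature view.

```python
from typing import Any, Dict


def apply_field_mapping(row: Dict[str, Any], field_mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename keys found in field_mapping; leave all other columns untouched."""
    return {field_mapping.get(column, column): value for column, value in row.items()}


source_row = {"old_column_name": 42, "event_timestamp": "2024-01-01T00:00:00Z"}
mapped_row = apply_field_mapping(source_row, {"old_column_name": "new_column_name"})
```

Columns that do not appear in the mapping, such as event_timestamp here, pass through unchanged.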

Parquet File Configuration

from feast import FileSource
from feast.data_format import ParquetFormat

parquet_source = FileSource(
    file_format=ParquetFormat(),
    path="s3://feast-bucket/feature_data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

Attaching Sources to Feature Views

All sources integrate identically into feature definitions:

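from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Float32, Int64

# Recent Feast releases expect Entity objects rather than plain strings
driver = Entity(name="driver", join_keys=["driver_id"])

driver_stats_view = FeatureView(
    name="driver_stats",
    entities=[driver],
    ttl=timedelta(days=1),  # 86400 seconds
    schema=[
        Field(name="daily_trips", dtype=Int64),
        Field(name="avg_speed", dtype=Float32),
    ],
    source=bq_table_source,  # Replace with any source defined above
)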
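
The ttl and timestamp_field settings matter most during historical retrieval. The sketch below illustrates the point-in-time rule in plain Python: for each requested entity timestamp, take the newest feature row that is neither in the future nor older than ttl. point_in_time_lookup and the sample rows are hypothetical; real offline stores express this logic in SQL.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def point_in_time_lookup(
    feature_rows: List[Row],
    join_key: str,
    entity_value: Any,
    entity_time: datetime,
    ttl: timedelta,
    timestamp_field: str = "event_timestamp",
) -> Optional[Row]:
    """Return the newest row at or before entity_time and within ttl, else None."""
    best: Optional[Row] = None
    for row in feature_rows:
        if row[join_key] != entity_value:
            continue
        row_time = row[timestamp_field]
        if row_time > entity_time:
            continue  # a future value would leak information into training data
        if entity_time - row_time > ttl:
            continue  # older than ttl, treated as missing
        if best is None or row_time > best[timestamp_field]:
            best = row
    return best


rows = [
    {"driver_id": 1, "event_timestamp": datetime(2024, 1, 1, 8), "daily_trips": 10},
    {"driver_id": 1, "event_timestamp": datetime(2024, 1, 1, 20), "daily_trips": 14},
    {"driver_id": 1, "event_timestamp": datetime(2024, 1, 3, 9), "daily_trips": 30},
]
match = point_in_time_lookup(rows, "driver_id", 1, datetime(2024, 1, 2, 6), timedelta(days=1))
stale = point_in_time_lookup(rows, "driver_id", 1, datetime(2024, 1, 2, 21), timedelta(days=1))
```

In this sample, the first lookup selects the row from the evening of January 1, while the second finds nothing because every earlier row is more than one day old.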

Summary

  • Feast data sources abstract connectivity to BigQuery, Redshift, Snowflake, PostgreSQL, and Parquet/Delta files through a unified DataSource interface defined in sdk/python/feast/data_source.py.
  • Each implementation handles warehouse-specific validation, SQL generation via get_table_query_string(), and type mapping via source_datatype_to_feast_value_type().
  • BigQuery, Redshift, and Snowflake are first-class citizens in the main package, while PostgreSQL resides in the contrib module at feast.infra.offline_stores.contrib.postgres_offline_store.postgres_source.
  • FileSource supports local or S3-hosted Parquet and Delta formats for development and data lake scenarios.
  • All sources share common configuration parameters (timestamp_field, field_mapping, etc.) and integrate identically with FeatureView definitions.

Frequently Asked Questions

How do I choose between BigQuery, Snowflake, and Redshift for Feast?

Choose based on your existing cloud infrastructure. If your data lives in Google Cloud, use BigQuerySource from sdk/python/feast/infra/offline_stores/bigquery_source.py. For AWS-centric architectures, RedshiftSource in redshift_source.py integrates with Redshift's Data API. For multi-cloud or existing Snowflake investments, SnowflakeSource in snowflake_source.py provides warehouse flexibility. All three implement the same DataSource interface, so migration mainly involves changing the source configuration (along with the offline store settings in feature_store.yaml), not your feature definitions.

Can I use Feast with PostgreSQL in production?

Yes, but you must import it from the contrib module. The PostgreSQLSource class resides in sdk/python/feast/infra/offline_stores/contrib/postgres_offline_store/postgres_source.py rather than the main feast package. Contrib placement signals that the implementation is community-maintained, unlike the first-class BigQuery, Snowflake, and Redshift implementations. For production use, ensure your PostgreSQL instance can handle the query load generated by Feast's point-in-time joins, or consider materializing features to the online store to reduce query frequency.

What file formats does Feast support for offline stores?

Feast supports Parquet and Delta formats through the FileSource class. Located in sdk/python/feast/infra/offline_stores/file_source.py, FileSource accepts either ParquetFormat() or DeltaFormat() via the file_format parameter. You can reference local filesystem paths or remote storage like S3 (e.g., path="s3://bucket/features.parquet"). Currently, CSV and JSON are not supported as offline sources; you must convert these to Parquet or Delta before ingestion.

How does Feast handle type mapping between warehouses?

Each data source implements source_datatype_to_feast_value_type() to map native warehouse types to Feast's ValueType enum. For example, in sdk/python/feast/infra/offline_stores/bigquery_source.py, the BigQuery source maps STRING to ValueType.STRING, INT64 to ValueType.INT64, etc. Similarly, the Snowflake implementation in snowflake_source.py handles Snowflake-specific types like VARCHAR and NUMBER. This abstraction ensures that feature views remain portable across different backend warehouses without requiring schema changes in your feature definitions.
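
A minimal sketch of this kind of mapping follows. The ValueType enum is a small stand-in, the lookup tables are illustrative rather than copied from Feast, and make_type_mapper is a hypothetical helper. Only the STRING and INT64 entries come from the answer above; the VARCHAR entry is an assumption.

```python
from enum import Enum
from typing import Callable, Dict


class ValueType(Enum):
    """Small stand-in for Feast's ValueType enum (illustration only)."""

    UNKNOWN = 0
    STRING = 1
    INT64 = 2


# Illustrative per-warehouse tables; real implementations cover many more types.
BIGQUERY_TYPES: Dict[str, ValueType] = {"STRING": ValueType.STRING, "INT64": ValueType.INT64}
SNOWFLAKE_TYPES: Dict[str, ValueType] = {"VARCHAR": ValueType.STRING}  # assumption


def make_type_mapper(table: Dict[str, ValueType]) -> Callable[[str], ValueType]:
    """Build a function that maps a native type name to a ValueType."""

    def mapper(native_type: str) -> ValueType:
        # Normalize case and drop parameters such as the "(255)" in "VARCHAR(255)".
        base = native_type.split("(")[0].strip().upper()
        return table.get(base, ValueType.UNKNOWN)

    return mapper


bigquery_mapper = make_type_mapper(BIGQUERY_TYPES)
snowflake_mapper = make_type_mapper(SNOWFLAKE_TYPES)
```

Returning a callable rather than a raw dictionary lets each source apply its own normalization before the lookup.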
