Feast Data Sources: How to Connect BigQuery, Redshift, Snowflake, PostgreSQL, and Parquet
Feast supports five production-grade offline data sources—BigQuery, Redshift, Snowflake, PostgreSQL, and file-based Parquet/Delta—through a pluggable DataSource abstract class that standardizes validation, SQL generation, and type mapping.
Feast data sources form the backbone of the feature store's offline storage layer, enabling data scientists to ingest batch data from cloud warehouses and on-premise databases without writing custom extraction logic. Each source implements a common interface defined in sdk/python/feast/data_source.py, ensuring consistent behavior across BigQuery, Redshift, Snowflake, PostgreSQL, and file-based formats like Parquet.
Understanding Feast's Data Source Architecture
The DataSource abstract class in sdk/python/feast/data_source.py defines the contract that every offline source must fulfill. Concrete implementations handle three critical responsibilities:
- Validation: Confirm that tables or queries exist and are accessible before feature materialization begins.
- SQL Fragment Generation: Produce executable SQL via get_table_query_string() for embedding in point-in-time joins.
- Type Mapping: Convert native warehouse types to Feast's internal ValueType enum through source_datatype_to_feast_value_type().
This architecture allows Feast to treat BigQuery tables, Redshift clusters, and local Parquet files interchangeably when defining feature views.
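The three responsibilities can be pictured with a small stand-alone sketch. It does not import Feast: SketchSource, InMemoryTableSource, and the known_tables argument are hypothetical names invented for illustration, and the tiny type map is an assumption. Only the method names echo the interface described above.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional, Set


class SketchSource(ABC):
    """Hypothetical stand-in for an offline source contract (not Feast's class)."""

    @abstractmethod
    def validate(self, known_tables: Set[str]) -> None:
        """Raise if the underlying table cannot be found."""

    @abstractmethod
    def get_table_query_string(self) -> str:
        """Return a SQL fragment usable in a FROM clause."""

    @staticmethod
    @abstractmethod
    def source_datatype_to_feast_value_type() -> Callable[[str], str]:
        """Return a function mapping native type names to internal type names."""


class InMemoryTableSource(SketchSource):
    def __init__(self, table: Optional[str] = None, query: Optional[str] = None):
        if (table is None) == (query is None):
            raise ValueError("Provide exactly one of table or query")
        self.table = table
        self.query = query

    def validate(self, known_tables: Set[str]) -> None:
        # Queries are not checked in this sketch; only table names are.
        if self.table is not None and self.table not in known_tables:
            raise ValueError(f"Table not found: {self.table}")

    def get_table_query_string(self) -> str:
        # A table is referenced directly; a query is wrapped as a subquery.
        return self.table if self.table is not None else f"({self.query})"

    @staticmethod
    def source_datatype_to_feast_value_type() -> Callable[[str], str]:
        # Illustrative two-entry map; real sources cover many more types.
        type_map = {"BIGINT": "INT64", "VARCHAR": "STRING"}
        return lambda native: type_map[native.upper()]


source = InMemoryTableSource(table="driver_features")
source.validate({"driver_features"})
fragment = source.get_table_query_string()
mapper = InMemoryTableSource.source_datatype_to_feast_value_type()
```

Because callers only touch the three methods, swapping one concrete class for another leaves the calling code unchanged, which is the property the paragraph above relies on.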
Supported Feast Data Sources
Feast ships with ready-to-use source implementations covering the major cloud data warehouses and file formats.
BigQuery
The BigQuerySource class in sdk/python/feast/infra/offline_stores/bigquery_source.py connects to Google Cloud BigQuery. It supports both table references (project:dataset.table) and arbitrary SQL queries, making it ideal for environments already running on GCP.
Redshift
Implemented in sdk/python/feast/infra/offline_stores/redshift_source.py, RedshiftSource interfaces with Amazon Redshift via the Data API. It handles schema and database resolution, allowing you to specify tables as schema.table with optional database context.
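To make schema and database resolution concrete, here is a minimal sketch of how the separate parts might be combined into one quoted reference. The helper qualified_table_name is hypothetical and is not taken from Feast; the double-quote style is an assumption chosen for illustration.

```python
from typing import Optional


def qualified_table_name(
    table: str, schema: Optional[str] = None, database: Optional[str] = None
) -> str:
    """Build a double-quoted, dot-separated reference from the parts given."""
    parts = [part for part in (database, schema, table) if part]
    return ".".join(f'"{part}"' for part in parts)


schema_only = qualified_table_name("driver_features", schema="public")
with_database = qualified_table_name(
    "driver_features", schema="public", database="analytics_db"
)
# Passing an already-qualified table together with a schema double-qualifies it.
double_qualified = qualified_table_name("public.driver_features", schema="public")
```

Under this scheme the schema should appear in only one place: either inside the table string or in the separate schema argument, not both.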
Snowflake
The SnowflakeSource class in sdk/python/feast/infra/offline_stores/snowflake_source.py manages connections to Snowflake warehouses. It supports database, schema, and warehouse overrides, and automatically handles Snowflake-specific type conversions through the source_datatype_to_feast_value_type implementation.
PostgreSQL
Available as a contrib module in sdk/python/feast/infra/offline_stores/contrib/postgres_offline_store/postgres_source.py, PostgreSQLSource provides offline store capabilities for PostgreSQL databases. This implementation requires importing from the contrib path rather than the main feast package.
File Sources (Parquet and Delta)
The FileSource class in sdk/python/feast/infra/offline_stores/file_source.py enables loading local or remote files. Currently, only Parquet and Delta formats are supported via ParquetFormat() and DeltaFormat() respectively, making this suitable for development or S3-based data lakes.
Configuring Feast Data Sources: Code Examples
Each data source follows a consistent configuration pattern while exposing warehouse-specific parameters.
BigQuery Configuration
from feast import BigQuerySource

# Table reference
bq_table_source = BigQuerySource(
    table="gcp_project:dataset.feature_table",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)

# SQL query
bq_query_source = BigQuerySource(
    query="""
        SELECT
            event_timestamp,
            user_id,
            feature_1,
            feature_2
        FROM `project.dataset.table`
        WHERE event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
    """,
    timestamp_field="event_timestamp",
)
Redshift Configuration
from feast import RedshiftSource

redshift_source = RedshiftSource(
    # The schema is passed separately, so the table name is left unqualified
    table="driver_features",
    database="analytics_db",
    schema="public",
    timestamp_field="event_timestamp",
)
Snowflake Configuration
from feast import SnowflakeSource

snowflake_source = SnowflakeSource(
    database="FEAST_PROD",
    schema="FEATURES",
    table="user_events",
    warehouse="COMPUTE_WH",
    timestamp_field="event_timestamp",
)
PostgreSQL Configuration
from feast.infra.offline_stores.contrib.postgres_offline_store.postgres_source import PostgreSQLSource

postgres_source = PostgreSQLSource(
    table="public.transactions",
    timestamp_field="event_timestamp",
    field_mapping={"old_column_name": "new_column_name"},
)
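The field_mapping argument maps column names in the source to the names used in feature definitions. Its effect can be pictured as a plain rename, sketched here on rows held as dictionaries. The helper apply_field_mapping and the sample row are hypothetical; Feast applies the mapping internally and its actual code path is not shown here.

```python
from typing import Dict, List


def apply_field_mapping(
    rows: List[Dict[str, object]], field_mapping: Dict[str, str]
) -> List[Dict[str, object]]:
    """Rename keys found in field_mapping; leave all other keys unchanged."""
    return [
        {field_mapping.get(column, column): value for column, value in row.items()}
        for row in rows
    ]


rows = [{"old_column_name": 42, "event_timestamp": "2024-01-01T00:00:00Z"}]
renamed = apply_field_mapping(rows, {"old_column_name": "new_column_name"})
```

Columns absent from the mapping, such as event_timestamp here, pass through untouched.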
Parquet File Configuration
from feast import FileSource
from feast.data_format import ParquetFormat

parquet_source = FileSource(
    file_format=ParquetFormat(),
    path="s3://feast-bucket/feature_data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)
Attaching Sources to Feature Views
All sources integrate identically into feature definitions:
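from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Float32, Int64

# Entity the features are keyed on; join_keys names the key column in the source
driver = Entity(name="driver", join_keys=["driver_id"])

driver_stats_view = FeatureView(
    name="driver_stats",
    entities=[driver],
    ttl=timedelta(days=1),  # ttl expects a timedelta, not a raw number of seconds
    schema=[
        Field(name="daily_trips", dtype=Int64),
        Field(name="avg_speed", dtype=Float32),
    ],
    source=bq_table_source,  # Replace with any source defined above
)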
Summary
- Feast data sources abstract connectivity to BigQuery, Redshift, Snowflake, PostgreSQL, and Parquet/Delta files through a unified DataSource interface defined in sdk/python/feast/data_source.py.
- Each implementation handles warehouse-specific validation, SQL generation via get_table_query_string(), and type mapping via source_datatype_to_feast_value_type().
- BigQuery, Redshift, and Snowflake are first-class citizens in the main package, while PostgreSQL resides in the contrib module at feast.infra.offline_stores.contrib.postgres_offline_store.postgres_source.
- FileSource supports local or S3-hosted Parquet and Delta formats for development and data lake scenarios.
- All sources share common configuration parameters (timestamp_field, field_mapping, etc.) and integrate identically with FeatureView definitions.
Frequently Asked Questions
How do I choose between BigQuery, Snowflake, and Redshift for Feast?
Choose based on your existing cloud infrastructure. If your data lives in Google Cloud, use BigQuerySource from sdk/python/feast/infra/offline_stores/bigquery_source.py. For AWS-centric architectures, RedshiftSource in redshift_source.py integrates with Redshift's Data API. For multi-cloud or existing Snowflake investments, SnowflakeSource in snowflake_source.py provides warehouse flexibility. All three implement the same DataSource interface, so migration only requires changing the source configuration, not your feature definitions.
Can I use Feast with PostgreSQL in production?
Yes, but it requires importing from the contrib module. The PostgreSQLSource class resides in sdk/python/feast/infra/offline_stores/contrib/postgres_offline_store/postgres_source.py rather than the main feast package. This placement indicates that it is community-maintained, unlike the first-class BigQuery, Snowflake, and Redshift implementations. For production use, ensure your PostgreSQL instance can handle the query load generated by Feast's point-in-time joins, or consider materializing features to the online store to reduce query frequency.
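For intuition about where that query load comes from: a point-in-time join pairs each entity row with the most recent feature row whose timestamp is not later than the entity's own timestamp. The pure-Python sketch below shows that rule for a single lookup. All names and sample values are hypothetical, the real join runs as SQL inside the database, and this sketch ignores the ttl window.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# (entity key, event timestamp, feature value)
FeatureRow = Tuple[str, datetime, float]


def point_in_time_lookup(
    feature_rows: List[FeatureRow], entity_key: str, as_of: datetime
) -> Optional[float]:
    """Return the most recent feature value at or before as_of, or None."""
    candidates = [
        (ts, value)
        for key, ts, value in feature_rows
        if key == entity_key and ts <= as_of
    ]
    if not candidates:
        return None
    # Pick the candidate with the latest timestamp.
    return max(candidates, key=lambda pair: pair[0])[1]


features: List[FeatureRow] = [
    ("driver_1", datetime(2024, 1, 1), 10.0),
    ("driver_1", datetime(2024, 1, 3), 12.5),
    ("driver_2", datetime(2024, 1, 2), 7.0),
]

value_jan2 = point_in_time_lookup(features, "driver_1", datetime(2024, 1, 2))
value_jan4 = point_in_time_lookup(features, "driver_1", datetime(2024, 1, 4))
value_missing = point_in_time_lookup(features, "driver_2", datetime(2024, 1, 1))
```

Every entity row needs a lookup of this kind, which is why large training sets translate into heavy scans on the offline store.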
What file formats does Feast support for offline stores?
Feast supports Parquet and Delta formats through the FileSource class. Located in sdk/python/feast/infra/offline_stores/file_source.py, FileSource accepts either ParquetFormat() or DeltaFormat() via the file_format parameter. You can reference local filesystem paths or remote storage like S3 (e.g., path="s3://bucket/features.parquet"). Currently, CSV and JSON are not supported as offline sources; you must convert these to Parquet or Delta before ingestion.
How does Feast handle type mapping between warehouses?
Each data source implements source_datatype_to_feast_value_type() to map native warehouse types to Feast's ValueType enum. For example, in sdk/python/feast/infra/offline_stores/bigquery_source.py, the BigQuery source maps STRING to ValueType.STRING, INT64 to ValueType.INT64, etc. Similarly, the Snowflake implementation in snowflake_source.py handles Snowflake-specific types like VARCHAR and NUMBER. This abstraction ensures that feature views remain portable across different backend warehouses without requiring schema changes in your feature definitions.
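The mapping idea can be sketched without Feast as a lookup table per warehouse that resolves to one shared enum. SketchValueType, the two dictionaries, and make_mapper are hypothetical; the entries are illustrative and do not reproduce Feast's actual tables. Snowflake's NUMBER is left out on purpose, since its target type is not stated above.

```python
from enum import Enum
from typing import Callable, Dict


class SketchValueType(Enum):
    """Hypothetical stand-in for an internal value-type enum."""

    UNKNOWN = 0
    STRING = 1
    INT64 = 2
    DOUBLE = 3


# Illustrative tables only; a real mapping covers many more native types.
BIGQUERY_TYPES: Dict[str, SketchValueType] = {
    "STRING": SketchValueType.STRING,
    "INT64": SketchValueType.INT64,
    "FLOAT64": SketchValueType.DOUBLE,
}
SNOWFLAKE_TYPES: Dict[str, SketchValueType] = {
    "VARCHAR": SketchValueType.STRING,
    "FLOAT": SketchValueType.DOUBLE,
}


def make_mapper(
    table: Dict[str, SketchValueType],
) -> Callable[[str], SketchValueType]:
    """Return a function that resolves a native type name, case-insensitively."""

    def mapper(native_type: str) -> SketchValueType:
        return table.get(native_type.upper(), SketchValueType.UNKNOWN)

    return mapper


bq_mapper = make_mapper(BIGQUERY_TYPES)
sf_mapper = make_mapper(SNOWFLAKE_TYPES)
```

Both mappers resolve their own string type to the same SketchValueType.STRING member, so code downstream of the mapping never sees warehouse-specific names.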