Best Practices for Organizing Feature Repositories in Feast
Organizing feature repositories requires a clear separation between declarative project configuration in feature_store.yaml and imperative Python definitions, modular file organization by domain, and strict naming conventions to ensure discoverability and maintainability.
A well-structured feature repository serves as the single source of truth for feature engineering workflows in Feast. By following the architectural patterns established in the Feast codebase, teams can ensure that feature definitions remain portable, testable, and secure across development and production environments.
Core Repository Structure
The foundation of organizing feature repositories starts with a standardized directory layout. Feast recommends placing all configuration and code under a root directory (commonly feature_repo/) to keep the project self-contained.
| Path / File | Purpose | Implementation Detail |
|---|---|---|
| feature_repo/feature_store.yaml | Declarative project configuration: project name, provider, registry location, and online/offline stores | Read by CLI commands like feast apply to locate the registry and stores; the Feast objects themselves are defined in Python |
| feature_repo/feature_views.py | Python definitions of FeatureViews, StreamFeatureViews, and OnDemandFeatureViews | Centralizes feature engineering logic in one module |
| feature_repo/entities.py | Entity object definitions | Keeps join keys consistent across all feature views |
| feature_repo/data_sources.py | DataSource constructors (BigQuery, Snowflake, Redshift) | Isolates environment-specific connection strings |
| feature_repo/transformations/ | Custom UDFs for on-demand transformations | Separates heavy logic from view definitions |
| feature_repo/tests/ | Unit and integration tests | Validates point-in-time correctness |
| .feast/ | Auto-generated registry snapshots and migration files | Must be version-controlled for reproducible deployments |
Reference: The official Feast structuring guide at docs/how-to-guides/structuring-repos.md provides the canonical reference for this layout.
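A layout like this can be checked mechanically before a commit. The following is a minimal sketch using only the standard library; the helper name `missing_paths` and the list of expected files are illustrative assumptions drawn from the table, not part of Feast.

```python
from pathlib import Path

# Python modules the table above expects under the repository root
# (an illustrative list, adjust it to the layout actually in use).
EXPECTED_FILES = ("feature_views.py", "entities.py", "data_sources.py")


def missing_paths(repo_root, expected=EXPECTED_FILES):
    """Return the expected file names that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical usage: report gaps in a local checkout named feature_repo/.
    print(missing_paths("feature_repo"))
```

An empty list means every expected module is present; a CI step could fail the build on a non-empty result.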
Declarative vs. Imperative Configuration
Organizing a feature repository effectively requires understanding what belongs in static YAML and what belongs in Python.
Declarative (feature_store.yaml) holds static project settings that the Feast CLI reads without executing any Python. The paths below are illustrative:
project: driver_feature_repo
provider: local
registry: data/registry.db
online_store:
  type: sqlite
  path: data/online_store.db
offline_store:
  type: bigquery
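Imperative (*.py files) defines entities, data sources, and feature views, and handles dynamic logic, custom transformations, and complex data source construction:
# feature_views.py
from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Float32, Int64

from data_sources import driver_stats_bq

# In a larger repository this entity would live in entities.py.
driver = Entity(name="driver_id", join_keys=["driver_id"])

driver_stats_fv = FeatureView(
    name="driver_stats_fv",
    entities=[driver],
    ttl=timedelta(days=1),  # 86400 seconds
    schema=[
        # Field dtypes come from feast.types, not ValueType.
        Field(name="avg_fare", dtype=Float32),
        Field(name="trip_count", dtype=Int64),
    ],
    source=driver_stats_bq,
)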
Keep static project settings in YAML, and define Feast objects in Python, where imports, conditionals, and UDFs are available.
Naming Conventions
Consistent naming conventions are critical for organizing feature repositories at scale. The following patterns are a common starting point:
| Item | Pattern | Example |
|---|---|---|
| Entity | <domain>_<entity> | driver_id |
| Feature View | <domain>_<subject>_fv | driver_stats_fv |
| On-Demand Feature View | <domain>_<subject>_odfv | driver_stats_odfv |
| Feature Service | <domain>_service | driver_service |
| Data Source | <domain>_<source>_ds | driver_events_bq_ds |
These patterns ensure that related objects sort together in file explorers and CLI outputs, making the repository self-documenting.
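Conventions like these are easier to keep when a test enforces them. The sketch below is one possible check, assuming object names are collected elsewhere as plain strings; `SUFFIXES` and `follows_convention` are hypothetical names introduced for illustration, not a Feast API.

```python
import re

# Suffix expected for each object kind, following the table above.
# Entities use <domain>_<entity> with no fixed suffix, so they are omitted.
SUFFIXES = {
    "feature_view": "_fv",
    "on_demand_feature_view": "_odfv",
    "feature_service": "_service",
    "data_source": "_ds",
}

# Lowercase snake_case with at least two underscore-separated parts.
_SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")


def follows_convention(kind, name):
    """Return True if name is snake_case and ends with the suffix for kind."""
    return bool(_SNAKE_CASE.match(name)) and name.endswith(SUFFIXES[kind])


print(follows_convention("feature_view", "driver_stats_fv"))    # True
print(follows_convention("feature_view", "driver_stats_odfv"))  # False
```

A suffix check is deliberately loose: it catches a feature view named like an on-demand view, but it does not verify that the domain prefix is a known domain.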
Modularity and Code Organization
As feature repositories grow, organizing them into sub-packages prevents monolithic files. When multiple domains exist (e.g., driver and rides), create domain-specific directories:
feature_repo/
├── driver/
│   ├── entities.py
│   ├── feature_views.py
│   └── data_sources.py
├── rides/
│   ├── entities.py
│   ├── feature_views.py
│   └── data_sources.py
└── feature_repo.py  # Aggregates all domains
The top-level feature_repo.py imports and exposes all feature views to the CLI:
# feature_repo.py
from driver.feature_views import driver_stats_fv
from rides.feature_views import ride_stats_fv
__all__ = ["driver_stats_fv", "ride_stats_fv"]
Place shared utilities in a utils.py at the root, ensuring functions are pure (no side effects) so tests can import them without initializing Feast services.
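As a sketch of what such a utils.py might contain, the two helpers below depend only on the standard library and have no side effects. Both function names are hypothetical examples, not utilities shipped with Feast.

```python
# utils.py
from datetime import timedelta


def ttl_from_hours(hours):
    """Convert a positive number of hours into a timedelta usable as a TTL."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return timedelta(hours=hours)


def qualified_table(project, dataset, table):
    """Build a 'project.dataset.table' reference from its three parts."""
    parts = (project, dataset, table)
    if not all(parts):
        raise ValueError("project, dataset, and table must be non-empty")
    return ".".join(parts)
```

Because neither function touches a registry, a store, or the environment, tests can import and call them directly.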
Testing and Continuous Integration
Organizing feature repositories requires rigorous testing to prevent data leakage and ensure point-in-time correctness. The tests/ directory should mirror the source structure:
# tests/test_feature_views.py
from feast.types import Float32, Int64

from feature_views import driver_stats_fv


def test_feature_view_schema():
    # Inspect the definition directly; no registry or running store is required.
    # Checks that go through FeatureStore.get_feature_view() need a configured
    # repository on which `feast apply` has already been run.
    dtypes = {field.name: field.dtype for field in driver_stats_fv.schema}
    assert dtypes["avg_fare"] == Float32
    assert dtypes["trip_count"] == Int64
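To make "point-in-time correctness" concrete, the following standard-library sketch shows the rule a test should verify: a training row may only see the latest feature value recorded at or before its event time, and only if that value is still within the TTL. This illustrates the concept only and is not Feast's implementation; `point_in_time_value` is a hypothetical helper.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def point_in_time_value(feature_rows, event_time, ttl):
    """Return the latest feature value observed at or before event_time.

    feature_rows is a list of (timestamp, value) pairs sorted by timestamp.
    Rows newer than event_time are ignored (no leakage from the future), and
    a row older than event_time - ttl is treated as expired.
    """
    timestamps = [ts for ts, _ in feature_rows]
    idx = bisect_right(timestamps, event_time) - 1
    if idx < 0:
        return None  # nothing was known yet at event_time
    ts, value = feature_rows[idx]
    if event_time - ts > ttl:
        return None  # the most recent value is too old
    return value


rows = [
    (datetime(2024, 1, 1, 9), 10.0),
    (datetime(2024, 1, 1, 12), 12.5),
    (datetime(2024, 1, 2, 9), 15.0),
]

# The 12:00 value is the latest one known at 13:00; the next-day value is not used.
print(point_in_time_value(rows, datetime(2024, 1, 1, 13), timedelta(days=1)))  # 12.5
```

A leakage bug would show up here as the function returning 15.0 for the 13:00 event, which is exactly the kind of assertion an integration test can make against real retrieval results.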
A production-ready CI pipeline (.github/workflows/ci.yml) should:
- Lint and format code using ruff and black
- Type check with mypy to catch schema mismatches
- Run unit tests against the local file offline store and SQLite online store for speed
- Run integration tests against production offline/online stores in a staging environment
- Validate registry changes by running feast apply, optionally after previewing the diff with feast plan
name: Feast CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with: { python-version: "3.10" }
      - run: pip install -r requirements.txt
      - run: ruff check .
      - run: black --check .
      - run: mypy .
      - run: pytest -vv
      - run: feast apply
Security and Deployment
When organizing feature repositories for production, never commit credentials. Store connection strings in environment variables and reference them in data_sources.py:
# data_sources.py
import os

from feast import BigQuerySource

driver_stats_bq = BigQuerySource(
    # The table reference comes from the environment; the default targets a dev dataset.
    table=os.getenv("DRIVER_STATS_TABLE", "dev.dataset.driver_stats"),
    timestamp_field="event_timestamp",
)
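A default value is convenient in development, but in production it can hide a missing variable and silently point at the wrong table. One option, sketched here with a hypothetical helper named `required_env`, is to fail loudly when a variable is unset:

```python
import os


def required_env(name):
    """Return the value of environment variable name, or raise if unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} must be set")
    return value
```

A production data_sources.py could then call required_env("DRIVER_STATS_TABLE") in place of os.getenv with a fallback, so a misconfigured deployment stops at import time.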
Implement Role-Based Access Control (RBAC) as detailed in the Feast RBAC documentation. Define read-only service accounts for batch training jobs and read-write accounts for streaming ingestion pipelines. The .feast/ directory contains auto-generated registry snapshots that must be version-controlled to ensure reproducible deployments across environments.
Summary
- Maintain a clear directory structure with feature_store.yaml at the root, separate modules for entities, feature views, and data sources, and a dedicated tests/ directory.
- Separate declarative and imperative definitions: keep static project configuration in YAML, and define Feast objects, dynamic logic, and transformations in Python.
- Enforce naming conventions using <domain>_<subject>_fv patterns to ensure discoverability and consistency across the codebase.
- Modularize by domain using sub-packages when the repository grows, with a top-level aggregator file exposing all feature views to the CLI.
- Implement comprehensive CI/CD including linting, type checking, unit tests, integration tests against real stores, and automated registry updates.
- Secure the repository by externalizing credentials to environment variables, implementing RBAC, and version-controlling the .feast/ metadata directory.
Frequently Asked Questions
How should I structure a Feast feature repository for multiple teams?
When multiple teams share a feature repository, organize it into domain-driven sub-packages. Create directories like driver/ and rides/, each containing its own entities.py and feature_views.py. Use a top-level feature_repo.py to aggregate all views, allowing the Feast CLI to discover objects while maintaining clean separation of concerns between teams.
What is the difference between feature_store.yaml and Python feature definitions?
The feature_store.yaml file holds declarative project configuration (project name, provider, registry location, and online/offline store settings) that the Feast CLI reads without executing Python. Entities, data sources, and feature views are defined in Python files such as feature_views.py, which the CLI imports when running feast apply. Python is also where dynamic logic, custom transformations, UDFs, and complex data source construction belong, since these cannot be expressed in static YAML.
How do I test feature views in a Feast repository?
Organize tests in a tests/ directory mirroring your source structure. Write unit tests that import feature view definitions and verify their schemas; checks that go through store.get_feature_view() need a configured repository on which feast apply has been run. Include integration tests that materialize features to both offline and online stores to validate point-in-time correctness and prevent data leakage. Run unit tests in CI against local file and SQLite stores for speed, and integration tests against production store types in staging environments.
Should I commit the .feast directory to version control?
Yes, you should version-control the .feast/ directory. This directory contains auto-generated registry snapshots and migration files that Feast uses to track schema changes and ensure reproducible deployments. Committing these files allows the Feast CLI to apply incremental updates across environments and enables rollback capabilities. However, ensure that credentials and environment-specific connection strings remain externalized via environment variables rather than stored in this directory.