Best Practices for Organizing Feature Repositories in Feast

Organizing a Feast feature repository well means separating declarative project configuration in feature_store.yaml from the Python modules that define features, organizing files by domain, and following consistent naming conventions so that features stay discoverable and maintainable.

A well-structured feature repository serves as the single source of truth for feature engineering workflows in Feast. By following the architectural patterns established in the Feast codebase, teams can ensure that feature definitions remain portable, testable, and secure across development and production environments.

Core Repository Structure

The foundation of organizing feature repositories starts with a standardized directory layout. Feast recommends placing all configuration and code under a root directory (commonly feature_repo/) to keep the project self-contained.

| Path / File | Purpose | Implementation Detail |
| --- | --- | --- |
| feature_repo/feature_store.yaml | Declarative project configuration: project name, registry, provider, online and offline stores | Read by the CLI and SDK; marks the repository root for commands like feast apply |
| feature_repo/feature_views.py | Python definitions of FeatureViews, StreamFeatureViews, and OnDemandFeatureViews | Centralizes feature definitions in one module per concern |
| feature_repo/entities.py | Entity object definitions | Guarantees consistent join keys across all features |
| feature_repo/data_sources.py | DataSource constructors (BigQuery, Snowflake, Redshift) | Isolates environment-specific table names and connection settings |
| feature_repo/transformations/ | Custom UDFs for on-demand transformations | Separates heavy logic from view definitions |
| feature_repo/tests/ | Unit and integration tests | Validates schemas and point-in-time correctness |
| Registry (path set by registry in feature_store.yaml) | Serialized registry written by feast apply | Generated state; in shared environments it is usually kept in an object store or SQL database rather than committed |

Reference: The official Feast structuring guide at docs/how-to-guides/structuring-repos.md provides the canonical reference for this layout.

Declarative vs. Imperative Configuration

Organizing a feature repository effectively requires understanding what belongs in static YAML and what belongs in Python.

Declarative (feature_store.yaml) holds project-level configuration that the Feast CLI and SDK read without executing any Python: the project name, registry location, provider, and store settings. Entities and feature views are not declared in this file.

project: driver_feature_repo
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db

Imperative (*.py files) defines the objects themselves (entities, data sources, feature views, and feature services) along with any dynamic logic and custom transformations:


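# feature_views.py

from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Float32, Int64

# Sibling modules are imported by absolute name; a relative import
# (from .data_sources) fails when this file is loaded as a top-level module.
from data_sources import driver_stats_bq

# Usually defined once in entities.py and imported wherever it is needed.
driver = Entity(name="driver_id", join_keys=["driver_id"])

driver_stats_fv = FeatureView(
    name="driver_stats_fv",
    entities=[driver],
    ttl=timedelta(days=1),  # ttl is a timedelta, not a number of seconds
    schema=[
        Field(name="avg_fare", dtype=Float32),
        Field(name="trip_count", dtype=Int64),
    ],
    source=driver_stats_bq,
)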

Keep project and infrastructure settings in feature_store.yaml, and define entities, feature views, and transformations in Python, where imports, conditionals, and UDFs are available.

Naming Conventions

Consistent naming conventions are critical for organizing feature repositories at scale. Feast does not enforce a naming scheme, so the following patterns are a convention to adopt rather than a requirement:

| Item | Pattern | Example |
| --- | --- | --- |
| Entity | <domain>_<entity> | driver_id |
| Feature View | <domain>_<subject>_fv | driver_stats_fv |
| On-Demand Feature View | <domain>_<subject>_odfv | driver_stats_odfv |
| Feature Service | <domain>_service | driver_service |
| Data Source | <domain>_<source>_ds | driver_events_bq_ds |

These patterns ensure that related objects sort together in file explorers and CLI outputs, making the repository self-documenting.
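A convention like this is easier to keep when it is checked automatically. The following is a minimal sketch of such a check in plain Python, assuming the suffix rules from the table above; the NAME_PATTERNS mapping and the check_names helper are hypothetical names for illustration and are not part of Feast.

```python
import re

# Hypothetical suffix rules derived from the naming table; adjust to your own conventions.
NAME_PATTERNS = {
    "feature_view": re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+_fv$"),
    "on_demand_feature_view": re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+_odfv$"),
    "feature_service": re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*_service$"),
    "data_source": re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+_ds$"),
}


def check_names(objects):
    """Return the (kind, name) pairs that do not match the pattern for their kind.

    `objects` is an iterable of (kind, name) tuples; kinds without a
    registered pattern are ignored.
    """
    return [
        (kind, name)
        for kind, name in objects
        if kind in NAME_PATTERNS and not NAME_PATTERNS[kind].fullmatch(name)
    ]


if __name__ == "__main__":
    violations = check_names(
        [
            ("feature_view", "driver_stats_fv"),
            ("feature_view", "DriverStats"),
            ("data_source", "driver_events_bq_ds"),
        ]
    )
    for kind, name in violations:
        print(f"{kind} name {name!r} does not follow the convention")
```

A check like this could run as a unit test in CI, fed with the names of the objects defined in the repository.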

Modularity and Code Organization

As feature repositories grow, organizing them into sub-packages prevents monolithic files. When multiple domains exist (e.g., driver and rides), create domain-specific directories:


feature_repo/
├── driver/
│   ├── entities.py
│   ├── feature_views.py
│   └── data_sources.py
├── rides/
│   ├── entities.py
│   ├── feature_views.py
│   └── data_sources.py
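└── feature_repo.py  # Optional: aggregates all domains

feast apply scans the Python files under the repository root recursively, so definitions in domain sub-packages are discovered without extra wiring. An optional top-level feature_repo.py can still import the feature views in one place, which gives tests and feature services a single import point: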


# feature_repo.py

from driver.feature_views import driver_stats_fv
from rides.feature_views import ride_stats_fv

__all__ = ["driver_stats_fv", "ride_stats_fv"]

Place shared utilities in a utils.py at the root, ensuring functions are pure (no side effects) so tests can import them without initializing Feast services.
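As an illustration of what a pure helper in utils.py might look like, here is a small sketch. The qualified_table function is a hypothetical example, not a Feast API; it depends only on its arguments, so a test can import and call it without any Feast configuration.

```python
# utils.py (hypothetical helper for illustration)


def qualified_table(project: str, dataset: str, table: str) -> str:
    """Build a fully qualified table name such as 'myproject.dataset.driver_stats'.

    Pure function: no I/O, no global state, same output for the same input.
    """
    parts = [project, dataset, table]
    for part in parts:
        # Reject empty components and components that already contain a dot,
        # since either would produce an ambiguous table reference.
        if not part or "." in part:
            raise ValueError(f"invalid table name component: {part!r}")
    return ".".join(parts)


if __name__ == "__main__":
    print(qualified_table("myproject", "dataset", "driver_stats"))
```

Because the function has no side effects, data source modules can call it at import time without making tests slower or dependent on credentials.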

Testing and Continuous Integration

Organizing feature repositories requires rigorous testing to prevent data leakage and ensure point-in-time correctness. The tests/ directory should mirror the source structure:


# tests/test_feature_views.py

from datetime import timedelta

from feast.types import Float32, Int64

from feature_views import driver_stats_fv


def test_feature_view_schema():
    # Inspect the definition directly; no registry or store is needed.
    dtypes = {field.name: field.dtype for field in driver_stats_fv.schema}
    assert dtypes == {"avg_fare": Float32, "trip_count": Int64}


def test_feature_view_ttl():
    assert driver_stats_fv.ttl == timedelta(days=1)
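Point-in-time correctness means that a training row only sees feature values that were already known at that row's event time, and only if they are not older than the feature view's ttl. The sketch below illustrates that rule in plain Python, independent of Feast; the row layout and the point_in_time_lookup function are hypothetical, and the exact boundary handling in a real offline store may differ.

```python
from datetime import datetime, timedelta


def point_in_time_lookup(feature_rows, entity_key, event_time, ttl):
    """Return the latest feature row for `entity_key` observed at or before
    `event_time` and no older than `ttl`, or None if there is no such row.

    Rows with a later timestamp are ignored, which is what prevents leakage
    of future information into training data.
    """
    candidates = [
        row
        for row in feature_rows
        if row["driver_id"] == entity_key
        and event_time - ttl <= row["event_timestamp"] <= event_time
    ]
    return max(candidates, key=lambda row: row["event_timestamp"], default=None)


if __name__ == "__main__":
    rows = [
        {"driver_id": 1001, "event_timestamp": datetime(2024, 1, 1, 10), "avg_fare": 12.5},
        {"driver_id": 1001, "event_timestamp": datetime(2024, 1, 2, 10), "avg_fare": 14.0},
        {"driver_id": 1001, "event_timestamp": datetime(2024, 1, 3, 10), "avg_fare": 20.0},
    ]
    # The label was observed on Jan 2 at 12:00, so the Jan 3 value must not be used.
    row = point_in_time_lookup(rows, 1001, datetime(2024, 1, 2, 12), timedelta(days=1))
    print(row["avg_fare"])
```

An integration test can apply the same idea against a real store: request historical features for a known entity timestamp and assert that no value from a later feature row appears in the result.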

A production-ready CI pipeline (.github/workflows/ci.yml) should:

  1. Lint and format code using ruff and black
  2. Type check with mypy to catch schema mismatches
  3. Run unit tests against the local file offline store and SQLite online store for speed
  4. Run integration tests against production offline/online stores in a staging environment
  5. Preview registry changes with feast plan before feast apply updates the registry

# .github/workflows/ci.yml

name: Feast CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with: { python-version: "3.10" }
      - run: pip install -r requirements.txt
      - run: ruff check .
      - run: black --check .
      - run: mypy .
      - run: pytest -vv
      # Show the registry diff without modifying the registry
      - run: feast plan
        working-directory: feature_repo

Security and Deployment

When organizing feature repositories for production, never commit credentials. Store connection strings in environment variables and reference them in data_sources.py:


# data_sources.py

import os

from feast import BigQuerySource

driver_stats_bq = BigQuerySource(
    name="driver_stats_bq",
    # Fully qualified table name, overridable per environment
    table=os.getenv("DRIVER_STATS_TABLE", "dev.dataset.driver_stats"),
    timestamp_field="event_timestamp",
)
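A fallback default such as the one above is convenient in development, but in production a missing variable would silently point the source at the development table. One hedged alternative is to fail fast when a required variable is absent. The require_env helper below is a hypothetical sketch in plain Python, not a Feast API.

```python
import os


def require_env(name: str, environ=os.environ) -> str:
    """Return the value of environment variable `name`.

    Raises RuntimeError if the variable is missing or blank, so a
    misconfigured deployment fails at import time instead of reading
    from an unintended table.
    """
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Required environment variable {name} is not set")
    return value


if __name__ == "__main__":
    # Passing a mapping explicitly keeps the example independent of the real environment.
    table = require_env("DRIVER_STATS_TABLE", {"DRIVER_STATS_TABLE": "prod.dataset.driver_stats"})
    print(table)
```

In data_sources.py the call would replace os.getenv for variables that must be set in every deployed environment, while optional settings can keep a default.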

Implement Role-Based Access Control (RBAC) as detailed in the Feast RBAC documentation. Define read-only service accounts for batch training jobs and read-write accounts for streaming ingestion pipelines. Version-control the feature definitions and feature_store.yaml; the registry that feast apply generates from them is deployment state and is usually kept in shared storage (an object store or SQL database) rather than committed.

Summary

  • Maintain a clear directory structure with feature_store.yaml at the root, separate modules for entities, feature views, and data sources, and a dedicated tests/ directory.
  • Separate declarative and imperative concerns: use feature_store.yaml for project and infrastructure configuration, and Python for entities, feature views, and transformations.
  • Enforce naming conventions using <domain>_<subject>_fv patterns to ensure discoverability and consistency across the codebase.
  • Modularize by domain using sub-packages when the repository grows; feast apply discovers definitions recursively, and an optional top-level aggregator file provides a single import point.
  • Implement comprehensive CI/CD including linting, type checking, unit tests, integration tests against real stores, and a feast plan check before registry changes are applied.
  • Secure the repository by externalizing credentials to environment variables, implementing RBAC, and keeping the generated registry in shared storage rather than in version control.

Frequently Asked Questions

How should I structure a Feast feature repository for multiple teams?

When multiple teams share a feature repository, use domain-driven sub-packages. Create directories like driver/ and rides/, each containing its own entities.py, feature_views.py, and data_sources.py. feast apply discovers definitions in sub-packages on its own; an optional top-level feature_repo.py can aggregate the views for convenient imports while keeping a clean separation of concerns between teams.

What is the difference between feature_store.yaml and Python feature definitions?

The feature_store.yaml file provides declarative project configuration that the Feast CLI and SDK read without executing Python: the project name, registry location, provider, and online and offline store settings. Entities, data sources, feature views, and feature services are defined in Python modules such as feature_views.py, which is also where dynamic logic, custom transformations, and UDFs live. Keep infrastructure settings in YAML and everything that describes features in Python.

How do I test feature views in a Feast repository?

Organize tests in a tests/ directory mirroring your source structure. Write unit tests that import the feature view definitions and verify their schema, ttl, and entities directly, which needs no running store. Include integration tests that materialize features to both offline and online stores to validate point-in-time correctness and prevent data leakage. Run the fast tests in CI against the local file offline store and SQLite online store, and run integration tests against production store types in staging environments.

Should I commit the registry to version control?

Generally no. Commit the Python feature definitions and feature_store.yaml, which together are the source of truth. The registry is generated from them by feast apply and is stored wherever the registry setting in feature_store.yaml points (data/registry.db in the default local setup). For shared environments it is usually kept in an object store or SQL database so that every client reads the same state. Feast does not create a .feast/ directory or migration files. Credentials and environment-specific connection strings should remain externalized via environment variables in either case.
