# Feast Batch Processing Engines: Local, Spark, Ray, Snowflake, and AWS Lambda

> Explore Feast batch processing engines: local, Spark, Ray, Snowflake, and AWS Lambda. Materialize features efficiently for your data science projects.

- Repository: [feast-dev/feast](https://github.com/feast-dev/feast)
- Tags: deep-dive
- Published: 2026-03-01

---

**Feast ships five built-in batch compute engines (local, Spark, Ray, Snowflake, and AWS Lambda) for materializing features. You select one with the `batch_engine` field in [`feature_store.yaml`](https://github.com/feast-dev/feast/blob/main/feature_store.yaml), and Feast resolves it through the `BATCH_ENGINE_CLASS_FOR_TYPE` registry in [`sdk/python/feast/repo_config.py`](https://github.com/feast-dev/feast/blob/main/sdk/python/feast/repo_config.py).**

Feast materializes offline feature data to online stores using a pluggable **batch materialization engine** architecture. The open-source `feast-dev/feast` repository provides built-in compute engines that scale from local development to distributed and serverless environments. You configure your preferred engine through the `batch_engine` (or `batch_engine_config`) field in [`feature_store.yaml`](https://github.com/feast-dev/feast/blob/main/feature_store.yaml), which Feast resolves to a concrete implementation via an internal class registry defined in [`sdk/python/feast/repo_config.py`](https://github.com/feast-dev/feast/blob/main/sdk/python/feast/repo_config.py).

## How Feast Selects Batch Compute Engines

Feast uses a registry-based lookup mechanism to map shorthand engine names to concrete compute classes. In [`sdk/python/feast/repo_config.py`](https://github.com/feast-dev/feast/blob/main/sdk/python/feast/repo_config.py) (lines 46‑53), the `BATCH_ENGINE_CLASS_FOR_TYPE` dictionary defines the following mappings:

- `local` → `feast.infra.compute_engines.local.compute.LocalComputeEngine`
- `spark.engine` → `feast.infra.compute_engines.spark.compute.SparkComputeEngine`
- `ray.engine` → `feast.infra.compute_engines.ray.compute.RayComputeEngine`
- `snowflake.engine` → `feast.infra.compute_engines.snowflake.snowflake_engine.SnowflakeComputeEngine`
- `lambda` → `feast.infra.compute_engines.aws_lambda.lambda_engine.LambdaComputeEngine`

When the `FeatureStore` initializes, it calls `get_batch_engine_config_from_type()` to validate the configuration against the corresponding `*EngineConfig` Pydantic model (e.g., `SparkEngineConfig`). If no `batch_engine` is specified, `RepoConfig.__init__` (lines 42‑45) defaults to the `local` engine, so development environments work out of the box.
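The registry lookup and the `local` default can be sketched as follows. This is a simplified illustration rather than Feast's actual code: the `ENGINE_CLASS_FOR_TYPE` dictionary, the `resolve_engine_class` helper, and the standard-library classes used as stand-in targets are assumptions chosen so the example runs without Feast installed.

```python
import importlib

# Hypothetical registry: shorthand names mapped to dotted class paths.
# Standard-library classes stand in for real compute engine classes.
ENGINE_CLASS_FOR_TYPE = {
    "local": "collections.OrderedDict",
    "spark.engine": "collections.Counter",
}

DEFAULT_ENGINE_TYPE = "local"


def resolve_engine_class(engine_type=None):
    """Return the class registered for engine_type, defaulting to 'local'."""
    engine_type = engine_type or DEFAULT_ENGINE_TYPE
    try:
        class_path = ENGINE_CLASS_FOR_TYPE[engine_type]
    except KeyError:
        known = ", ".join(sorted(ENGINE_CLASS_FOR_TYPE))
        raise ValueError(
            f"Unknown batch engine {engine_type!r}; expected one of: {known}"
        ) from None
    # Split "package.module.ClassName" and import the class lazily.
    module_name, class_name = class_path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


print(resolve_engine_class("spark.engine").__name__)
print(resolve_engine_class().__name__)
```

Storing dotted paths instead of class objects means an engine's dependencies are imported only when that engine is actually selected.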

## Local Compute Engine

The **Local Compute Engine** runs materialization in-process on the driver machine using `feast.infra.compute_engines.local.compute.LocalComputeEngine`. This engine is ideal for development, unit testing, and small-scale workloads that do not require distributed processing.

Because the engine executes within the same Python process as the Feast client, it avoids the overhead of cluster scheduling and network serialization. The engine implements the abstract `ComputeEngine` interface defined in [`feast/infra/compute_engines/base.py`](https://github.com/feast-dev/feast/blob/main/feast/infra/compute_engines/base.py), specifically overriding the `materialize` method to iterate over batch sources locally.
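The following minimal sketch shows the general pattern of an abstract engine interface with an in-process implementation. The names `ComputeEngineSketch` and `InProcessEngine`, the `materialize(rows, write)` signature, and the in-memory list standing in for an online store are all hypothetical; Feast's actual `ComputeEngine` interface may take different arguments.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, List


class ComputeEngineSketch(ABC):
    """Hypothetical stand-in for an abstract compute engine interface."""

    @abstractmethod
    def materialize(
        self, rows: Iterable[Dict], write: Callable[[List[Dict]], None]
    ) -> int:
        """Push rows to an online store via write; return the row count."""


class InProcessEngine(ComputeEngineSketch):
    """Runs everything in the calling process, in fixed-size batches."""

    def __init__(self, batch_size: int = 2) -> None:
        self.batch_size = batch_size

    def materialize(self, rows, write):
        total = 0
        batch: List[Dict] = []
        for row in rows:
            batch.append(row)
            if len(batch) == self.batch_size:
                write(batch)
                total += len(batch)
                batch = []
        if batch:  # flush the final partial batch
            write(batch)
            total += len(batch)
        return total


# An in-memory list plays the role of the online store.
online_store: List[Dict] = []
rows = [{"driver_id": i, "trips": i * 10} for i in range(5)]
written = InProcessEngine(batch_size=2).materialize(rows, online_store.extend)
print(written, len(online_store))
```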

## Spark Compute Engine

The **Spark Compute Engine** (`spark.engine`) uses `feast.infra.compute_engines.spark.compute.SparkComputeEngine` to execute materialization as a distributed Spark job. This engine supports standalone Spark clusters, Amazon EMR, Google Dataproc, and Databricks runtimes.

Spark engine configuration accepts a `spark_conf` dictionary that passes directly to the Spark session builder, allowing you to set the master URL, deploy mode, and application-specific properties. The engine reads the batch source into a Spark DataFrame, applies any on-demand transformations, and writes the results to the configured online store in parallel.
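As a sketch of how a `spark_conf` dictionary can be forwarded to a session builder, consider the helper below. `apply_spark_conf` and the `RecordingBuilder` stub are hypothetical and exist only so the example runs without PySpark; the property values are illustrative. PySpark's `SparkSession.builder` exposes the same chained `config(key, value)` method.

```python
from typing import Dict


def apply_spark_conf(builder, spark_conf: Dict[str, str]):
    """Forward each key/value pair to builder.config and return the builder."""
    for key, value in spark_conf.items():
        builder = builder.config(key, value)
    return builder


class RecordingBuilder:
    """Stub that mimics the chaining style of SparkSession.builder."""

    def __init__(self) -> None:
        self.options: Dict[str, str] = {}

    def config(self, key: str, value: str) -> "RecordingBuilder":
        self.options[key] = value
        return self


# Keys are standard Spark property names; values are examples only.
spark_conf = {
    "spark.master": "local[2]",
    "spark.app.name": "feast-materialization",
}
builder = apply_spark_conf(RecordingBuilder(), spark_conf)
print(builder.options)
```

With PySpark installed, `apply_spark_conf(SparkSession.builder, spark_conf).getOrCreate()` would start a session with those properties.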

## Ray Compute Engine

The **Ray Compute Engine** (`ray.engine`) utilizes `feast.infra.compute_engines.ray.compute.RayComputeEngine` to parallelize materialization across a Ray cluster. This provides fine-grained task scheduling without the overhead of a full Spark stack, making it suitable for Python-native feature transformations.

Configuration requires a `ray_address` parameter (e.g., `ray://my-ray-head:10001`) and optional runtime environment specifications. The Ray engine distributes the materialization workload across available cluster nodes, using Ray’s actor and task primitives to scale horizontally while maintaining low-latency Python execution.
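The fan-out pattern described here can be illustrated without a Ray cluster. The sketch below deliberately swaps in the standard library's `ThreadPoolExecutor` for Ray tasks, so it shows only the chunk-and-distribute idea, not Ray's API. The helpers `split_into_chunks` and `write_chunk` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def split_into_chunks(rows: List[Dict], num_chunks: int) -> List[List[Dict]]:
    """Split rows into at most num_chunks contiguous, near-equal chunks."""
    size = max(1, -(-len(rows) // num_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def write_chunk(chunk: List[Dict]) -> int:
    """Placeholder for an online store write; returns rows handled."""
    return len(chunk)


rows = [{"driver_id": i} for i in range(10)]
chunks = split_into_chunks(rows, num_chunks=4)

# Each chunk is processed independently, much as a separate Ray task would be.
with ThreadPoolExecutor(max_workers=4) as pool:
    written = sum(pool.map(write_chunk, chunks))

print(len(chunks), written)
```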

## Snowflake Compute Engine

The **Snowflake Compute Engine** (`snowflake.engine`) uses `feast.infra.compute_engines.snowflake.snowflake_engine.SnowflakeComputeEngine` to push computation directly into Snowflake’s native compute layer. Instead of exporting data to an external engine, Feast generates SQL queries that execute within Snowflake warehouses.

This approach minimizes data movement and leverages Snowflake’s elastic compute resources. The engine requires `warehouse` and `role` parameters in its configuration, and it writes materialized features directly from Snowflake tables to the online store without intermediate extraction.

## AWS Lambda Compute Engine

The **AWS Lambda Compute Engine** (`lambda`) uses `feast.infra.compute_engines.aws_lambda.lambda_engine.LambdaComputeEngine` to execute materialization as a serverless function. It is well suited to lightweight, event-driven batch jobs that run infrequently or need automatic scaling without persistent infrastructure.

The engine accepts `function_name`, `region`, and optional payload size or timeout tuning parameters. When materialization triggers, Feast packages the batch job context and invokes the specified Lambda function, which performs the data extraction and online store write within the AWS ecosystem.
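The package-and-invoke step can be sketched as below. The payload fields, the `build_payload` and `invoke_materialization` helpers, and the `StubLambdaClient` are hypothetical and do not reflect the payload Feast actually sends. The stub mirrors the keyword arguments of the boto3 Lambda client's `invoke` call so the example runs without AWS credentials.

```python
import json
from datetime import datetime, timezone


def build_payload(feature_view: str, start: datetime, end: datetime) -> bytes:
    """Serialize a hypothetical job context to JSON bytes."""
    context = {
        "feature_view": feature_view,
        "start": start.isoformat(),
        "end": end.isoformat(),
    }
    return json.dumps(context).encode("utf-8")


def invoke_materialization(client, function_name: str, payload: bytes):
    """Call the function synchronously through a boto3-style Lambda client."""
    return client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=payload,
    )


class StubLambdaClient:
    """Stand-in for boto3.client('lambda'); echoes the decoded payload."""

    def invoke(self, FunctionName, InvocationType, Payload):
        return {"StatusCode": 200, "Echo": json.loads(Payload)}


payload = build_payload(
    "driver_stats",
    datetime(2026, 1, 1, tzinfo=timezone.utc),
    datetime(2026, 1, 2, tzinfo=timezone.utc),
)
response = invoke_materialization(StubLambdaClient(), "feast-materialize", payload)
print(response["StatusCode"], response["Echo"]["feature_view"])
```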

## Configuration Examples

Below are minimal [`feature_store.yaml`](https://github.com/feast-dev/feast/blob/main/feature_store.yaml) configurations for each engine. Unless an example notes otherwise, they assume a `FileSource` batch source read through the `dask` offline store; adjust the `offline_store` section to match your environment (for example BigQuery or Snowflake).

### Local Engine Configuration

```yaml
project: my_project
provider: local
registry: data/registry.db
online_store:
  type: sqlite
offline_store:
  type: dask
batch_engine: local
```

### Spark Engine Configuration

```yaml
project: my_project
provider: aws
registry: s3://my-bucket/registry.db
online_store:
  type: redis
  connection_string: "localhost:6379"
offline_store:
  type: dask
batch_engine:
  type: spark.engine
  spark_conf:
    # Keys are standard Spark property names.
    spark.master: "spark://my-spark-cluster:7077"
    spark.submit.deployMode: "client"
```

### Ray Engine Configuration

```yaml
project: my_project
provider: gcp
registry: gs://my-bucket/registry.db
online_store:
  type: bigtable
offline_store:
  type: dask
batch_engine:
  type: ray.engine
  ray_address: "ray://my-ray-head:10001"
```

### Snowflake Engine Configuration

```yaml
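project: my_project
provider: local
registry: s3://my-bucket/registry.db
online_store:
  type: snowflake.online
  account: "<account>"
  user: "<user>"
  password: "<pwd>"
# The Snowflake engine runs against Snowflake-hosted batch sources,
# so this example pairs it with the Snowflake offline store.
offline_store:
  type: snowflake.offline
  account: "<account>"
  user: "<user>"
  password: "<pwd>"
  warehouse: "FEAST_WH"
  database: "<database>"
batch_engine:
  type: snowflake.engine
  warehouse: "FEAST_WH"
  role: "FEAST_ROLE"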
```

### AWS Lambda Engine Configuration

```yaml
project: my_project
provider: aws
registry: s3://my-bucket/registry.db
online_store:
  type: dynamodb
offline_store:
  type: dask
batch_engine:
  type: lambda
  function_name: "feast-materialize"
  region: "us-east-1"
```
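The configurations above use two shapes for `batch_engine`: a bare string (`local`) and a mapping with a `type` key plus engine-specific options. The sketch below shows one way to normalize both shapes after the YAML has been parsed into Python objects. The `normalize_batch_engine` helper is hypothetical and is not part of Feast.

```python
from typing import Any, Dict, Optional, Tuple


def normalize_batch_engine(value: Optional[Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (engine_type, options) for a parsed batch_engine value."""
    if value is None:
        # Field omitted: fall back to the local engine.
        return "local", {}
    if isinstance(value, str):
        return value, {}
    if isinstance(value, dict):
        if "type" not in value:
            raise ValueError("batch_engine mapping needs a 'type' key")
        options = {k: v for k, v in value.items() if k != "type"}
        return value["type"], options
    raise TypeError(f"Unsupported batch_engine value: {value!r}")


print(normalize_batch_engine("local"))
print(normalize_batch_engine({"type": "lambda", "function_name": "feast-materialize"}))
print(normalize_batch_engine(None))
```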

## Summary

- Feast provides five built-in **batch compute engines**: `local`, `spark.engine`, `ray.engine`, `snowflake.engine`, and `lambda`, mapped through `BATCH_ENGINE_CLASS_FOR_TYPE` in [`sdk/python/feast/repo_config.py`](https://github.com/feast-dev/feast/blob/main/sdk/python/feast/repo_config.py).
- The **Local** engine runs in-process for development, while **Spark** and **Ray** provide distributed processing for large-scale materialization.
- The **Snowflake** engine executes queries natively inside Snowflake warehouses, minimizing data transfer overhead.
- The **Lambda** engine enables serverless materialization for lightweight, event-driven workloads.
- All engines implement the `ComputeEngine` interface from [`feast/infra/compute_engines/base.py`](https://github.com/feast-dev/feast/blob/main/feast/infra/compute_engines/base.py), ensuring a consistent `materialize` method signature across implementations.

## Frequently Asked Questions

### What is the default batch compute engine in Feast?

If you omit the `batch_engine` field in [`feature_store.yaml`](https://github.com/feast-dev/feast/blob/main/feature_store.yaml), Feast defaults to the `local` engine. This behavior is defined in `RepoConfig.__init__` within [`sdk/python/feast/repo_config.py`](https://github.com/feast-dev/feast/blob/main/sdk/python/feast/repo_config.py) (lines 42‑45), which assigns the local compute engine class when no engine type is specified.

### How do I add a custom batch compute engine to Feast?

You can extend the pluggable registry by adding a new entry to `BATCH_ENGINE_CLASS_FOR_TYPE` in [`sdk/python/feast/repo_config.py`](https://github.com/feast-dev/feast/blob/main/sdk/python/feast/repo_config.py), mapping a custom type string to a fully qualified class path. Your custom class must inherit from the abstract `ComputeEngine` base class in [`feast/infra/compute_engines/base.py`](https://github.com/feast-dev/feast/blob/main/feast/infra/compute_engines/base.py) and implement its `materialize` method.

### What are the performance differences between Ray and Spark engines in Feast?

The **Ray** engine excels at fine-grained, Python-native task parallelism with lower overhead for small-to-medium datasets, while the **Spark** engine optimizes for large-scale JVM-based data processing with robust fault tolerance. Choose Ray for Python-centric feature logic requiring dynamic scaling, and Spark for massive ETL workloads that benefit from Spark SQL and mature cluster management.

### Can the Snowflake engine materialize features from non-Snowflake sources?

No, the `snowflake.engine` is designed specifically to execute materialization queries within Snowflake’s compute layer against Snowflake-hosted batch sources. If your offline store is BigQuery or Redshift, you should use the `local`, `spark.engine`, or `ray.engine` compute engines instead, as these can read from diverse offline stores and write to any supported online store.