# How to Set Up High Availability for the Feast Registry

> Learn how to set up high availability for the Feast registry. Configure PostgreSQL or MySQL, deploy replicas with the Feast Operator, and use a Kubernetes ClusterIP Service for load balancing.

- Repository: [feast-dev/feast](https://github.com/feast-dev/feast)
- Tags: how-to-guide
- Published: 2026-03-01

---

**To achieve high availability for the Feast registry, configure a SQL-backed registry using PostgreSQL or MySQL, deploy multiple replicas via the Feast Operator's FeatureStore custom resource, and expose them through a Kubernetes ClusterIP Service that load-balances requests across all pods.**

The `feast-dev/feast` repository provides a feature store platform where the registry maintains critical metadata about entities, data sources, and feature views. Setting up high availability for the Feast registry ensures your feature store remains operational during node failures, rolling updates, or traffic spikes, eliminating the single-point-of-failure risk inherent in the default file-based implementation.

## Why the Default Registry Cannot Scale

By default, Feast stores the registry in a single file (`registry.db`) that operates as a **single-writer** system. Every change rewrites the whole file, so, like other file-based persistence (SQLite or DuckDB), it needs exclusive access for writes and cannot support concurrent modifications from multiple instances. Running several replicas against the same file invites race conditions and lost or corrupted metadata, which makes it unsuitable for production high-availability deployments.
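The lost-update problem described above can be illustrated with a minimal sketch. It does not use Feast at all: a plain dictionary stands in for the registry file, and `load` and `save` are hypothetical helpers that mimic reading and rewriting the whole file.

```python
import copy

# Stand-in for the serialized contents of a file-based registry.
registry_file = {"feature_views": ["driver_stats"]}

def load(store):
    """Read the whole registry, as a file-based backend must."""
    return copy.deepcopy(store)

def save(store, snapshot):
    """Overwrite the whole registry with the caller's snapshot."""
    store.clear()
    store.update(snapshot)

# Two replicas load the same starting state.
replica_a = load(registry_file)
replica_b = load(registry_file)

# Each registers a different feature view in its local copy.
replica_a["feature_views"].append("customer_profile")
replica_b["feature_views"].append("order_history")

# Both write back; the second write silently discards the first.
save(registry_file, replica_a)
save(registry_file, replica_b)

print(registry_file["feature_views"])  # ['driver_stats', 'order_history']
```

The `customer_profile` entry is gone even though neither replica reported an error, which is the failure mode a multi-replica deployment has to avoid.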

## High Availability Architecture Components

Achieving true high availability requires three coordinated components: a transactional database backend, a multi-replica Kubernetes deployment, and a stable service endpoint.

### SQL Registry Backend

The foundation of an HA deployment is the **SQL Registry**, which persists metadata in a relational database rather than a local file. According to [`docs/reference/registries/sql.md`](https://github.com/feast-dev/feast/blob/main/docs/reference/registries/sql.md), this backend supports atomic writes from any replica and eliminates the "rewrite-whole-file" bottleneck. Any SQLAlchemy-supported database, such as PostgreSQL or MySQL, provides the transactional consistency needed for concurrent access.

The implementation in [`go/internal/feast/registry/mysql_registry_store.go`](https://github.com/feast-dev/feast/blob/main/go/internal/feast/registry/mysql_registry_store.go) (and similar database-specific stores) handles concurrent persistence, ensuring that multiple registry pods can safely read and write metadata simultaneously without conflicts.
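To make the contrast with whole-file rewrites concrete, the sketch below writes each object as its own row inside a transaction. It is only an illustration: in-memory SQLite stands in for PostgreSQL or MySQL so the example runs anywhere, and the `feature_views` table and `register` helper are hypothetical, not Feast's actual schema or API.

```python
import sqlite3

# In-memory SQLite is used here only as a runnable stand-in for a real database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feature_views (name TEXT PRIMARY KEY, definition TEXT NOT NULL)"
)
conn.execute("INSERT INTO feature_views VALUES ('driver_stats', 'v1')")
conn.commit()

def register(name, definition):
    """Insert or replace a single row; rows owned by other writers are untouched."""
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(
            "INSERT OR REPLACE INTO feature_views (name, definition) VALUES (?, ?)",
            (name, definition),
        )

# Two independent registrations, as two replicas might issue them.
register("customer_profile", "v1")
register("order_history", "v1")

names = [row[0] for row in conn.execute("SELECT name FROM feature_views ORDER BY name")]
print(names)  # ['customer_profile', 'driver_stats', 'order_history']
```

Because each write touches only its own row, both registrations survive, unlike the whole-file case where the last writer wins.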

### Multi-Replica Deployment with the Feast Operator

The **Feast Operator** manages registry scaling through the `FeatureStore` custom resource (CR). When you specify `spec.replicas` greater than 1, the operator automatically:

1. Generates a Kubernetes Deployment with the requested replica count
2. Sets the strategy to `RollingUpdate` to prevent downtime during version changes
3. Creates a `HorizontalPodAutoscaler` (HPA) if `spec.services.scaling.autoscaling` is configured

As documented in [`infra/website/docs/blog/scaling-feast-feature-server.md`](https://github.com/feast-dev/feast/blob/main/infra/website/docs/blog/scaling-feast-feature-server.md), the operator enforces persistence validation at admission time. If you set `replicas` above 1 while using a file-based backend, the request is rejected immediately with a clear error message.

### Kubernetes Service Load Balancing

The operator creates a **ClusterIP Service** named `feast-registry` that provides a stable DNS endpoint (`feast-registry.<namespace>.svc`). This service load-balances gRPC and REST traffic across all registry pods. Note that a ClusterIP Service balances per connection rather than per request, so a long-lived gRPC connection stays on one pod until the client reconnects. Clients using the Python SDK ([`sdk/python/feast/infra/registry/registry.py`](https://github.com/feast-dev/feast/blob/main/sdk/python/feast/infra/registry/registry.py)) connect to this service name transparently, requiring no code changes when scaling replica counts up or down.

## Step-by-Step Configuration

Follow these steps to transition from a single-file registry to a highly available deployment.

### Step 1: Configure the SQL Registry Backend

Update your [`feature_store.yaml`](https://github.com/feast-dev/feast/blob/main/feature_store.yaml) to use a database-backed registry instead of a local file:

```yaml
project: my_project
provider: aws
online_store: redis
offline_store: file

registry:
  registry_type: sql
  path: postgresql://postgres:mysecret@db-host:5432/feast
  cache_ttl_seconds: 60
  sqlalchemy_config_kwargs:
    echo: false
    pool_pre_ping: true
```

The `path` parameter accepts any SQLAlchemy-compatible connection string. The `pool_pre_ping: true` setting ensures connections are validated before use, preventing errors during database failover events.
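Since the connection string carries credentials, one option is to assemble it from environment variables rather than hardcoding it. The following is a minimal sketch using only the standard library; the `FEAST_REGISTRY_*` variable names and the `registry_path` helper are assumptions made for illustration, not names defined by Feast.

```python
import os
from urllib.parse import quote_plus

def registry_path(env=os.environ):
    """Assemble a PostgreSQL SQLAlchemy URL from environment variables."""
    # quote_plus escapes characters such as '@' or ':' that would break the URL.
    user = quote_plus(env["FEAST_REGISTRY_USER"])
    password = quote_plus(env["FEAST_REGISTRY_PASSWORD"])
    host = env.get("FEAST_REGISTRY_HOST", "db-host")
    port = env.get("FEAST_REGISTRY_PORT", "5432")
    database = env.get("FEAST_REGISTRY_DB", "feast")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"

# Example with an explicit mapping instead of the real process environment.
example_env = {
    "FEAST_REGISTRY_USER": "postgres",
    "FEAST_REGISTRY_PASSWORD": "my@secret",
}
print(registry_path(example_env))  # postgresql://postgres:my%40secret@db-host:5432/feast
```

Escaping matters here: an unescaped `@` in the password would be parsed as the start of the host portion of the URL.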

### Step 2: Deploy Static Replicas

Create a `FeatureStore` custom resource with a fixed replica count for immediate high availability:

```yaml
apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
  name: prod-feast
spec:
  feastProject: my_project
  replicas: 3
  services:
    onlineStore:
      persistence:
        store:
          type: postgres
          secretRef:
            name: feast-data-stores
    registry:
      local:
        persistence:
          store:
            type: sql
            secretRef:
              name: feast-data-stores
```

The `replicas: 3` field tells the operator to maintain three registry pods simultaneously. The `local` service type indicates the operator manages the registry server deployment directly, as described in [`infra/feast-operator/docs/api/markdown/ref.md`](https://github.com/feast-dev/feast/blob/main/infra/feast-operator/docs/api/markdown/ref.md).

### Step 3: Enable Autoscaling for Dynamic HA

For environments with variable load, configure the **HorizontalPodAutoscaler** instead of static replicas:

```yaml
apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
  name: autoscaled-feast
spec:
  feastProject: my_project
  services:
    scaling:
      autoscaling:
        minReplicas: 2
        maxReplicas: 10
        metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 70
    onlineStore:
      persistence:
        store:
          type: postgres
          secretRef:
            name: feast-data-stores
    registry:
      local:
        persistence:
          store:
            type: sql
            secretRef:
              name: feast-data-stores
```

This configuration maintains a minimum of two replicas for baseline availability while scaling up to ten pods based on CPU utilization. The HPA adds capacity during traffic spikes without over-provisioning resources during quiet periods.
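To get a feel for how these settings interact, the sketch below applies the replica formula documented for the Kubernetes HorizontalPodAutoscaler, `ceil(currentReplicas * currentMetric / targetMetric)`, clamped to the configured bounds. It is a simplification: the real controller also applies a tolerance band and stabilization windows, which are ignored here, and `desired_replicas` is a hypothetical helper name.

```python
import math

def desired_replicas(current_replicas, current_cpu_pct,
                     target_cpu_pct=70, min_replicas=2, max_replicas=10):
    """Approximate the HPA target replica count for a CPU utilization metric."""
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    # Clamp to the minReplicas/maxReplicas bounds from the FeatureStore CR.
    return max(min_replicas, min(max_replicas, raw))

print(desired_replicas(2, 140))  # 4: load is double the target, so replicas double
print(desired_replicas(4, 20))   # 2: scale-down stops at minReplicas
print(desired_replicas(8, 140))  # 10: scale-up is capped at maxReplicas
```

With `minReplicas: 2`, the deployment never drops to a single pod, so the baseline availability holds even when traffic is idle.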

## Validation and Safety Mechanisms

The Feast Operator validates the CR at admission time to prevent misconfigurations. If you set `replicas` above 1 or enable autoscaling while using file-based persistence (SQLite, DuckDB, or a local `registry.db`), the operator rejects the CR with a validation error. The relevant fields are described in [`infra/feast-operator/docs/api/markdown/ref.md`](https://github.com/feast-dev/feast/blob/main/infra/feast-operator/docs/api/markdown/ref.md).
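The rule can be expressed as a small pre-flight check, for example in a CI step that runs before applying the CR. This is an illustrative sketch only: the operator's real validation lives in the operator itself, and the `ServicePersistence` type and `validate_scaling` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ServicePersistence:
    name: str         # e.g. "registry" or "onlineStore"
    file_based: bool  # True for SQLite, DuckDB, or a local registry.db

def validate_scaling(replicas: int, autoscaling_enabled: bool,
                     services: List[ServicePersistence]) -> None:
    """Raise ValueError if scaling is requested with file-based persistence."""
    scaled = replicas > 1 or autoscaling_enabled
    offenders = [s.name for s in services if s.file_based]
    if scaled and offenders:
        raise ValueError(
            "scaling requires DB-backed persistence; file-based services: "
            + ", ".join(offenders)
        )

# A single replica with a file-based registry is allowed.
validate_scaling(1, False, [ServicePersistence("registry", True)])

# Three replicas with the same registry are rejected.
try:
    validate_scaling(3, False, [ServicePersistence("registry", True)])
except ValueError as err:
    print(f"rejected: {err}")
```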

Additionally, the `RollingUpdate` deployment strategy lets Kubernetes replace pods incrementally. During an image or configuration update, old pods are removed only as new pods become ready, within the Deployment's `maxUnavailable` and `maxSurge` limits. With more than one replica and working readiness checks, SDK and UI clients keep a serving endpoint throughout the rollout.
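How many pods stay available during a rollout depends on the Deployment's `maxUnavailable` and `maxSurge` settings. The sketch below assumes the Kubernetes defaults of 25% for both (the source does not say whether the operator overrides them) and applies the documented rounding: `maxUnavailable` rounds down and `maxSurge` rounds up. `rollout_bounds` is a hypothetical helper name.

```python
import math

def rollout_bounds(replicas, max_unavailable_pct=25, max_surge_pct=25):
    """Return (minimum available pods, maximum total pods) during a rolling update."""
    max_unavailable = math.floor(replicas * max_unavailable_pct / 100)
    max_surge = math.ceil(replicas * max_surge_pct / 100)
    return replicas - max_unavailable, replicas + max_surge

print(rollout_bounds(3))   # (3, 4): all three stay up, one extra pod is surged
print(rollout_bounds(10))  # (8, 13)
```

Under these assumed defaults, a three-replica registry keeps all three pods available while new pods are brought up one at a time.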

## Summary

- **Use a SQL-backed registry** (PostgreSQL/MySQL) to enable concurrent writes and eliminate the single-writer limitation of file-based storage.
- **Deploy multiple replicas** via the Feast Operator's `FeatureStore` CR using either static `replicas` counts or HPA autoscaling for dynamic scaling.
- **Rely on the ClusterIP Service** (`feast-registry`) for transparent load balancing across all registry pods without client reconfiguration.
- **Validate persistence compatibility** before deployment: the operator blocks HA configurations with incompatible file-based backends to prevent data corruption.

## Frequently Asked Questions

### Can I use S3 or GCS for a highly available registry?

Cloud object stores such as **S3** and **GCS** support concurrent readers and can serve as registry storage, but they do not provide the concurrent atomic writes that multi-writer HA deployments require. For true high availability with multiple registry replicas, you must use a SQL database backend (PostgreSQL, MySQL) that supports transactional consistency and concurrent modifications.

### Why can't I use SQLite or DuckDB for HA deployments?

SQLite and DuckDB implementations rely on file-based storage that requires exclusive locks for writes. When multiple registry pods attempt to modify the metadata simultaneously, these backends cannot reconcile concurrent changes, leading to race conditions and potential data corruption. The Feast Operator explicitly blocks `replicas > 1` configurations when detecting these storage types.

### How does the Feast Operator prevent downtime during updates?

The operator configures the registry Deployment with a `RollingUpdate` strategy, so new pods are created and old ones removed incrementally. Health checks ensure new replicas are ready to serve traffic before the controller removes the pods they replace. Because the ClusterIP Service routes traffic only to ready pods, updates to registry versions or configuration can roll out without downtime, provided more than one replica is running.

### Do clients need to change configuration when scaling registry replicas?

No. Clients, including the Python SDK ([`sdk/python/feast/infra/registry/registry.py`](https://github.com/feast-dev/feast/blob/main/sdk/python/feast/infra/registry/registry.py)), Feast UI, and other services, continue using the same `registry` endpoint defined in [`feature_store.yaml`](https://github.com/feast-dev/feast/blob/main/feature_store.yaml). The Kubernetes Service (`feast-registry.<namespace>.svc`) abstracts the underlying pod topology, automatically routing requests to healthy replicas regardless of the current replica count or autoscaling state.