Feast Monitoring and Observability: Prometheus Metrics, OpenTelemetry Traces, and Data Quality Validation

Feast provides an observability stack that combines Prometheus-compatible metrics and OpenTelemetry distributed tracing, configurable through Helm values and Kubernetes ServiceMonitors, together with experimental data quality monitoring.

Feast (feast-dev/feast) is an open-source feature store for machine learning that includes production-grade monitoring and observability capabilities. The platform exposes Prometheus metrics from both the feature server and the operator, integrates with OpenTelemetry for distributed tracing (both configurable through the feature server's Helm values), and supports experimental data quality validation of training datasets.

Core Observability Components

Prometheus Metrics Collection

Feast exposes Prometheus-compatible metrics via /metrics endpoints on both the feature server and the Feast operator. The feature server's endpoint reports telemetry such as CPU usage, memory consumption, and request latency, while the operator's endpoint exposes the standard controller-runtime metrics of its controller manager.

The operator’s scrape configuration lives in infra/feast-operator/config/prometheus/monitor.yaml, a ServiceMonitor resource that Prometheus uses to discover and scrape the controller manager. For the feature server, the Helm chart includes sample monitoring resources in infra/charts/feast-feature-server/samples/service-monitor.yaml that define how Prometheus should scrape the OpenTelemetry Collector and application metrics.

Key feature server metrics include feast_feature_server_cpu_usage and feast_feature_server_memory_usage for resource tracking, and feast_feature_server_latency_seconds for request timing.
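To check what a running feature server actually exposes, you can fetch its /metrics output and parse the Prometheus text format. The sketch below is simplified (it handles ordinary sample lines, not every escaping edge case) and assumes the endpoint is reachable at http://localhost:8000/metrics, for example through kubectl port-forward; that port and the sample values are assumptions for illustration, not output captured from Feast.

```python
import urllib.request

# Made-up sample in Prometheus text format, used only to exercise the parser.
SAMPLE = """\
# HELP feast_feature_server_cpu_usage Illustrative help text
# TYPE feast_feature_server_cpu_usage gauge
feast_feature_server_cpu_usage 3.5
feast_feature_server_memory_usage 1.25
"""


def parse_prometheus_text(text: str) -> dict[str, float]:
    """Map each sample's series (name plus any {labels}) to its value."""
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        if "}" in line:  # labelled series: name{k="v",...} value [timestamp]
            series, rest = line.rsplit("}", 1)
            series += "}"
        else:  # unlabelled series: name value [timestamp]
            series, _, rest = line.partition(" ")
        samples[series] = float(rest.split()[0])  # float() also accepts NaN and +Inf
    return samples


def fetch_metrics(url: str = "http://localhost:8000/metrics", timeout: float = 5.0) -> dict[str, float]:
    """Fetch and parse a live /metrics endpoint (the default URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for series, value in parse_prometheus_text(SAMPLE).items():
        print(series, value)
```

Keying on the full series string keeps differently labelled samples of the same metric apart, which is usually what you want when eyeballing a live endpoint.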

Distributed Tracing with OpenTelemetry

Feast integrates with the OpenTelemetry Collector to capture distributed traces and structured logs across the feature retrieval pipeline. Python pods running the Feast SDK can be auto-instrumented by the OpenTelemetry Operator through a Kubernetes annotation, so enabling tracing requires no code changes.

When you deploy Feast with the OpenTelemetry Collector, traces are forwarded via OTLP (OpenTelemetry Protocol) to compatible backends such as Jaeger, Zipkin, or Tempo. Configuration details and deployment patterns are documented in docs/getting-started/components/open-telemetry.md.

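To enable auto-instrumentation, install the OpenTelemetry Operator, create an Instrumentation resource that points at your collector, and add the following annotation to the pod template (spec.template.metadata) of your deployment manifest. The operator reads the annotation from pods, not from the Deployment's own metadata:

spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-python: "true"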
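If you generate or patch manifests programmatically, the annotation has to land on the pod template, spec.template.metadata.annotations, because the OpenTelemetry Operator injects instrumentation into pods rather than into the Deployment object. The sketch below assumes the manifest is already loaded into a plain Python dict (for example with a YAML parser); the Deployment name in the usage is hypothetical.

```python
import copy

INJECT_PYTHON = "instrumentation.opentelemetry.io/inject-python"


def with_python_injection(deployment: dict, value: str = "true") -> dict:
    """Return a copy of a Deployment dict with the annotation on its pod template."""
    patched = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    template = patched.setdefault("spec", {}).setdefault("template", {})
    metadata = template.setdefault("metadata", {})
    annotations = metadata.setdefault("annotations", {})
    annotations[INJECT_PYTHON] = value  # Kubernetes annotation values are strings
    return patched


if __name__ == "__main__":
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "my-feast-client"},  # hypothetical name
        "spec": {"template": {"metadata": {"labels": {"app": "feast"}}}},
    }
    print(with_python_injection(deployment)["spec"]["template"]["metadata"])
```

Existing labels on the pod template are preserved, and the top-level metadata is deliberately left alone.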

Data Quality Monitoring (Experimental)

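For validating training and serving data, Feast includes an experimental data quality monitoring (DQM) system built on Great Expectations. The implementation lives in the dqm/ package and targets problems such as data drift and training/serving skew. Validation currently runs when training datasets are generated through historical retrieval; validation of online store writes and reads is planned.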

The system validates datasets against predefined expectations and surfaces quality metrics that can be consumed by your existing monitoring infrastructure. Reference documentation for this feature is available in docs/reference/dqm.md.
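The core idea, deriving expectations from a trusted reference dataset and checking new data against them, can be illustrated in plain Python. This is not Feast's dqm API or Great Expectations: the mean-within-tolerance rule, the 10% tolerance, and the trips column are hypothetical choices, and docs/reference/dqm.md describes the actual profiler and validation interfaces.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Expectation:
    """A single rule: the mean of `column` must lie within [low, high]."""
    column: str
    low: float
    high: float

    def check(self, rows: list[dict]) -> bool:
        return self.low <= mean(row[self.column] for row in rows) <= self.high


def profile(reference: list[dict], column: str, tolerance: float = 0.1) -> Expectation:
    """Derive an expectation from reference data: mean within ±tolerance of the reference mean."""
    ref_mean = mean(row[column] for row in reference)
    margin = abs(ref_mean) * tolerance
    return Expectation(column, ref_mean - margin, ref_mean + margin)


def validate(rows: list[dict], expectations: list[Expectation]) -> list[Expectation]:
    """Return the failed expectations; an empty list means the dataset passed."""
    return [exp for exp in expectations if not exp.check(rows)]


if __name__ == "__main__":
    reference = [{"trips": 10.0}, {"trips": 12.0}, {"trips": 14.0}]  # hypothetical data
    rules = [profile(reference, "trips")]
    print(validate([{"trips": 11.5}, {"trips": 12.5}], rules))  # [] (passes)
    print(validate([{"trips": 30.0}], rules))                   # one failed rule
```

Separating profiling (building rules from a reference) from validation (applying them to new data) mirrors the reference-then-validate workflow described above, while keeping the rules themselves trivially simple.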

Kubernetes-Native Monitoring Setup

Enabling Metrics in Helm

You activate the observability stack through Helm values when deploying the Feast feature server. These values switch on the metrics endpoint and set the address of the OpenTelemetry Collector that receives the server's telemetry.

Here is a sample values.yaml configuration:

metrics:
  enabled: true                # Expose Prometheus metrics endpoints

  otelCollector:
    endpoint: "otel-collector.default.svc.cluster.local:4317"  # Collector address; 4317 is the default OTLP/gRPC port
    headers:
      api-key: "YOUR_API_KEY"  # Optional authentication header
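How these values reach the server process is up to the chart's templates. As an assumption for illustration, the sketch below maps them onto the standard OpenTelemetry SDK variables OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS, which is handy when running an instrumented process outside the chart; check the chart's deployment template for the mapping it actually uses.

```python
from urllib.parse import quote


def otel_env(otel_collector: dict) -> dict[str, str]:
    """Map a metrics.otelCollector values dict to standard OTel SDK env vars (assumed mapping)."""
    env: dict[str, str] = {}
    endpoint = otel_collector.get("endpoint") or ""
    if endpoint:
        # OTLP exporters expect a URL; assume plain-text gRPC inside the cluster.
        if "://" not in endpoint:
            endpoint = "http://" + endpoint
        env["OTEL_EXPORTER_OTLP_ENDPOINT"] = endpoint
    # Drop empty placeholders such as api-key: "".
    headers = {k: v for k, v in (otel_collector.get("headers") or {}).items() if v}
    if headers:
        # Comma-separated key=value pairs with percent-encoded values.
        env["OTEL_EXPORTER_OTLP_HEADERS"] = ",".join(
            f"{key}={quote(str(value), safe='')}" for key, value in headers.items()
        )
    return env


if __name__ == "__main__":
    values = {
        "endpoint": "otel-collector.default.svc.cluster.local:4317",
        "headers": {"api-key": "YOUR_API_KEY"},
    }
    for name, value in otel_env(values).items():
        print(f"{name}={value}")
```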

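Deploy with the following command. feature_store_yaml_base64 must contain your feature_store.yaml encoded as single-line base64; an empty value gives the server no feature store configuration to load:

helm install feast-release infra/charts/feast-feature-server \
  -f values.yaml \
  --set metrics.enabled=true \
  --set feature_store_yaml_base64="$(base64 < feature_store.yaml | tr -d '\n')"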
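The feature_store_yaml_base64 value is easy to get wrong from a shell, because GNU base64 wraps its output at 76 characters by default. If you drive deployments from Python, a sketch like the one below builds the same helm invocation; the release name, chart path, and file names follow the command above and are placeholders for your own.

```python
import base64
import subprocess
from pathlib import Path


def feature_store_b64(path: str) -> str:
    """Single-line base64 of feature_store.yaml (b64encode never inserts newlines)."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def helm_install_args(feature_store_yaml: str, values_file: str = "values.yaml") -> list[str]:
    """Argument list for `helm install`; release and chart names are placeholders."""
    return [
        "helm", "install", "feast-release", "infra/charts/feast-feature-server",
        "-f", values_file,
        "--set", "metrics.enabled=true",
        "--set", f"feature_store_yaml_base64={feature_store_b64(feature_store_yaml)}",
    ]


def deploy(feature_store_yaml: str = "feature_store.yaml") -> None:
    # Passing a list (no shell) avoids quoting problems with the base64 value.
    subprocess.run(helm_install_args(feature_store_yaml), check=True)
```

Base64 output contains only letters, digits, +, / and =, so it passes through helm's --set parser without escaping.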

Configuring ServiceMonitors

For Prometheus Operator users, Feast provides sample ServiceMonitor resources that automate metric discovery.

The following configuration from infra/feast-operator/config/prometheus/monitor.yaml sets up monitoring for the Feast operator:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    control-plane: controller-manager
    app.kubernetes.io/name: feast-operator
  name: controller-manager-metrics-monitor
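  namespace: system  # kustomize replaces this with the operator's namespace at build time
spec:
  endpoints:
    - path: /metrics
      port: https
      scheme: https
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true  # skips certificate verification; avoid in production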
  selector:
    matchLabels:
      control-plane: controller-manager

For the OpenTelemetry Collector itself, use the sample from infra/charts/feast-feature-server/samples/service-monitor.yaml:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: feast
  name: otel-sm
spec:
  endpoints:
    - port: metrics
  namespaceSelector:
    matchNames:
      - <namespace>  # namespace where the OpenTelemetry Collector runs
  selector:
    matchLabels:
      app.kubernetes.io/component: opentelemetry-collector
      app.kubernetes.io/managed-by: opentelemetry-operator
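Both ServiceMonitors above rely on the same matching rules, which the simplified sketch below models: a Service is scraped only if it lives in an allowed namespace (namespaceSelector.matchNames, or the ServiceMonitor's own namespace when that field is absent) and carries every label listed under selector.matchLabels. matchExpressions, namespaceSelector.any, and port resolution are left out. The Prometheus resource must in turn select the ServiceMonitor itself through its serviceMonitorSelector, which typically matches on the ServiceMonitor's own labels.

```python
def selects(monitor: dict, service: dict) -> bool:
    """Simplified check: would this ServiceMonitor pick up this Service?"""
    mon_meta, svc_meta = monitor["metadata"], service["metadata"]
    spec = monitor.get("spec", {})
    # Namespace: matchNames if given, otherwise the ServiceMonitor's own namespace.
    names = spec.get("namespaceSelector", {}).get("matchNames")
    allowed = names if names else [mon_meta.get("namespace", "default")]
    if svc_meta.get("namespace", "default") not in allowed:
        return False
    # Labels: every matchLabels pair must be present on the Service (extra labels are fine).
    wanted = spec.get("selector", {}).get("matchLabels", {})
    labels = svc_meta.get("labels", {})
    return all(labels.get(key) == value for key, value in wanted.items())


if __name__ == "__main__":
    otel_sm = {
        "metadata": {"name": "otel-sm", "namespace": "feast"},  # hypothetical namespace
        "spec": {
            "namespaceSelector": {"matchNames": ["feast"]},
            "selector": {"matchLabels": {
                "app.kubernetes.io/component": "opentelemetry-collector",
                "app.kubernetes.io/managed-by": "opentelemetry-operator",
            }},
        },
    }
    collector_svc = {"metadata": {"namespace": "feast", "labels": {
        "app.kubernetes.io/component": "opentelemetry-collector",
        "app.kubernetes.io/managed-by": "opentelemetry-operator",
    }}}
    print(selects(otel_sm, collector_svc))  # True
```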

Instrumenting the Python SDK

When you enable the instrumentation.opentelemetry.io/inject-python: "true" annotation on your Feast client pods, the OpenTelemetry Python agent automatically instruments the SDK at runtime. This injects tracing context into feature store requests and exposes additional runtime metrics without requiring changes to your application code.

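The instrumentation captures request flows from the client through the feature server and on to the underlying data stores, to the extent that the libraries involved (for example, the web framework and database clients) are covered by OpenTelemetry's Python instrumentations. This makes it possible to diagnose latency bottlenecks and error sources along the feature retrieval path.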

Summary

  • Prometheus integration: Feast exposes /metrics endpoints on both the feature server and operator, with ready-to-use ServiceMonitor definitions in infra/feast-operator/config/prometheus/monitor.yaml and the feature server chart samples.
  • OpenTelemetry support: Distributed tracing and structured logging are available via the OpenTelemetry Collector, configured through Helm values and enabled via Kubernetes pod annotations.
  • Data quality validation: Experimental Great Expectations integration in the dqm/ package validates training datasets against expectations, targeting data drift and training/serving skew.
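  • Zero-code instrumentation: Through the OpenTelemetry Operator, the instrumentation.opentelemetry.io/inject-python annotation auto-instruments Python pods running the Feast SDK and emits traces without code changes; Prometheus metrics such as feast_feature_server_memory_usage come from the feature server's own /metrics endpoint.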

Frequently Asked Questions

How do I enable Prometheus metrics in Feast?

Set metrics.enabled=true in your Helm values.yaml file when deploying the feature server. This exposes the /metrics endpoint on the feature server. If you run the Prometheus Operator, also apply ServiceMonitor resources so Prometheus discovers the targets: infra/charts/feast-feature-server/samples/service-monitor.yaml covers the OpenTelemetry Collector, and infra/feast-operator/config/prometheus/monitor.yaml covers the Feast operator.

What tracing backends does Feast support?

Feast sends traces to the OpenTelemetry Collector, so the backend is whatever the collector exports to. Grafana Tempo and Jaeger accept OTLP directly, while Zipkin and AWS X-Ray are reachable through the collector's dedicated exporters. The otelCollector.endpoint Helm value sets the collector's address, not the backend's; the backend is configured in the collector's exporter settings, and traces from the instrumented Python SDK flow through automatically.

How does Feast handle data quality monitoring?

Feast includes an experimental data quality monitoring system in the dqm/ package that integrates with Great Expectations. This system validates datasets against predefined expectations to catch statistical drift and training/serving skew; at present, validation runs during historical retrieval, when training datasets are generated. Documentation is available in docs/reference/dqm.md, though this feature is not yet considered production-stable.

Is OpenTelemetry integration mandatory for Feast monitoring?

No, OpenTelemetry is optional. You can run Feast with only Prometheus metrics enabled by setting metrics.enabled=true while omitting the otelCollector configuration. However, enabling both provides the most complete observability coverage, correlating metric spikes with distributed trace data for faster incident resolution.
