Feast Monitoring and Observability: Prometheus Metrics, OpenTelemetry Traces, and Data Quality Validation
Feast provides a comprehensive observability stack including Prometheus-compatible metrics, OpenTelemetry distributed tracing, and experimental data quality monitoring, all configurable through Helm values and Kubernetes ServiceMonitors.
Feast (feast-dev/feast) is an open-source feature store for machine learning that includes production-grade monitoring and observability capabilities. The platform exposes Prometheus metrics from both the feature server and operator, integrates with OpenTelemetry for distributed tracing, and supports data quality validation through configurable Helm deployments.
Core Observability Components
Prometheus Metrics Collection
Feast exposes Prometheus-compatible metrics via /metrics endpoints on both the feature server and the Feast operator. These endpoints emit critical telemetry including CPU usage, memory consumption, request latency, and feature-retrieval statistics.
The operator's scrape configuration lives in infra/feast-operator/config/prometheus/monitor.yaml, which defines a ServiceMonitor resource that Prometheus uses to discover and scrape the controller manager. For the feature server, the Helm chart ships a sample in infra/charts/feast-feature-server/samples/service-monitor.yaml that tells Prometheus how to scrape the OpenTelemetry Collector and application metrics.
Key metrics exposed include feast_feature_server_latency_seconds for request timing and feast_feature_server_memory_usage for resource tracking.
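To make the shape of a scrape concrete, the sketch below parses a small sample of the Prometheus text exposition format with the Python standard library. The metric names are the two mentioned above; the sample values, the HELP text, and the `endpoint` label are invented for illustration and are not taken from a real Feast server. The parser is deliberately minimal and skips lines that carry a trailing timestamp.

```python
import re

# Hypothetical /metrics response body. Values and labels are made up.
SAMPLE = """\
# HELP feast_feature_server_memory_usage Memory usage of the feature server
# TYPE feast_feature_server_memory_usage gauge
feast_feature_server_memory_usage 0.42
# TYPE feast_feature_server_latency_seconds gauge
feast_feature_server_latency_seconds{endpoint="get-online-features"} 0.013
"""

# name, optional {labels}, then a single value token
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue  # e.g. a sample with a timestamp, not handled here
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

In practice you would fetch the text from the server's /metrics endpoint, or let Prometheus do the scraping, rather than embed a sample string.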
Distributed Tracing with OpenTelemetry
Feast integrates with the OpenTelemetry Collector to capture distributed traces and structured logs across the feature retrieval pipeline. The Python SDK supports auto-instrumentation through Kubernetes annotations, requiring no code changes to enable tracing.
When you deploy Feast with the OpenTelemetry Collector, traces are forwarded via OTLP (OpenTelemetry Protocol) to compatible backends such as Jaeger, Zipkin, or Tempo. Configuration details and deployment patterns are documented in docs/getting-started/components/open-telemetry.md.
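To enable auto-instrumentation, add the following annotation to the pod template metadata in your deployment manifest. The annotation is acted on by the OpenTelemetry Operator, so the operator and an Instrumentation resource must already be present in the cluster:
metadata:
  annotations:
    instrumentation.opentelemetry.io/inject-python: "true"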
Data Quality Monitoring (Experimental)
For validating training and serving datasets, Feast includes an experimental data quality monitoring system built on Great Expectations. This implementation, located in the dqm/ package, tracks data drift and skew between training sets and live serving features.
The system validates datasets against predefined expectations and surfaces quality metrics that can be consumed by your existing monitoring infrastructure. Reference documentation for this feature is available in docs/reference/dqm.md.
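The idea of validating a dataset against predefined expectations can be sketched in plain Python. This is not the Feast `dqm` API and not the Great Expectations API: `RangeExpectation`, `profile`, `validate`, and the `trip_km` column are all hypothetical names chosen for illustration. The sketch derives a value range from a reference (training) dataset and reports serving rows that fall outside it.

```python
from dataclasses import dataclass


@dataclass
class RangeExpectation:
    """Hypothetical expectation: values of `column` stay within a range."""
    column: str
    min_value: float
    max_value: float


def profile(reference_rows, column, margin=0.1):
    """Build an expectation from reference data, widened by `margin`."""
    values = [row[column] for row in reference_rows]
    low, high = min(values), max(values)
    pad = (high - low) * margin
    return RangeExpectation(column, low - pad, high + pad)


def validate(rows, expectation):
    """Return the rows whose value violates the expectation."""
    return [
        row for row in rows
        if not expectation.min_value <= row[expectation.column] <= expectation.max_value
    ]


if __name__ == "__main__":
    training = [{"trip_km": 1.0}, {"trip_km": 11.0}]
    serving = [{"trip_km": 5.0}, {"trip_km": 40.0}]
    expectation = profile(training, "trip_km")
    print(validate(serving, expectation))
```

A real deployment would express these checks as a Great Expectations suite, as described in docs/reference/dqm.md, instead of hand-written range checks.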
Kubernetes-Native Monitoring Setup
Enabling Metrics in Helm
You activate the observability stack through Helm values when deploying the Feast feature server. The configuration toggles expose the necessary endpoints and configure the OpenTelemetry Collector forwarding address.
Here is a sample values.yaml configuration:
metrics:
  enabled: true  # Expose Prometheus metrics endpoints
otelCollector:
  endpoint: "otel-collector.default.svc.cluster.local:4317"
  headers:
    api-key: "YOUR_API_KEY"  # Optional authentication header
Deploy with the command:
helm install feast-release infra/charts/feast-feature-server \
  --set metrics.enabled=true \
  --set feature_store_yaml_base64=""
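The command above leaves `feature_store_yaml_base64` empty. Going by the value's name, it expects the contents of a feature_store.yaml file encoded as base64. A minimal sketch of producing that string follows. The `encode_feature_store_yaml` helper and the two-line example config are hypothetical; substitute your real file contents.

```python
import base64


def encode_feature_store_yaml(yaml_text: str) -> str:
    """Return the base64 form of a feature_store.yaml document as ASCII."""
    return base64.b64encode(yaml_text.encode("utf-8")).decode("ascii")


# Hypothetical minimal config, for illustration only.
EXAMPLE = "project: demo\nprovider: local\n"

encoded = encode_feature_store_yaml(EXAMPLE)

if __name__ == "__main__":
    # Paste the printed string into --set feature_store_yaml_base64=...
    print(encoded)
```

`base64.b64encode` emits a single line with no embedded newlines, which keeps the value safe to pass on a `--set` flag.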
Configuring ServiceMonitors
For Prometheus Operator users, Feast provides sample ServiceMonitor resources that automate metric discovery.
The following configuration from infra/feast-operator/config/prometheus/monitor.yaml sets up monitoring for the Feast operator:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    control-plane: controller-manager
    app.kubernetes.io/name: feast-operator
  name: controller-manager-metrics-monitor
  namespace: system
spec:
  endpoints:
    - path: /metrics
      port: https
      scheme: https
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true
  selector:
    matchLabels:
      control-plane: controller-manager
For the OpenTelemetry Collector itself, use the sample from infra/charts/feast-feature-server/samples/service-monitor.yaml:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: feast
  name: otel-sm
spec:
  endpoints:
    - port: metrics
  namespaceSelector:
    matchNames:
      - <namespace>
  selector:
    matchLabels:
      app.kubernetes.io/component: opentelemetry-collector
      app.kubernetes.io/managed-by: opentelemetry-operator
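A ServiceMonitor selects a Service only when the Service carries every key/value pair listed under `selector.matchLabels`. When Prometheus shows no target, a label mismatch is a common cause. The rule can be sketched as follows. The `selector_matches` helper and the Service labels are hypothetical; the selector values are the ones from the sample above.

```python
def selector_matches(match_labels: dict, resource_labels: dict) -> bool:
    """True when every pair in match_labels is present on the resource.

    An empty match_labels selects everything, mirroring Kubernetes
    label-selector semantics.
    """
    return all(resource_labels.get(key) == value
               for key, value in match_labels.items())


# matchLabels from the otel-sm ServiceMonitor sample.
OTEL_SELECTOR = {
    "app.kubernetes.io/component": "opentelemetry-collector",
    "app.kubernetes.io/managed-by": "opentelemetry-operator",
}

if __name__ == "__main__":
    # Hypothetical labels on a collector Service.
    service_labels = {
        "app.kubernetes.io/component": "opentelemetry-collector",
        "app.kubernetes.io/managed-by": "opentelemetry-operator",
        "app.kubernetes.io/name": "otel-collector",
    }
    print(selector_matches(OTEL_SELECTOR, service_labels))
```

Extra labels on the Service do not prevent a match; a missing or differing label does.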
Instrumenting the Python SDK
When you enable the instrumentation.opentelemetry.io/inject-python: "true" annotation on your Feast client pods, the OpenTelemetry Python agent automatically instruments the SDK at runtime. This injects tracing context into feature store requests and exposes additional runtime metrics without requiring changes to your application code.
The instrumentation captures end-to-end request flows from the client through the feature server to the underlying data stores, making it possible to diagnose latency bottlenecks and error sources across the entire feature retrieval path.
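If the client Deployment already exists, the annotation can be added with a JSON merge patch instead of editing the manifest by hand. The sketch below only builds the patch document; `build_patch` is a hypothetical helper, and the patch targets the pod template metadata on the assumption that the workload is a Deployment.

```python
import json

ANNOTATION = "instrumentation.opentelemetry.io/inject-python"


def build_patch(enabled: bool = True) -> str:
    """Return a JSON merge patch that sets the annotation on the pod template."""
    patch = {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {ANNOTATION: "true" if enabled else "false"}
                }
            }
        }
    }
    return json.dumps(patch, indent=2)


if __name__ == "__main__":
    print(build_patch())
```

The printed document can be passed to `kubectl patch deployment <name> --type merge -p '<patch>'`, where `<name>` is your client Deployment. Changing the pod template triggers a rollout, so the new pods start with the instrumentation injected.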
Summary
- Prometheus integration: Feast exposes /metrics endpoints on both the feature server and operator, with ready-to-use ServiceMonitor definitions in infra/feast-operator/config/prometheus/monitor.yaml and the feature server chart samples.
- OpenTelemetry support: Distributed tracing and structured logging are available via the OpenTelemetry Collector, configured through Helm values and enabled via Kubernetes pod annotations.
- Data quality validation: Experimental Great Expectations integration in the dqm/ package provides drift and skew detection for training and serving datasets.
- Zero-code instrumentation: The Python SDK supports auto-instrumentation through the instrumentation.opentelemetry.io/inject-python annotation, exposing metrics like feast_feature_server_latency_seconds automatically.
Frequently Asked Questions
How do I enable Prometheus metrics in Feast?
Set metrics.enabled=true in your Helm values.yaml file when deploying the feature server. This exposes the /metrics endpoint on the feature server and configures the necessary Kubernetes resources for Prometheus scraping. If you run the Prometheus Operator, also apply the ServiceMonitor resources: infra/feast-operator/config/prometheus/monitor.yaml for the operator and infra/charts/feast-feature-server/samples/service-monitor.yaml for the collector.
What tracing backends does Feast support?
Feast sends traces to the OpenTelemetry Collector over OTLP, so any backend the Collector can export to will work. Common choices include Jaeger, Zipkin, AWS X-Ray, and Grafana Tempo. The otelCollector.endpoint Helm value sets the Collector address that the instrumented Python SDK sends to, while the backend itself is chosen in the Collector's own exporter configuration.
How does Feast handle data quality monitoring?
Feast includes an experimental data quality monitoring system in the dqm/ package that integrates with Great Expectations. This system validates datasets against predefined expectations and tracks statistical drift between training data and live serving features. Documentation is available in docs/reference/dqm.md, though this feature is not yet considered production-stable.
Is OpenTelemetry integration mandatory for Feast monitoring?
No, OpenTelemetry is optional. You can run Feast with only Prometheus metrics enabled by setting metrics.enabled=true while omitting the otelCollector configuration. However, enabling both provides the most complete observability coverage, correlating metric spikes with distributed trace data for faster incident resolution.