How to Monitor and Achieve Observability for AI Applications
To monitor and achieve observability for AI applications, instrument your Generative AI services with OpenTelemetry to capture distributed traces, metrics, and logs, then export the telemetry to Azure Monitor (through its Application Insights feature) to enable alerting, dashboards, and continuous improvement loops.
The microsoft/generative-ai-for-beginners curriculum treats observability as a first-class requirement for production Generative AI (GenAI) systems, not an afterthought. This guide distills the repository's guidance into concrete code examples and an architectural blueprint to help you build reliable, responsible, and cost-effective AI applications.
Why Observability Is Essential for GenAI
The Generative AI Application Lifecycle lesson states that you must "monitor, evaluate, and improve it continuously" to maintain production quality (source). In the Operationalizing phase, the curriculum instructs developers to "add Monitoring and Alerts Systems to our system" before considering deployment complete (source).
Furthermore, the Building Chat Applications lesson lists monitoring as a critical requirement "to ensure the applications are operating at the highest level of quality" and ties it directly to Retraining Cycles, a metric that signals when drift or degradation calls for a model update (source, L58-L61). Without these signals, you cannot close the feedback loop between production behavior and model improvement.
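As a rough illustration of that feedback loop, the sketch below compares a rolling thumbs-up rate against a baseline and flags the model for a retraining review once quality drops. It is a minimal, hypothetical sketch: `FeedbackDriftMonitor`, the baseline rate, the window size, and the tolerance are illustrative assumptions, not values or APIs from the curriculum.

```python
from collections import deque
from typing import Optional


class FeedbackDriftMonitor:
    """Hypothetical helper: flags degradation in the recent thumbs-up rate."""

    def __init__(self, baseline_rate: float, window: int = 200, tolerance: float = 0.10):
        self.baseline_rate = baseline_rate  # e.g. thumbs-up rate measured at launch
        self.tolerance = tolerance          # allowed absolute drop before flagging
        self.recent: deque = deque(maxlen=window)  # keeps only the newest votes

    def record(self, thumbs_up: bool) -> None:
        self.recent.append(thumbs_up)

    def recent_rate(self) -> Optional[float]:
        if not self.recent:
            return None
        return sum(self.recent) / len(self.recent)

    def needs_retraining_review(self) -> bool:
        # Judge only a full window, so a few early votes cannot trigger a flag.
        if len(self.recent) < self.recent.maxlen:
            return False
        return self.recent_rate() < self.baseline_rate - self.tolerance


if __name__ == "__main__":
    monitor = FeedbackDriftMonitor(baseline_rate=0.85, window=10)
    for vote in [True, False, True, False, False, True, False, True, False, True]:
        monitor.record(vote)
    print(monitor.recent_rate(), monitor.needs_retraining_review())
```

In production the flag would more likely be emitted as a metric and turned into an alert that starts a retraining pipeline, rather than kept as an in-process boolean.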
Core Observability Pillars
The repository structures observability around four pillars: three telemetry types (metrics, logs, and traces) plus the alerting built on top of them. The following table maps each pillar to GenAI-specific implementations and Azure tooling referenced in the curriculum:
| Pillar | What to Track | Recommended Tools |
|---|---|---|
| Metrics | Request latency, token count (input/output), throughput, error rates, model-specific scores (perplexity, F1) | Azure Monitor Metrics, Prometheus, OpenTelemetry Metrics |
| Logs | API request/response payloads (redacted/sanitized), authentication events, retry attempts, custom business events | Azure Log Analytics, Python logging, OpenTelemetry Logs |
| Traces | End-to-end request flow across services (Client → API Gateway → Orchestrator → LLM inference) | Azure Application Insights, OpenTelemetry Tracing |
| Alerts | SLA breaches, cost-per-token spikes, abnormal error patterns, drift in response quality | Azure Monitor Alerts, Grafana, Webhook triggers |
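To make the cost signals in the Metrics and Alerts rows concrete, here is a minimal sketch that turns token counts into estimated spend and a blended cost per 1K tokens. The prices are illustrative placeholders, not real rates, and `UsageRecord`, `request_cost_usd`, and `blended_cost_per_1k_tokens` are hypothetical helpers.

```python
from dataclasses import dataclass

# Illustrative placeholder prices in USD per 1M tokens (NOT real pricing);
# replace with your provider's current price sheet.
PRICES_PER_MILLION = {
    "example-model": {"input": 0.50, "output": 1.50},
}


@dataclass
class UsageRecord:
    model: str
    prompt_tokens: int
    completion_tokens: int


def request_cost_usd(rec: UsageRecord) -> float:
    """Estimated cost of one request; input and output tokens are priced separately."""
    price = PRICES_PER_MILLION[rec.model]
    return (
        rec.prompt_tokens * price["input"] + rec.completion_tokens * price["output"]
    ) / 1_000_000


def blended_cost_per_1k_tokens(records: list[UsageRecord]) -> float:
    """Total estimated spend divided by total tokens, scaled to 1K tokens."""
    total_tokens = sum(r.prompt_tokens + r.completion_tokens for r in records)
    if total_tokens == 0:
        return 0.0
    total_cost = sum(request_cost_usd(r) for r in records)
    return total_cost / total_tokens * 1000


if __name__ == "__main__":
    window = [
        UsageRecord("example-model", prompt_tokens=1000, completion_tokens=500),
        UsageRecord("example-model", prompt_tokens=200, completion_tokens=1800),
    ]
    print(f"{blended_cost_per_1k_tokens(window):.6f} USD per 1K tokens")
```

Because output tokens are usually priced higher than input tokens, the blended figure rises when responses grow long relative to prompts, which is one way a cost-per-token spike can appear even when list prices are unchanged.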
Architectural Blueprint
A typical GenAI application flow instruments every tier to emit telemetry to a centralized observability backend:
```mermaid
flowchart LR
    A[Client UI] --> B[API Gateway]
    B --> C[PromptFlow / Orchestrator]
    C --> D[LLM Inference<br/>Azure OpenAI / GitHub Models]
    D --> C
    C --> B
    B --> A
    subgraph obs [Observability Layer]
        E[Metrics Collector<br/>OpenTelemetry SDK]
        F[Log Exporter<br/>Azure Log Analytics]
        G[Trace Exporter<br/>Application Insights]
        H[Alert Engine<br/>Azure Monitor]
    end
    B -.-> G
    C -.-> G
    D --> E
    D --> F
    D --> G
    E --> H
    F --> H
    G --> H
```
- Instrumentation – Embed OpenTelemetry SDKs in your orchestration layer (PromptFlow, Python FastAPI, or TypeScript services).
- Export – Route telemetry to Azure Monitor (metrics), Log Analytics (structured logs), and Application Insights (distributed traces).
- Alerting – Define thresholds on latency, error-rate, and cost-per-token to trigger notifications or CI/CD retraining pipelines.
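To see what actually crosses the tier boundaries in the diagram, the following stdlib-only sketch builds and propagates a W3C Trace Context `traceparent` header, which is OpenTelemetry's default propagation format. Each hop keeps the trace-id and mints a new span-id, which is what lets Application Insights stitch the gateway, orchestrator, and inference spans into one end-to-end trace. In a real service the OpenTelemetry SDK and its HTTP instrumentations do this for you; `new_traceparent` and `child_traceparent` are hypothetical names used only for illustration.

```python
import re
import secrets

# W3C Trace Context: "version-traceid-parentid-traceflags", lowercase hex
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def _random_hex_id(n_bytes: int) -> str:
    """Random lowercase hex id; the spec forbids all-zero ids."""
    while True:
        value = secrets.token_hex(n_bytes)
        if int(value, 16) != 0:
            return value


def new_traceparent(sampled: bool = True) -> str:
    """Header sent by the first tier (e.g. the client) that starts a trace."""
    flags = "01" if sampled else "00"
    return f"00-{_random_hex_id(16)}-{_random_hex_id(8)}-{flags}"


def child_traceparent(incoming: str) -> str:
    """Header a tier sends downstream after receiving `incoming`.

    The trace-id and flags are kept; the parent-id becomes this tier's own
    new span-id, so the next hop records this tier as its parent.
    """
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        raise ValueError(f"malformed traceparent: {incoming!r}")
    trace_id, _caller_span_id, flags = match.groups()
    return f"00-{trace_id}-{_random_hex_id(8)}-{flags}"


if __name__ == "__main__":
    client_hdr = new_traceparent()               # Client UI -> API Gateway
    gateway_hdr = child_traceparent(client_hdr)  # API Gateway -> Orchestrator
    orch_hdr = child_traceparent(gateway_hdr)    # Orchestrator -> LLM inference
    for header in (client_hdr, gateway_hdr, orch_hdr):
        print(header)
```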
Implementing Observability in Code
The following language-specific patterns show how to instrument OpenAI clients and orchestration frameworks in Python, TypeScript, and PromptFlow.
Python: OpenTelemetry with Azure Monitor
Create an instrumentation.py module to configure the OpenTelemetry SDK and auto-instrument the OpenAI client:
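```python
# instrumentation.py
# Requires: pip install opentelemetry-sdk azure-monitor-opentelemetry-exporter \
#           opentelemetry-instrumentation-openai
import logging
import os

from azure.monitor.opentelemetry.exporter import (
    AzureMonitorLogExporter,
    AzureMonitorMetricExporter,
    AzureMonitorTraceExporter,
)
from opentelemetry import metrics, trace
from opentelemetry._logs import set_logger_provider
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Connection string of the Application Insights resource (fail fast if unset)
CONNECTION_STRING = os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"]

# Resource attributes attached to every span, metric, and log record
resource = Resource.create({
    "service.name": "genai-chat-app",
    "service.version": "1.0.0",
    "deployment.environment": os.getenv("ENV", "dev"),
})

# Tracing: batch spans and export them to Application Insights
tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(
    BatchSpanProcessor(AzureMonitorTraceExporter(connection_string=CONNECTION_STRING))
)
trace.set_tracer_provider(tracer_provider)
tracer = trace.get_tracer(__name__)

# Metrics: exported periodically (every 60 s by default)
metric_reader = PeriodicExportingMetricReader(
    AzureMonitorMetricExporter(connection_string=CONNECTION_STRING)
)
metrics.set_meter_provider(MeterProvider(resource=resource, metric_readers=[metric_reader]))
meter = metrics.get_meter(__name__)

# Custom metric: total tokens (prompt + completion) consumed by LLM calls
tokens_counter = meter.create_counter(
    name="genai.tokens_used",
    description="Total tokens (prompt + completion) consumed by LLM requests",
    unit="{token}",
)

# Logs: route Python logging records (including `extra=` fields) to Azure Monitor
logger_provider = LoggerProvider(resource=resource)
logger_provider.add_log_record_processor(
    BatchLogRecordProcessor(AzureMonitorLogExporter(connection_string=CONNECTION_STRING))
)
set_logger_provider(logger_provider)
logging.basicConfig(level=logging.INFO)  # console output for local debugging
logger = logging.getLogger("genai")
logger.addHandler(LoggingHandler(level=logging.INFO, logger_provider=logger_provider))

# Auto-instrument the OpenAI client so each completion call emits a span
OpenAIInstrumentor().instrument()
```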
Then instrument your business logic in chat_service.py:
```python
# chat_service.py
import os

from openai import OpenAI

# Importing instrumentation also configures tracing, metrics, and logging
from instrumentation import logger, tokens_counter

MODEL = "gpt-4o-mini"
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


def chat(prompt: str, user_id: str):
    logger.info("Chat request", extra={"user_id": user_id, "prompt_len": len(prompt)})

    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )

    # Record token usage. Keep metric attributes low-cardinality: per-user
    # breakdowns belong in logs or traces, not in metric dimensions.
    usage = response.usage
    tokens_counter.add(usage.total_tokens, {"model": MODEL})

    logger.info("Response generated", extra={"user_id": user_id, "tokens": usage.total_tokens})
    return response.choices[0].message.content
```
Running this service streams traces, metrics, and logs to Azure Monitor, enabling dashboard creation and threshold alerting on token costs and latency.
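One caveat for local debugging: the default formatter installed by `logging.basicConfig` prints only the level, logger name, and message, so the `user_id` and `tokens` fields passed via `extra=` never appear on the console. A minimal sketch of a JSON formatter that keeps them, assuming one JSON object per line is an acceptable console format (`JsonFormatter` is a hypothetical helper, not part of the standard library):

```python
import json
import logging

# Attributes every LogRecord has; anything else was supplied through `extra=`.
_STANDARD_ATTRS = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including `extra=` fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value  # custom field from `extra=`
        return json.dumps(payload, default=str)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    demo = logging.getLogger("genai.demo")
    demo.addHandler(handler)
    demo.setLevel(logging.INFO)
    demo.info("Response generated", extra={"user_id": "u-123", "tokens": 42})
```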
TypeScript: Azure Application Insights
For Node.js services, use the Application Insights Node.js SDK (the applicationinsights package) to capture distributed traces and custom telemetry:
```typescript
// obs.ts
import * as appInsights from "applicationinsights";
import OpenAI from "openai";

// Start the Node.js SDK once, as early as possible at start-up; it
// auto-collects incoming requests, outgoing HTTP dependencies, and exceptions.
appInsights.setup(process.env.APPLICATIONINSIGHTS_CONNECTION_STRING!).start();
const telemetry = appInsights.defaultClient;

// Reuse one OpenAI client across requests
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });

export async function chat(prompt: string, userId: string) {
  const start = Date.now();
  try {
    const response = await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
    });

    // Custom telemetry
    telemetry.trackEvent({
      name: "ChatRequest",
      properties: { userId, promptLength: prompt.length },
    });
    telemetry.trackMetric({ name: "ChatLatencyMs", value: Date.now() - start });
    telemetry.trackMetric({
      name: "TokensUsed",
      value: response.usage?.total_tokens ?? 0,
    });

    return response.choices[0].message.content;
  } catch (err) {
    // Failed LLM calls still show up in the Failures view
    telemetry.trackException({ exception: err as Error });
    throw err;
  }
}
```
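The Node.js SDK automatically collects incoming requests, outgoing HTTP dependencies, and exceptions, and correlates them into distributed traces. The custom events and metrics are stored in Application Insights (the customEvents and customMetrics tables), where you can chart them and define Azure Monitor alert rules on them.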
PromptFlow: Built-in Telemetry
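When using PromptFlow (Azure AI Studio), flow runs are traced node by node without manual instrumentation. To trace helper functions that a node calls, decorate them with @trace from the promptflow-tracing package:
```python
from openai import OpenAI
from promptflow.core import tool
from promptflow.tracing import trace

client = OpenAI()  # reads OPENAI_API_KEY from the environment


@trace  # span with this function's inputs, output, duration, and any exception
def call_llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content


@tool  # flow runs trace each tool node automatically
def generate_answer(context: str, question: str) -> str:
    return call_llm(f"Context:\n{context}\n\nQuestion: {question}")
```
Running this flow in Azure AI Studio surfaces the traces of each run, including node latency and, for instrumented OpenAI calls, token counts. Sending that telemetry to Application Insights lets you alert on cost-per-run or node-level failures.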
Responsible AI Observability
The repository grounds observability in Microsoft's six Responsible AI principles: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability (source). The table below maps five of them to concrete telemetry checks:
| Principle | Observable Guardrail |
|---|---|
| Fairness | Log demographic parity metrics per request batch; alert on disparate impact scores |
| Reliability & Safety | Error-rate thresholds and anomaly detection on response latency/outliers |
| Privacy & Security | Audit logs of data-access events; automatic PII redaction in request logs |
| Transparency | Log model version ID and prompt template hash with every response |
| Accountability | End-to-end trace IDs linking user feedback (thumbs up/down) to specific model versions for retraining attribution |
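A minimal sketch of the Privacy, Transparency, and Accountability rows as a single log-record builder: it redacts e-mail addresses and phone numbers with regular expressions, fingerprints the prompt template with SHA-256, and carries the model version and trace ID. The regexes are a deliberate simplification (dedicated PII-detection services catch far more, and the phone pattern can also match other long digit runs), and `audit_record` and its field names are hypothetical.

```python
import hashlib
import re

# Simplified PII patterns; real deployments usually call a PII-detection service.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
_PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace e-mail addresses and phone-like digit runs with placeholders."""
    text = _EMAIL.sub("[EMAIL]", text)
    return _PHONE.sub("[PHONE]", text)


def template_hash(template: str) -> str:
    """Stable fingerprint of a prompt template: first 12 hex chars of SHA-256."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def audit_record(prompt: str, template: str, model_version: str, trace_id: str) -> dict:
    """Fields to log with every response (hypothetical schema)."""
    return {
        "trace_id": trace_id,                                 # Accountability
        "model_version": model_version,                       # Transparency
        "prompt_template_hash": template_hash(template),      # Transparency
        "prompt_redacted": redact(prompt),                    # Privacy
    }


if __name__ == "__main__":
    print(audit_record("Mail me at jane@example.com", "Answer: {q}", "model-v1", "abc123"))
```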
Summary
- Observability is a lifecycle requirement defined in 14-the-generative-ai-application-lifecycle/README.md, not an optional add-on.
- Instrument with OpenTelemetry (or the Application Insights SDK) to capture the three telemetry pillars, metrics, logs, and traces, from Python, TypeScript, or PromptFlow runtimes.
- Export to Azure Monitor to centralize telemetry and configure alerts on latency, error rates, and cost-per-token.
- Close the feedback loop by connecting alerts to retraining pipelines, which drives the Retraining Cycles metric described in 07-building-chat-applications/README.md.
- Validate Responsible AI principles through specific telemetry checks for fairness, privacy, and transparency.
Frequently Asked Questions
What are the three pillars of observability for AI applications?
The three pillars are metrics (quantitative performance indicators like latency and token count), logs (structured event records including API requests and errors), and traces (distributed request flows across microservices). The microsoft/generative-ai-for-beginners curriculum emphasizes tracking these via OpenTelemetry to maintain production quality.
How does OpenTelemetry help monitor Generative AI specifically?
OpenTelemetry provides vendor-neutral APIs and semantic conventions for GenAI telemetry, and instrumentation libraries built on it capture LLM-specific signals such as token usage, the model name, and, depending on the library's content-capture settings, prompt and response text. In the Python example above, OpenAIInstrumentor wraps the OpenAI client calls so that each completion emits a span without manual boilerplate.
Which metrics should I alert on for cost control?
Alert on aggregate token spend (token counts multiplied by per-token prices), request latency percentiles (p95/p99), and error-rate spikes. The curriculum specifically highlights monitoring Token Count and Retraining Cycles, which help prevent budget overruns and detect model drift early.
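A minimal sketch of how those alert conditions could be evaluated over a window of requests, using a nearest-rank p95 and simple thresholds. The `RequestSample` and `Thresholds` types and their default values are hypothetical; in Azure Monitor you would normally express the same conditions as alert rules rather than application code.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    latency_ms: float
    ok: bool
    cost_usd: float


@dataclass
class Thresholds:  # hypothetical values; tune them to your own SLA and budget
    p95_latency_ms: float = 2000.0
    max_error_rate: float = 0.05
    max_window_cost_usd: float = 10.0


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def evaluate_window(samples: list[RequestSample], t: Thresholds) -> list[str]:
    """Names of the alert conditions breached in this window of requests."""
    if not samples:
        return []
    breaches = []
    if percentile([s.latency_ms for s in samples], 95) > t.p95_latency_ms:
        breaches.append("latency_p95")
    error_rate = sum(not s.ok for s in samples) / len(samples)
    if error_rate > t.max_error_rate:
        breaches.append("error_rate")
    if sum(s.cost_usd for s in samples) > t.max_window_cost_usd:
        breaches.append("cost")
    return breaches
```

Usage: collect the samples for, say, a five-minute window and page on-call (or post to a webhook) whenever `evaluate_window` returns a non-empty list.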
How do I implement Responsible AI monitoring in production?
Map each Responsible AI principle to a telemetry rule: log model versions and prompt template hashes for transparency, track demographic parity metrics for fairness, keep audit logs of data access for privacy and security, and link user feedback to trace IDs for accountability. Configure Azure Monitor alerts to fire when these guardrails breach defined thresholds, so violations surface continuously instead of only at periodic reviews.
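For the fairness check, a minimal sketch of a disparate impact ratio: the lowest per-group rate of a positive outcome (for example, a request answered rather than refused, or a thumbs-up) divided by the highest. The 0.8 threshold follows the common four-fifths rule of thumb and is an assumption here, as are the group labels, which you would need to collect and handle under your own privacy policy.

```python
from collections import defaultdict

FOUR_FIFTHS = 0.8  # common rule-of-thumb alert threshold; an assumption here


def positive_rates(outcomes: list[tuple[str, bool]]) -> dict[str, float]:
    """Per-group share of positive outcomes from (group, positive) pairs."""
    totals: dict[str, int] = defaultdict(int)
    positives: dict[str, int] = defaultdict(int)
    for group, positive in outcomes:
        totals[group] += 1
        positives[group] += int(positive)
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact(outcomes: list[tuple[str, bool]]) -> float:
    """Lowest group rate divided by the highest; 1.0 means parity."""
    rates = positive_rates(outcomes)
    if not rates or max(rates.values()) == 0:
        return 1.0  # nothing to compare
    return min(rates.values()) / max(rates.values())


if __name__ == "__main__":
    batch = (
        [("group_a", True)] * 8 + [("group_a", False)] * 2
        + [("group_b", True)] * 5 + [("group_b", False)] * 5
    )
    ratio = disparate_impact(batch)
    print(f"disparate impact = {ratio:.3f}, alert = {ratio < FOUR_FIFTHS}")
```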