How to Monitor and Achieve Observability for AI Applications

February 26, 2026 microsoft/generative-ai-for-beginners

To monitor and achieve observability for AI applications, instrument your Generative AI services with OpenTelemetry to capture distributed traces, metrics, and logs, then export telemetry to Azure Monitor or Application Insights to enable alerting, dashboards, and continuous improvement loops.

The microsoft/generative-ai-for-beginners curriculum treats observability as a first-class requirement for production Generative AI (GenAI) systems, not an afterthought. This guide distills the repository’s implementation patterns to help you build reliable, responsible, and cost-effective AI applications using concrete code examples and architectural blueprints from the source lessons.

Why Observability Is Essential for GenAI

The Generative AI Application Lifecycle explicitly mandates that you must "monitor, evaluate, and improve it continuously" to maintain production quality (source). During the Operationalizing phase, the curriculum instructs developers to "add Monitoring and Alerts Systems to our system" before considering deployment complete (source).

Furthermore, the Building Chat Applications lesson lists monitoring as a critical requirement "to ensure the applications are operating at the highest level of quality" and ties it directly to Retraining Cycles—a key metric that triggers model updates when drift or degradation is detected (source, L58-L61). Without these signals, you cannot close the feedback loop between production behavior and model improvement.
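
As a rough illustration of how a degradation signal might feed a retraining decision, the sketch below keeps a rolling window of quality scores and flags when their mean falls below a baseline. The `DriftMonitor` class, the score source (for example, user ratings or an offline evaluation metric), and the threshold values are all assumptions made for this example, not part of the curriculum.

```python
from collections import deque
from statistics import mean


class DriftMonitor:
    """Flags degradation when the rolling mean quality score drops below baseline.

    Hypothetical helper: the scoring scheme and thresholds are assumptions.
    """

    def __init__(self, baseline: float, tolerance: float = 0.1, window: int = 50):
        self.baseline = baseline
        self.tolerance = tolerance
        self.scores = deque(maxlen=window)

    def record(self, score: float) -> None:
        """Add one quality score; the oldest score falls out once the window is full."""
        self.scores.append(score)

    def needs_retraining(self) -> bool:
        # Wait for a full window so a few bad responses do not trigger a retrain.
        if len(self.scores) < self.scores.maxlen:
            return False
        return mean(self.scores) < self.baseline - self.tolerance
```

In practice the boolean result would be wired to whatever starts your retraining or re-evaluation job; a full window requirement trades detection speed for fewer false alarms.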

Core Observability Pillars

The repository structures observability around the three telemetry pillars (metrics, logs, and traces) plus an alerting layer built on top of them. The following table maps each to GenAI-specific implementations and Azure tooling referenced in the curriculum:

Pillar	What to Track	Recommended Tools
Metrics	Request latency, token count (input/output), throughput, error rates, model-specific scores (perplexity, F1)	Azure Monitor Metrics, Prometheus, OpenTelemetry Metrics
Logs	API request/response payloads (redacted/sanitized), authentication events, retry attempts, custom business events	Azure Log Analytics, Python `logging`, OpenTelemetry Logs
Traces	End-to-end request flow across services (Client → API Gateway → Orchestrator → LLM inference)	Azure Application Insights, OpenTelemetry Tracing
Alerts	SLA breaches, cost-per-token spikes, abnormal error patterns, drift in response quality	Azure Monitor Alerts, Grafana, Webhook triggers
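
The Logs row calls for redacted or sanitized payloads. One way to approach that in Python is a `logging.Filter` that scrubs messages before any handler sees them. The sketch below is deliberately simple: the `redact` function, the `RedactingFilter` class, and the two regular expressions are illustrative assumptions that catch only obvious email addresses and phone numbers, so a production system would likely need a dedicated PII detection service.

```python
import logging
import re

# Simplistic patterns for illustration only; they will miss many PII formats.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


class RedactingFilter(logging.Filter):
    """Rewrites each log record's message so handlers only see redacted text."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message with its arguments first, then redact the result.
        record.msg = redact(record.getMessage())
        record.args = None
        return True  # never drop the record, only sanitize it
```

Attach it with `logging.getLogger("genai").addFilter(RedactingFilter())` so every record emitted through that logger is sanitized before export.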

Architectural Blueprint

A typical GenAI application flow instruments every tier to emit telemetry to a centralized observability backend:

flowchart LR
    A[Client UI] --> B[API Gateway]
    B --> C[PromptFlow / Orchestrator]
    C --> D[LLM Inference<br/>Azure OpenAI / GitHub Models]
    D --> C
    C --> B
    B --> A

    subgraph Observability Layer
        E[Metrics Collector<br/>OpenTelemetry SDK]
        F[Log Exporter<br/>Azure Log Analytics]
        G[Trace Exporter<br/>Application Insights]
        H[Alert Engine<br/>Azure Monitor]
    end

    D --> E
    D --> F
    D --> G
    E --> H
    F --> H
    G --> H

Instrumentation – Embed OpenTelemetry SDKs in your orchestration layer (PromptFlow, Python FastAPI, or TypeScript services).
Export – Route telemetry to Azure Monitor (metrics), Log Analytics (structured logs), and Application Insights (distributed traces).
Alerting – Define thresholds on latency, error-rate, and cost-per-token to trigger notifications or CI/CD retraining pipelines.
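
The alerting step can be prototyped locally before any rules are configured in Azure Monitor. The sketch below evaluates a window of recent requests against three thresholds. Everything named here (`RequestRecord`, `evaluate_alerts`, the threshold constants, and the per-token price) is a hypothetical placeholder; substitute your own SLA targets and your model's actual pricing.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    latency_ms: float
    total_tokens: int
    failed: bool


# Hypothetical thresholds and price; not taken from the curriculum.
MAX_ERROR_RATE = 0.05
MAX_AVG_LATENCY_MS = 2000.0
MAX_COST_PER_REQUEST_USD = 0.01
PRICE_PER_1K_TOKENS_USD = 0.002


def evaluate_alerts(window: list[RequestRecord]) -> list[str]:
    """Return the names of the alert rules breached by this window of requests."""
    if not window:
        return []
    n = len(window)
    error_rate = sum(r.failed for r in window) / n
    avg_latency = sum(r.latency_ms for r in window) / n
    avg_tokens = sum(r.total_tokens for r in window) / n
    avg_cost = avg_tokens / 1000 * PRICE_PER_1K_TOKENS_USD

    breached = []
    if error_rate > MAX_ERROR_RATE:
        breached.append("error_rate")
    if avg_latency > MAX_AVG_LATENCY_MS:
        breached.append("latency")
    if avg_cost > MAX_COST_PER_REQUEST_USD:
        breached.append("cost_per_request")
    return breached
```

The returned rule names could drive a notification or a pipeline trigger; keeping the evaluation as a pure function makes the thresholds easy to unit-test.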

Implementing Observability in Code

The repository provides language-specific patterns for instrumenting OpenAI clients and orchestration frameworks.

Python: OpenTelemetry with Azure Monitor

Create an instrumentation.py module to configure the OpenTelemetry SDK and auto-instrument the OpenAI client:


# instrumentation.py
# Requires opentelemetry-sdk, azure-monitor-opentelemetry-exporter, and an
# OpenAI instrumentation package that provides OpenAIInstrumentor
# (for example, opentelemetry-instrumentation-openai).

import os
import logging
from opentelemetry import trace, metrics
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from azure.monitor.opentelemetry.exporter import (
    AzureMonitorMetricExporter,
    AzureMonitorTraceExporter,
)

CONNECTION_STRING = os.getenv("AZURE_MONITOR_CONNECTION_STRING")

# Configure resource attributes (service name, version, environment)
resource = Resource.create({
    "service.name": "genai-chat-app",
    "service.version": "1.0.0",
    "deployment.environment": os.getenv("ENV", "dev"),
})

# Initialize tracing: spans are batched and exported to Azure Monitor
trace.set_tracer_provider(TracerProvider(resource=resource))
tracer = trace.get_tracer(__name__)
trace_exporter = AzureMonitorTraceExporter(connection_string=CONNECTION_STRING)
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(trace_exporter)
)

# Initialize metrics: readings are exported on a periodic schedule
metric_reader = PeriodicExportingMetricReader(
    AzureMonitorMetricExporter(connection_string=CONNECTION_STRING)
)
metrics.set_meter_provider(MeterProvider(
    resource=resource,
    metric_readers=[metric_reader],
))
meter = metrics.get_meter(__name__)

# Custom metric: total tokens per request
tokens_counter = meter.create_counter(
    name="genai.tokens_used",
    description="Total tokens (prompt + completion) consumed per request",
    unit="tokens",
)

# Logging (written locally; not exported to Azure Monitor by this module)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("genai")

# Auto-instrument OpenAI client calls
OpenAIInstrumentor().instrument()

Then instrument your business logic in chat_service.py:


# chat_service.py

import os
from openai import OpenAI
from instrumentation import tokens_counter, logger

MODEL = "gpt-4o-mini"
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def chat(prompt: str, user_id: str) -> str:
    logger.info("Chat request", extra={
        "user_id": user_id,
        "prompt_len": len(prompt),
    })

    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )

    # Record token usage. Metric attributes stay low-cardinality: the user ID
    # belongs in the log record, not in the metric dimensions.
    total_tokens = response.usage.total_tokens if response.usage else 0
    tokens_counter.add(total_tokens, {"model": MODEL})

    logger.info("Response generated", extra={
        "user_id": user_id,
        "tokens": total_tokens,
    })
    return response.choices[0].message.content or ""

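Running this service streams traces and metrics to Azure Monitor, enabling dashboard creation and threshold alerting on token usage and latency. The log records shown here are written locally; to ship them to Azure Monitor as well, attach a log exporter to the logging pipeline.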
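
One detail worth noting: `logging.basicConfig` with its default format prints only the message, so fields passed through `extra=` (such as `user_id` and `prompt_len`) are attached to the record but never appear in the output. A minimal sketch of a formatter that emits them as JSON follows. `JsonFormatter` is a hypothetical helper written for this example using only the standard library; it is not part of the repository.

```python
import json
import logging

# Attributes present on every LogRecord; anything else was supplied via `extra=`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}


class JsonFormatter(logging.Formatter):
    """Serializes a log record, including its `extra=` fields, as one JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy only the caller-supplied fields, skipping logging internals.
        payload.update(
            {k: v for k, v in vars(record).items() if k not in _STANDARD_ATTRS}
        )
        return json.dumps(payload, default=str)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("genai")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

With this handler attached, a call such as `logger.info("Chat request", extra={"user_id": "u1", "prompt_len": 12})` produces a single JSON object per line, which is easier to query once the logs reach a backend like Log Analytics.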

TypeScript: Azure Application Insights

For Node.js services, use the Application Insights SDK to capture distributed traces and custom telemetry:

// obs.ts
// Uses the Node.js SDK ("applicationinsights"), not the browser SDK.
import * as appInsights from "applicationinsights";
import OpenAI from "openai";

// Auto-collect incoming requests and outgoing HTTP dependencies.
appInsights
  .setup(process.env.AZURE_APP_INSIGHTS_CONNECTION_STRING)
  .setAutoCollectRequests(true)
  .setAutoCollectDependencies(true)
  .start();
const telemetry = appInsights.defaultClient;

// Create the OpenAI client once and reuse it across requests.
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function chat(prompt: string, userId: string) {
  const start = Date.now();
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
  });
  const duration = Date.now() - start;

  // Custom telemetry
  telemetry.trackEvent({
    name: "ChatRequest",
    properties: { userId, promptLength: prompt.length },
  });
  telemetry.trackMetric({
    name: "ChatLatencyMs",
    value: duration,
  });
  telemetry.trackMetric({
    name: "TokensUsed",
    value: response.usage?.total_tokens ?? 0,
  });

  return response.choices[0].message.content;
}

Application Insights automatically captures HTTP dependencies and distributed traces, while the custom events feed the Metrics and Alerts panes of Azure Monitor.

PromptFlow: Built-in Telemetry

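When using PromptFlow (Azure AI Studio), call start_trace() and apply the @trace decorator from promptflow.tracing to record run-level telemetry without hand-written instrumentation:

from promptflow.tracing import start_trace, trace

start_trace()  # begin collecting traces for this process

@trace
def generate_answer(context: str, question: str) -> str:
    # The trace records this function's inputs, output, duration, and any
    # exception. Token usage appears when the underlying LLM call is traced too.
    # llm_client is a placeholder for your own client wrapper.
    return llm_client.chat(context, question)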

Executing this flow in Azure AI Studio exposes Run Metrics and allows you to set alerts on cost-per-run or node-level failures.

Responsible AI Observability

The repository links observability directly to the six principles of Responsible AI (fairness, reliability and safety, privacy and security, inclusiveness, transparency, accountability) by mapping principles to concrete telemetry checks (source). The table below lists checks for five of them:

Principle	Observable Guardrail
Fairness	Log demographic parity metrics per request batch; alert on disparate impact scores
Reliability & Safety	Error-rate thresholds and anomaly detection on response latency/outliers
Privacy & Security	Audit logs of data-access events; automatic PII redaction in request logs
Transparency	Log model version ID and prompt template hash with every response
Accountability	End-to-end trace IDs linking user feedback (thumbs up/down) to specific model versions for retraining attribution
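
The Transparency and Accountability rows can be combined into a single log line per response. The sketch below fingerprints the prompt template with SHA-256 and records it alongside the model version and trace ID. The function names, the field names, and the choice to truncate the hash to 12 hex characters are assumptions made for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone


def template_hash(template: str) -> str:
    """Short, stable fingerprint of a prompt template (first 12 hex chars of SHA-256)."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def audit_record(model_version: str, template: str, trace_id: str) -> str:
    """Build one JSON log line tying a response to its model and prompt template."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "prompt_template_hash": template_hash(template),
        "trace_id": trace_id,
    })
```

Hashing the template rather than logging it verbatim keeps log volume down and avoids leaking prompt text, while still letting you tell which template version produced a given response. When user feedback arrives with the same trace ID, it can be joined back to this record.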

Summary

Observability is a lifecycle requirement defined in 14-the-generative-ai-application-lifecycle/README.md, not an optional add-on.
Instrument with OpenTelemetry to capture the three pillars—metrics, logs, and traces—from Python, TypeScript, or PromptFlow runtimes.
Export to Azure Monitor to centralize telemetry and configure alerts on latency, error rates, and cost-per-token.
Close the feedback loop by connecting alerts to retraining pipelines, fulfilling the Retraining Cycles metric specified in 07-building-chat-applications/README.md.
Validate Responsible AI principles through specific telemetry checks for fairness, privacy, and transparency.

Frequently Asked Questions

What are the three pillars of observability for AI applications?

The three pillars are metrics (quantitative performance indicators like latency and token count), logs (structured event records including API requests and errors), and traces (distributed request flows across microservices). The microsoft/generative-ai-for-beginners curriculum emphasizes tracking these via OpenTelemetry to maintain production quality.

How does OpenTelemetry help monitor Generative AI specifically?

OpenTelemetry provides vendor-neutral instrumentation. Paired with an OpenAI instrumentation library, it can capture LLM-specific signals such as token usage and model version identifiers, and optionally prompt-response pairs. As shown in the Python example, the OpenAIInstrumentor wraps OpenAI client calls to emit spans without manual boilerplate.

Which metrics should I alert on for cost control?

Alert on cost-per-token aggregates, request latency percentiles (p95/p99), and error-rate spikes. The curriculum specifically highlights monitoring Retraining Cycles and Token Count to prevent budget overruns and detect model drift early.
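
For a sense of what the p95/p99 figures mean before they show up on a dashboard, they can be computed from raw latency samples with the standard library. This is a minimal sketch: `latency_percentiles` is a hypothetical helper, and a monitoring backend would normally compute these aggregates for you.

```python
from statistics import quantiles


def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Return the 95th and 99th percentile of a list of latency samples."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to compute percentiles")
    # n=100 yields 99 cut points; index 94 is p95 and index 98 is p99.
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {"p95": cuts[94], "p99": cuts[98]}
```

Alerting on percentiles rather than the mean surfaces slow tail requests that an average would hide.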

How do I implement Responsible AI monitoring in production?

Map each Responsible AI principle to a telemetry rule: log model versions for transparency, track demographic parity metrics for fairness, keep audit logs of data access for privacy and security, and link trace IDs to user feedback for accountability. Configure Azure Monitor alerts to fire when these guardrails breach defined thresholds, ensuring continuous compliance.
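
As one concrete reading of "demographic parity metrics", the sketch below computes the largest gap in positive-outcome rate between groups in a batch. What counts as a positive outcome (for instance, a request answered rather than refused) and how group labels are obtained are assumptions left to your application; `demographic_parity_gap` is a hypothetical helper, not an API from the repository.

```python
from collections import defaultdict


def demographic_parity_gap(outcomes: list[tuple[str, bool]]) -> float:
    """Largest difference in positive-outcome rate between any two groups.

    Each item is (group_label, positive_outcome). Returns 0.0 for an empty batch.
    """
    totals: dict[str, int] = defaultdict(int)
    positives: dict[str, int] = defaultdict(int)
    for group, positive in outcomes:
        totals[group] += 1
        positives[group] += int(positive)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates) if rates else 0.0
```

A gap of 0.0 means every group received positive outcomes at the same rate; emitting this value as a metric per batch lets an alert fire when it exceeds a threshold you choose.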
