# How to Monitor and Achieve Observability for AI Applications

> Achieve AI application observability by instrumenting Generative AI services with OpenTelemetry. Capture traces, metrics, and logs, then export them to Azure Monitor for alerting and dashboards.

- Repository: [Microsoft/generative-ai-for-beginners](https://github.com/microsoft/generative-ai-for-beginners)
- Tags: how-to-guide
- Published: 2026-02-26

---

**To monitor and achieve observability for AI applications, instrument your Generative AI services with OpenTelemetry to capture distributed traces, metrics, and logs, then export telemetry to Azure Monitor or Application Insights to enable alerting, dashboards, and continuous improvement loops.**

The `microsoft/generative-ai-for-beginners` curriculum treats observability as a first-class requirement for production Generative AI (GenAI) systems, not an afterthought. This guide distills the repository’s implementation patterns to help you build reliable, responsible, and cost-effective AI applications using concrete code examples and architectural blueprints from the source lessons.

## Why Observability Is Essential for GenAI

The **Generative AI Application Lifecycle** lesson states that you must "**monitor, evaluate, and improve** it continuously" to maintain production quality ([source](14-the-generative-ai-application-lifecycle/README.md#L5-L7)). During the *Operationalizing* phase, the curriculum instructs developers to "add **Monitoring and Alerts Systems** to our system" before considering deployment complete ([source](14-the-generative-ai-application-lifecycle/README.md#L53-L55)).

The *Building Chat Applications* lesson also lists monitoring as a critical requirement "to ensure the applications are operating at the highest level of quality" and ties it directly to **Retraining Cycles**, a key metric that triggers model updates when drift or degradation is detected ([source](07-building-chat-applications/README.md#L13-L15), [L58-L61](07-building-chat-applications/README.md#L58-L61)). Without these signals, you cannot close the feedback loop between production behavior and model improvement.

## Core Observability Pillars

The repository structures observability around three telemetry types (metrics, logs, and traces) plus an alerting layer that acts on them. The following table maps each of these four pillars to GenAI-specific implementations and the Azure tooling referenced in the curriculum:

| Pillar | What to Track | Recommended Tools |
|--------|---------------|-------------------|
| **Metrics** | Request latency, token count (input/output), throughput, error rates, model-specific scores (perplexity, F1) | Azure Monitor Metrics, Prometheus, OpenTelemetry Metrics |
| **Logs** | API request/response payloads (redacted/sanitized), authentication events, retry attempts, custom business events | Azure Log Analytics, Python `logging`, OpenTelemetry Logs |
| **Traces** | End-to-end request flow across services (Client → API Gateway → Orchestrator → LLM inference) | Azure Application Insights, OpenTelemetry Tracing |
| **Alerts** | SLA breaches, cost-per-token spikes, abnormal error patterns, drift in response quality | Azure Monitor Alerts, Grafana, Webhook triggers |

## Architectural Blueprint

A typical GenAI application flow instruments every tier to emit telemetry to a centralized observability backend:

```mermaid
flowchart LR
    A[Client UI] --> B[API Gateway]
    B --> C[PromptFlow / Orchestrator]
    C --> D[LLM Inference<br/>Azure OpenAI / GitHub Models]
    D --> C
    C --> B
    B --> A

    subgraph Observability Layer
        E[Metrics Collector<br/>OpenTelemetry SDK]
        F[Log Exporter<br/>Azure Log Analytics]
        G[Trace Exporter<br/>Application Insights]
        H[Alert Engine<br/>Azure Monitor]
    end

    C --> E
    C --> F
    C --> G
    E --> H
    F --> H
    G --> H

```

1. **Instrumentation** – Embed OpenTelemetry SDKs in your orchestration layer (PromptFlow, Python FastAPI, or TypeScript services).
2. **Export** – Route telemetry to Azure Monitor (metrics), Log Analytics (structured logs), and Application Insights (distributed traces).
3. **Alerting** – Define thresholds on latency, error-rate, and cost-per-token to trigger notifications or CI/CD retraining pipelines.
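The alerting step can be sketched as a small rule evaluator. This is only an illustration: the `WindowStats` shape, the threshold values, and the rule names are assumptions made for this example, not part of the repository or of Azure Monitor, where you would normally express the same rules as alert rules over your exported metrics.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    """Aggregated telemetry for one evaluation window (hypothetical shape)."""
    p95_latency_ms: float
    requests: int
    errors: int
    total_tokens: int
    total_cost_usd: float


# Illustrative thresholds only; tune them to your own SLOs and budget.
THRESHOLDS = {
    "p95_latency_ms": 2000.0,
    "error_rate": 0.05,
    "cost_per_1k_tokens_usd": 0.01,
}


def evaluate_alerts(
    stats: WindowStats, thresholds: dict[str, float] = THRESHOLDS
) -> list[str]:
    """Return the name of every threshold breached in this window."""
    # Guard against empty windows so the ratios never divide by zero.
    error_rate = stats.errors / stats.requests if stats.requests else 0.0
    cost_per_1k = (
        stats.total_cost_usd / stats.total_tokens * 1000
        if stats.total_tokens
        else 0.0
    )
    observed = {
        "p95_latency_ms": stats.p95_latency_ms,
        "error_rate": error_rate,
        "cost_per_1k_tokens_usd": cost_per_1k,
    }
    return [name for name, value in observed.items() if value > thresholds[name]]
```

A non-empty result is the signal you would route to a notification channel or a retraining pipeline trigger.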

## Implementing Observability in Code

The repository provides language-specific patterns for instrumenting OpenAI clients and orchestration frameworks.

### Python: OpenTelemetry with Azure Monitor

Create an [`instrumentation.py`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/instrumentation.py) module to configure the OpenTelemetry SDK and auto-instrument the OpenAI client:

```python
# instrumentation.py
import logging
import os

# Exporters ship in the `azure-monitor-opentelemetry-exporter` package
from azure.monitor.opentelemetry.exporter import (
    AzureMonitorMetricExporter,
    AzureMonitorTraceExporter,
)
from opentelemetry import metrics, trace
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Configure resource attributes (service name, version, environment)
resource = Resource.create({
    "service.name": "genai-chat-app",
    "service.version": "1.0.0",
    "deployment.environment": os.getenv("ENV", "dev"),
})

# Initialize tracing: spans are batched and exported to Azure Monitor
trace.set_tracer_provider(TracerProvider(resource=resource))
tracer = trace.get_tracer(__name__)
trace_exporter = AzureMonitorTraceExporter(
    connection_string=os.getenv("AZURE_MONITOR_CONNECTION_STRING")
)
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(trace_exporter)
)

# Initialize metrics: readings are exported periodically to Azure Monitor
metric_reader = PeriodicExportingMetricReader(
    AzureMonitorMetricExporter(
        connection_string=os.getenv("AZURE_MONITOR_CONNECTION_STRING")
    )
)
metrics.set_meter_provider(MeterProvider(
    resource=resource,
    metric_readers=[metric_reader],
))
meter = metrics.get_meter(__name__)

# Custom metric: total tokens (prompt + completion) per request
tokens_counter = meter.create_counter(
    name="genai.tokens_used",
    description="Total tokens (prompt + completion) consumed per request",
    unit="tokens",
)

# Console logging; no log exporter is attached here
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("genai")

# Auto-instrument OpenAI client calls
OpenAIInstrumentor().instrument()
```

Then instrument your business logic in [`chat_service.py`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/chat_service.py):

```python
# chat_service.py
import os

from openai import OpenAI

from instrumentation import logger, tokens_counter

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
MODEL = "gpt-4o-mini"


def chat(prompt: str, user_id: str):
    """Send one user prompt to the model and record usage telemetry."""
    # Log only the prompt length, never the raw prompt text
    logger.info("Chat request", extra={
        "user_id": user_id,
        "prompt_len": len(prompt),
    })

    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )

    # Record token usage; fall back to 0 if the response carries no usage data
    usage = response.usage
    total_tokens = usage.total_tokens if usage else 0
    tokens_counter.add(total_tokens, {"model": MODEL, "user_id": user_id})

    logger.info("Response generated", extra={
        "user_id": user_id,
        "tokens": total_tokens,
    })
    return response.choices[0].message.content
```

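Running this service exports **traces** and **metrics** to Azure Monitor, enabling dashboards and threshold alerts on token usage and latency. The **logs** go to the console only, because `logging.basicConfig` attaches no exporter; to centralize them as well, attach a log exporter or handler for your backend.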
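The `extra` fields passed to `logger.info` in `chat_service.py` are attached to each log record, but the default `basicConfig` format does not print them. One way to keep them is a JSON formatter, sketched below with the standard library only. The `JsonFormatter` class and `configure_json_logging` helper are names invented for this example, not part of the repository.

```python
import json
import logging

# Attribute names present on every LogRecord; anything else came from `extra`.
_STANDARD_ATTRS = set(logging.makeLogRecord({}).__dict__) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including `extra` fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy caller-supplied fields such as user_id or prompt_len.
        for key, value in record.__dict__.items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        # default=str keeps the formatter from failing on non-JSON values.
        return json.dumps(payload, default=str)


def configure_json_logging(name: str = "genai") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

Emitting one JSON object per line keeps fields such as `user_id` and `tokens` queryable once the logs reach a log store, instead of leaving them buried in free text.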

### TypeScript: Azure Application Insights

For Node.js services, use the Application Insights SDK to capture distributed traces and custom telemetry:

```typescript
// obs.ts
// Node.js services use the `applicationinsights` package;
// `@microsoft/applicationinsights-web` is the browser SDK.
import * as appInsights from "applicationinsights";
import { OpenAI } from "openai";

appInsights
  .setup(process.env.AZURE_APP_INSIGHTS_CONNECTION_STRING!)
  .setAutoCollectRequests(true)
  .setAutoCollectDependencies(true)
  .start();

const telemetry = appInsights.defaultClient;

// Create the OpenAI client once and reuse it across requests
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });

export async function chat(prompt: string, userId: string) {
  const start = Date.now();
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
  });
  const duration = Date.now() - start;

  // Custom telemetry: record the prompt length, not the prompt text
  telemetry.trackEvent({
    name: "ChatRequest",
    properties: { userId, promptLength: prompt.length },
  });
  telemetry.trackMetric({
    name: "ChatLatencyMs",
    value: duration,
  });
  telemetry.trackMetric({
    name: "TokensUsed",
    value: response.usage?.total_tokens ?? 0,
  });

  return response.choices[0].message.content;
}
```

Application Insights automatically captures HTTP dependencies and distributed traces, while the custom events feed the **Metrics** and **Alerts** panes of Azure Monitor.

### PromptFlow: Built-in Telemetry

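When using **PromptFlow** (Azure AI Studio), functions decorated with `@tool` run as flow nodes, and the `@trace` decorator from `promptflow.tracing` records a span for any additional function you want visible in a run. The sketch below assumes a recent `promptflow` release; `llm_client` is a placeholder for your own LLM wrapper:

```python
from promptflow.core import tool
from promptflow.tracing import trace


@trace
def call_llm(context: str, question: str) -> str:
    # `llm_client` is a placeholder, not a PromptFlow object
    return llm_client.chat(context, question)


@tool
def generate_answer(context: str, question: str) -> str:
    # Traced calls record inputs, outputs, duration, and raised exceptions.
    # Token usage typically appears when the call goes through an
    # instrumented client such as the OpenAI SDK.
    return call_llm(context, question)
```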

Executing this flow in Azure AI Studio exposes **Run Metrics** and allows you to set alerts on cost-per-run or node-level failures.

## Responsible AI Observability

The repository links observability directly to the **six principles of Responsible AI** (fairness, reliability and safety, privacy and security, inclusiveness, transparency, accountability) ([source](07-building-chat-applications/README.md#L65-L74)). The table below maps five of them to concrete telemetry checks; inclusiveness has no dedicated telemetry guardrail here:

| Principle | Observable Guardrail |
|-----------|---------------------|
| **Fairness** | Log demographic parity metrics per request batch; alert on disparate impact scores |
| **Reliability & Safety** | Error-rate thresholds and anomaly detection on response latency/outliers |
| **Privacy & Security** | Audit logs of data-access events; automatic PII redaction in request logs |
| **Transparency** | Log model version ID and prompt template hash with every response |
| **Accountability** | End-to-end trace IDs linking user feedback (thumbs up/down) to specific model versions for retraining attribution |
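Two of these guardrails, PII redaction and prompt template hashing, can be sketched with the standard library. The regular expressions below are deliberately simple and will miss many real-world PII formats, and the function and field names are illustrative assumptions, not repository code.

```python
from __future__ import annotations

import hashlib
import re

# Deliberately simple patterns for illustration; real PII detection needs more.
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
_PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Mask email addresses and phone-like digit runs before logging."""
    text = _EMAIL.sub("[EMAIL]", text)
    return _PHONE.sub("[PHONE]", text)


def template_hash(template: str) -> str:
    """Short, stable fingerprint of a prompt template."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def audit_fields(prompt: str, template: str, model_version: str) -> dict[str, str]:
    """Fields to attach to each response log for privacy and transparency."""
    return {
        "prompt_redacted": redact(prompt),
        "prompt_template_hash": template_hash(template),
        "model_version": model_version,
    }
```

Hashing the template rather than logging it keeps log volume down while still letting you group responses by the exact prompt wording that produced them.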

## Summary

- **Observability is a lifecycle requirement** defined in [`14-the-generative-ai-application-lifecycle/README.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/14-the-generative-ai-application-lifecycle/README.md), not an optional add-on.
- **Instrument with OpenTelemetry** (or the Application Insights SDK for Node.js) to capture the three telemetry pillars (metrics, logs, and traces) from Python, TypeScript, or PromptFlow runtimes.
- **Export to Azure Monitor** to centralize telemetry and configure alerts on latency, error rates, and cost-per-token.
- **Close the feedback loop** by connecting alerts to retraining pipelines, fulfilling the *Retraining Cycles* metric specified in [`07-building-chat-applications/README.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/07-building-chat-applications/README.md).
- **Validate Responsible AI** principles through specific telemetry checks for fairness, privacy, and transparency.

## Frequently Asked Questions

### What are the three pillars of observability for AI applications?

The three pillars are **metrics** (quantitative performance indicators like latency and token count), **logs** (structured event records including API requests and errors), and **traces** (distributed request flows across microservices). The `microsoft/generative-ai-for-beginners` curriculum emphasizes tracking these via OpenTelemetry to maintain production quality.

### How does OpenTelemetry help monitor Generative AI specifically?

OpenTelemetry provides vendor-neutral APIs and SDKs for traces, metrics, and logs. Instrumentation libraries built on it capture LLM-specific signals such as token usage, model identifiers, and, where enabled, prompt and response content. As shown in the repository’s Python examples, `OpenAIInstrumentor` wraps OpenAI client calls to emit spans without manual boilerplate.

### Which metrics should I alert on for cost control?

Alert on **cost-per-token** aggregates, **request latency** percentiles (p95/p99), and **error-rate** spikes. The curriculum specifically highlights monitoring *Retraining Cycles* and *Token Count* to prevent budget overruns and detect model drift early.
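As a rough sketch, the latency percentile and request cost behind such alerts can be derived from raw samples as follows. The per-token prices are placeholder values chosen for this example, not real provider rates, and the nearest-rank percentile is one of several common definitions.

```python
from __future__ import annotations

import math

# Hypothetical per-1K-token prices in USD; substitute your provider's rates.
PRICE_PER_1K = {"input": 0.00015, "output": 0.0006}


def request_cost(
    input_tokens: int, output_tokens: int, prices: dict[str, float] = PRICE_PER_1K
) -> float:
    """Estimated USD cost of one request from its token counts."""
    return (
        input_tokens * prices["input"] + output_tokens * prices["output"]
    ) / 1000


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least
    pct% of all samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

Alerting on p95 or p99 rather than the mean keeps a handful of slow completions from hiding behind many fast ones.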

### How do I implement Responsible AI monitoring in production?

Map each Responsible AI principle to a telemetry rule: log **model versions** for transparency, track **demographic parity** metrics for fairness, maintain **audit logs** of data access for privacy and security, and link **trace IDs** to user feedback for accountability. Configure Azure Monitor alerts to fire when these guardrails breach defined thresholds, so that compliance is checked continuously.
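A demographic parity check can be sketched as the largest gap in positive-outcome rate between groups. What counts as a "positive" outcome (for example, a request answered without refusal) and how group labels are obtained are assumptions you must define for your own application; the function name is illustrative.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Hashable, Iterable


def demographic_parity_difference(
    outcomes: Iterable[tuple[Hashable, bool]],
) -> float:
    """Largest gap in positive-outcome rate between any two groups.

    `outcomes` yields (group, positive) pairs for one request batch.
    Returns 0.0 when the batch is empty.
    """
    totals: dict[Hashable, int] = defaultdict(int)
    positives: dict[Hashable, int] = defaultdict(int)
    for group, positive in outcomes:
        totals[group] += 1
        positives[group] += int(positive)
    if not totals:
        return 0.0
    rates = [positives[group] / totals[group] for group in totals]
    return max(rates) - min(rates)
```

A value near 0 means groups receive positive outcomes at similar rates; an alert threshold on this value is a policy decision rather than a technical constant.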