
How to Debug AI Agents in ai-agents-for-beginners: A Complete Observability Guide

April 22, 2026 · microsoft/ai-agents-for-beginners

Debugging AI agents built with ai-agents-for-beginners relies on built-in OpenTelemetry instrumentation, structured logging, and trace visualization tools like Langfuse.

The microsoft/ai-agents-for-beginners repository embeds observability hooks directly into every lesson, giving you a glass-box view of what your agent did, when it happened, and why it produced specific outputs. This guide walks through the architecture, implementation patterns, and hands-on code examples for debugging agents at any scale.

The Observability-First Architecture

The course organizes debugging capabilities across five interconnected layers. Understanding this stack helps you navigate traces efficiently when something breaks.

Layer 1: Agent Core (Microsoft Agent Framework)

The orchestration engine handles LLM calls, tool selection, and decision routing. It's instrumented with OpenTelemetry so every internal operation becomes a traceable span.

Source: 10-ai-agents-production/README.md lines 71-80
Key capability: Automatic span generation for LLM calls and tool invocations

Layer 2: Observability Layer

Captures traces (complete agent runs) and spans (individual steps). This tree structure lets you walk from a failed output backward to the exact decision point where things went wrong.

Source: 10-ai-agents-production/README.md lines 20-26
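
To make the trace/span tree concrete, here is a minimal sketch of walking from a failed span back to the root of its trace. The `SpanRecord` class and its fields are hypothetical stand-ins for whatever your tracing backend exports; they are not part of the repository or of OpenTelemetry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpanRecord:
    """Hypothetical, simplified span; real backends carry many more fields."""
    span_id: str
    parent_id: Optional[str]
    name: str
    status: str = "ok"


def path_to_root(spans: list[SpanRecord], span_id: str) -> list[str]:
    """Return span names from the given span up to the root of its trace."""
    by_id = {s.span_id: s for s in spans}
    path: list[str] = []
    seen: set[str] = set()
    current = by_id.get(span_id)
    # Follow parent links; `seen` guards against malformed, cyclic data
    while current is not None and current.span_id not in seen:
        seen.add(current.span_id)
        path.append(current.name)
        current = by_id.get(current.parent_id) if current.parent_id else None
    return path


spans = [
    SpanRecord("1", None, "agent_run"),
    SpanRecord("2", "1", "plan_step"),
    SpanRecord("3", "2", "tool:flight_search", status="error"),
    SpanRecord("4", "1", "final_answer"),
]

failed = next(s for s in spans if s.status == "error")
print(" <- ".join(path_to_root(spans, failed.span_id)))
# prints: tool:flight_search <- plan_step <- agent_run
```

Reading the path from left to right takes you from the failing step to the decision that led to it, which is the same walk you would do by hand in a trace UI.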

Layer 3: Logging & Monitoring

Structured logs capture business-level events: agent IDs, action names, timestamps, and outcomes. These complement traces by making agent behavior searchable in standard log aggregators.

Source: 08-multi-agent/README.md lines 73-78

Layer 4: Visualization

Dashboards (Langfuse, Azure AI Foundry, or custom notebooks) render agent interaction graphs. These make coordination failures and missing hand-offs visually obvious.

Source: 08-multi-agent/README.md lines 75-77

Layer 5: Evaluation Metrics

Latency, cost, request-error rate, user feedback, and automated eval scores let you correlate failures with metric spikes. For example, a sudden cost increase can signal an agent stuck in a loop of repeated LLM or tool calls.

Source: 10-ai-agents-production/README.md lines 41-58
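
As an illustration of correlating failures with metric spikes, the sketch below flags runs whose cost jumps well above the recent baseline. The function name, the threshold `factor`, and the sample costs are all assumptions chosen for the example; tune them against your own data.

```python
from statistics import median


def find_cost_spikes(costs: list[float], factor: float = 3.0, window: int = 5) -> list[int]:
    """Return indexes of runs costing more than `factor` times the median
    of the `window` runs that preceded them."""
    spikes = []
    for i in range(window, len(costs)):
        baseline = median(costs[i - window:i])
        if baseline > 0 and costs[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Hypothetical per-run costs in USD; run 6 is the suspicious one
run_costs = [0.02, 0.03, 0.02, 0.025, 0.03, 0.02, 0.41, 0.03]
print(find_cost_spikes(run_costs))
# prints: [6]
```

A median baseline is used here so that a single expensive run does not distort the comparison for the runs that follow it.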

Hands-On Debugging: Code Examples

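These patterns, drawn from the repository, give you starting points to adapt. Names such as agent, query, user, session, criteria, and flight_api are placeholders for objects in your own application.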

Basic OpenTelemetry Instrumentation

Every agent in the course starts with this pattern from 10-ai-agents-production/README.md (lines 71-80):

from agent_framework.observability import get_tracer, get_meter

tracer = get_tracer()
meter = get_meter()

with tracer.start_as_current_span("agent_run"):
    # All MAF internal calls (LLM, tool use) made inside this block are traced automatically.
    # `agent` and `query` are placeholders for your own agent object and user input.
    result = agent.execute(query)

What you get: A top-level trace named agent_run with nested child spans for each LLM call, tool invocation, and internal function.

Adding Custom Metadata to Spans

For business-context debugging, extend spans with identifiers from 10-ai-agents-production/README.md (lines 84-96):

from langfuse import get_client

langfuse = get_client()
span = langfuse.start_span(
    name="flight_search",
    metadata={
        "user_id": user.id,
        "session_id": session.id,
        "search_criteria": str(criteria)
    }
)

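# ... run flight-search tool ...
results = flight_api.search(criteria)

# In the Langfuse Python SDK v3, attach the output with update(),
# then close the span with end()
span.update(output=results)
span.end()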

Why this matters: The metadata dictionary appears in your trace UI, enabling you to filter spans by user, group by session, or audit specific search patterns.
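
The same filtering and grouping can be sketched offline on exported span data. The record shape below (a `name` plus a `metadata` dict) is a hypothetical simplification of what a trace backend might export, and the helper names are invented for this example.

```python
from collections import defaultdict

# Hypothetical exported span records carrying the metadata shown above
exported_spans = [
    {"name": "flight_search", "metadata": {"user_id": "u1", "session_id": "s1"}},
    {"name": "hotel_search", "metadata": {"user_id": "u1", "session_id": "s1"}},
    {"name": "flight_search", "metadata": {"user_id": "u2", "session_id": "s2"}},
]


def filter_by_metadata(spans, **criteria):
    """Keep spans whose metadata matches every key/value pair in `criteria`."""
    return [
        s for s in spans
        if all(s.get("metadata", {}).get(k) == v for k, v in criteria.items())
    ]


def group_by_session(spans):
    """Map each session_id to the ordered list of span names it produced."""
    groups = defaultdict(list)
    for s in spans:
        groups[s.get("metadata", {}).get("session_id")].append(s["name"])
    return dict(groups)


print([s["name"] for s in filter_by_metadata(exported_spans, user_id="u1")])
# prints: ['flight_search', 'hotel_search']
print(group_by_session(exported_spans))
# prints: {'s1': ['flight_search', 'hotel_search'], 's2': ['flight_search']}
```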

Structured Logging for Action Tracking

When you need searchable plain-text logs, use this wrapper pattern derived from 08-multi-agent/README.md:

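import json
import logging
from datetime import datetime, timezone
from typing import Optional

# Attach a handler; without one, INFO-level records are silently dropped
logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("agent_debug")

def log_action(agent_name: str, action: str, payload: Optional[dict] = None, outcome: Optional[str] = None) -> None:
    """Emit one JSON-structured log entry per agent action."""
    entry = {
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent_name,
        "action": action,
        "payload": payload or {},
        "outcome": outcome
    }
    # default=str keeps payload values that are not JSON-serializable from raising
    logger.info(json.dumps(entry, default=str))

# Usage in your agent (`flight_api` is a placeholder for your own tool client)
def search_flights(agent, criteria):
    log_action(agent.name, "flight_search_start", {"criteria": criteria})
    try:
        results = flight_api.search(criteria)
        log_action(agent.name, "flight_search_complete", {"results_count": len(results)}, "success")
        return results
    except Exception as e:
        # Record the failure, then re-raise so the caller still sees the error
        log_action(agent.name, "flight_search_failed", {"error": str(e)}, "error")
        raise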

Integration: Once a handler or log shipper forwards them, these JSON lines can be ingested by Azure Monitor, Elastic, or any aggregator that accepts JSON-structured logs.
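
Before wiring up an aggregator, you can query the same JSON lines locally. The sketch below counts failed actions per agent; the sample lines follow the entry shape used in this section (timestamp, agent, action, payload, outcome), and the agent and action names are made up for illustration.

```python
import json
from collections import Counter

# Sample log lines; the third one simulates non-JSON noise in the stream
log_lines = [
    '{"timestamp": "2026-04-22T10:00:00+00:00", "agent": "booking", "action": "flight_search_start", "payload": {}, "outcome": null}',
    '{"timestamp": "2026-04-22T10:00:02+00:00", "agent": "booking", "action": "flight_search_failed", "payload": {"error": "timeout"}, "outcome": "error"}',
    'Traceback (most recent call last): ...',
    '{"timestamp": "2026-04-22T10:01:10+00:00", "agent": "booking", "action": "flight_search_failed", "payload": {"error": "timeout"}, "outcome": "error"}',
    '{"timestamp": "2026-04-22T10:02:00+00:00", "agent": "pricing", "action": "quote_complete", "payload": {}, "outcome": "success"}',
]


def failed_actions(lines):
    """Count (agent, action) pairs whose outcome is "error", skipping bad lines."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a structured entry; ignore it
        if isinstance(entry, dict) and entry.get("outcome") == "error":
            counts[(entry.get("agent"), entry.get("action"))] += 1
    return counts


print(failed_actions(log_lines).most_common())
# prints: [(('booking', 'flight_search_failed'), 2)]
```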

Visualizing Agent Interaction with Langfuse

For multi-agent coordination bugs, render interaction graphs using patterns from 10-ai-agents-production/code_samples/10-expense-claim-demo.ipynb:


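# In a notebook after running a trace (assumes the Langfuse Python SDK v3)
from langfuse import get_client

client = get_client()
trace_id = "<your-trace-id>"  # copy this from the Langfuse UI or your run output

# Fetch the trace and list its observations (spans, generations, tool calls)
trace = client.api.trace.get(trace_id)
for obs in trace.observations:
    print(obs.type, obs.name, obs.parent_observation_id)

# The interactive span graph is rendered in the Langfuse UI; print a link to it
print(client.get_trace_url(trace_id=trace_id))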

What the UI reveals: Arrows between spans show hand-off timing, retry loops, and orphaned tool calls that don't return to their parent agent.
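
The same two symptoms can be checked programmatically on exported span data. This is a rough sketch under stated assumptions: the dict shape is hypothetical, "orphaned" is interpreted here as a span whose parent is missing from the trace, and a "retry loop" as the same span name repeated several times under one parent.

```python
from collections import Counter

# Hypothetical flattened spans from a single trace
trace_spans = [
    {"id": "a", "parent_id": None, "name": "agent_run"},
    {"id": "b", "parent_id": "a", "name": "tool:get_quote"},
    {"id": "c", "parent_id": "a", "name": "tool:get_quote"},
    {"id": "d", "parent_id": "a", "name": "tool:get_quote"},
    {"id": "e", "parent_id": "z", "name": "tool:send_email"},  # parent "z" is missing
]


def find_orphans(spans):
    """Names of spans whose parent_id points at a span absent from the trace."""
    ids = {s["id"] for s in spans}
    return [
        s["name"] for s in spans
        if s["parent_id"] is not None and s["parent_id"] not in ids
    ]


def find_retry_loops(spans, threshold=3):
    """Span names repeated at least `threshold` times under the same parent."""
    counts = Counter((s["parent_id"], s["name"]) for s in spans)
    return sorted({name for (_, name), n in counts.items() if n >= threshold})


print(find_orphans(trace_spans))      # prints: ['tool:send_email']
print(find_retry_loops(trace_spans))  # prints: ['tool:get_quote']
```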

Key Debugging Files in the Repository

Path	Purpose	Direct Link
`08-multi-agent/README.md`	Logging, monitoring, and visualization fundamentals for multi-agent systems	https://github.com/microsoft/ai-agents-for-beginners/blob/main/08-multi-agent/README.md
`10-ai-agents-production/README.md`	OpenTelemetry setup, trace/span creation, and evaluation metrics	https://github.com/microsoft/ai-agents-for-beginners/blob/main/10-ai-agents-production/README.md
`08-multi-agent/code_samples/workflows-agent-framework/python/01.python-agent-framework-workflow-ghmodel-basic.ipynb`	Minimal workflow with automatic trace emission	https://github.com/microsoft/ai-agents-for-beginners/blob/main/08-multi-agent/code_samples/workflows-agent-framework/python/01.python-agent-framework-workflow-ghmodel-basic.ipynb
`10-ai-agents-production/code_samples/10-expense-claim-demo.ipynb`	Full production demo: instrumentation, Langfuse visualization, and metric collection	https://github.com/microsoft/ai-agents-for-beginners/blob/main/10-ai-agents-production/code_samples/10-expense-claim-demo.ipynb
`AGENTS.md`	Curriculum overview and Azure AI Foundry integration for production deployment	https://github.com/microsoft/ai-agents-for-beginners/blob/main/AGENTS.md

Summary

Debugging AI agents in ai-agents-for-beginners succeeds through five integrated practices:

1. Instrument with OpenTelemetry to capture automatic spans for every LLM call and tool invocation
2. Emit structured logs alongside traces for searchable, business-context records
3. Add custom metadata to spans for user/session filtering and audit trails
4. Visualize with Langfuse or Azure dashboards to expose coordination failures and loops
5. Monitor evaluation metrics (latency, cost, error rate) to correlate failures with performance spikes

The repository's production-ready examples in 10-ai-agents-production and multi-agent patterns in 08-multi-agent provide immediately runnable debugging scaffolding.

Frequently Asked Questions

What is the fastest way to start debugging an agent in ai-agents-for-beginners?

Run the notebook at 08-multi-agent/code_samples/workflows-agent-framework/python/01.python-agent-framework-workflow-ghmodel-basic.ipynb. It emits traces automatically without configuration. Inspect the output in your terminal or connect to Langfuse for visualization.

How do I trace a specific user session across multiple agent calls?

Add user_id and session_id to span metadata using the pattern in 10-ai-agents-production/README.md lines 84-96. Pass these identifiers when starting each span, then filter your trace UI by these fields.
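
One way to avoid threading these identifiers through every function call is to hold them in a context variable and merge them into each span's metadata. The sketch below uses only the standard library; `session_scope` and `span_metadata` are hypothetical helper names, not part of the repository or of Langfuse.

```python
import contextvars
from contextlib import contextmanager

# Holds the identifiers for the request currently being handled
_session_ctx = contextvars.ContextVar("session_ctx", default=None)


@contextmanager
def session_scope(user_id: str, session_id: str):
    """Bind user and session identifiers for everything run inside the block."""
    token = _session_ctx.set({"user_id": user_id, "session_id": session_id})
    try:
        yield
    finally:
        _session_ctx.reset(token)  # restore the previous value on exit


def span_metadata(**extra) -> dict:
    """Build a metadata dict: bound session identifiers plus per-span extras."""
    return {**(_session_ctx.get() or {}), **extra}


with session_scope("u1", "s1"):
    print(span_metadata(step="flight_search"))
    # prints: {'user_id': 'u1', 'session_id': 's1', 'step': 'flight_search'}
```

The resulting dict can be passed as the `metadata` argument when you start each span, so every span in the session carries the same filterable fields.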

What should I check first when an agent produces incorrect outputs?

Open the trace tree and locate the span where the output diverged from expectations. Check three things: the LLM prompt (input to the span), the raw LLM response (output), and any tool calls made between decision points. If structured logging is enabled, the logs show which actions the agent took around that span and how each one ended.

Can I use Azure Monitor instead of Langfuse for visualization?

Yes. The OpenTelemetry spans emitted by the Microsoft Agent Framework are backend-agnostic. Configure your TracerProvider to export to Azure Monitor's OpenTelemetry endpoint instead of Langfuse. The 10-ai-agents-production README mentions Azure AI Foundry as the production deployment target.
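
A minimal sketch of that configuration, assuming the `azure-monitor-opentelemetry` package is installed and your Application Insights connection string is available in the `APPLICATIONINSIGHTS_CONNECTION_STRING` environment variable. The wrapper function `enable_azure_monitor_export` is a hypothetical name for this example, and whether the framework's spans pass through the global provider this call sets up depends on how observability is configured in your project.

```python
import os

try:
    # Provided by the azure-monitor-opentelemetry distribution
    from azure.monitor.opentelemetry import configure_azure_monitor
except ImportError:
    configure_azure_monitor = None  # package not installed


def enable_azure_monitor_export() -> bool:
    """Route OpenTelemetry data to Azure Monitor if the package and a
    connection string are both available; return whether it was enabled."""
    conn = os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING")
    if configure_azure_monitor is None or not conn:
        return False
    # Sets up the global OpenTelemetry providers with Azure Monitor exporters
    configure_azure_monitor(connection_string=conn)
    return True


if __name__ == "__main__":
    print("Azure Monitor export enabled:", enable_azure_monitor_export())
```

Call this once at startup, before the first agent run, so that spans created later are picked up by the exporter.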
