How to Debug AI Agents in ai-agents-for-beginners: A Complete Observability Guide
Debugging AI agents built with ai-agents-for-beginners relies on built-in OpenTelemetry instrumentation, structured logging, and trace visualization tools like Langfuse.
The microsoft/ai-agents-for-beginners repository embeds observability hooks directly into every lesson, giving you a glass-box view of what your agent did, when it happened, and why it produced specific outputs. This guide walks through the architecture, implementation patterns, and hands-on code examples for debugging agents at any scale.
The Observability-First Architecture
The course organizes debugging capabilities across five interconnected layers. Understanding this stack helps you navigate traces efficiently when something breaks.
Layer 1: Agent Core (Microsoft Agent Framework)
The orchestration engine handles LLM calls, tool selection, and decision routing. It's instrumented with OpenTelemetry so every internal operation becomes a traceable span.
- Source: 10-ai-agents-production/README.md, lines 71-80
- Key capability: Automatic span generation for LLM calls and tool invocations
Layer 2: Observability Layer
Captures traces (complete agent runs) and spans (individual steps). This tree structure lets you walk from a failed output backward to the exact decision point where things went wrong.
- Source: 10-ai-agents-production/README.md, lines 20-26
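The backward walk from a failed step to the root of the trace can be sketched in plain Python. This is only an illustration of the tree structure, not the framework's API: the `Span` dataclass, the `path_to_root` helper, and the sample span names are hypothetical stand-ins for whatever your tracing backend exports.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """Hypothetical, simplified span record (not an OpenTelemetry type)."""
    span_id: str
    name: str
    parent_id: Optional[str] = None
    status: str = "ok"


def path_to_root(spans: List[Span], start_id: str) -> List[str]:
    """Return span names from the given span up to the trace root."""
    by_id: Dict[str, Span] = {s.span_id: s for s in spans}
    path: List[str] = []
    current = by_id.get(start_id)
    while current is not None:
        path.append(current.name)
        # Step to the parent; a span without a parent is the root.
        current = by_id.get(current.parent_id) if current.parent_id else None
    return path


# Illustrative trace: one agent run with an LLM call and a failing tool call.
trace_spans = [
    Span("1", "agent_run"),
    Span("2", "llm_call", parent_id="1"),
    Span("3", "tool:flight_search", parent_id="2", status="error"),
]

failed = next(s for s in trace_spans if s.status == "error")
failure_path = path_to_root(trace_spans, failed.span_id)
print(" <- ".join(failure_path))
```

Reading the printed path from left to right takes you from the failing step back through each decision that led to it.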
Layer 3: Logging & Monitoring
Structured logs capture business-level events: agent IDs, action names, timestamps, and outcomes. These complement traces by making agent behavior searchable in standard log aggregators.
- Source: 08-multi-agent/README.md, lines 73-78
Layer 4: Visualization
Dashboards (Langfuse, Azure AI Foundry, or custom notebooks) render agent interaction graphs. These make coordination failures and missing hand-offs visually obvious.
- Source: 08-multi-agent/README.md, lines 75-77
Layer 5: Evaluation Metrics
Latency, cost, request-error rate, user feedback, and automated eval scores let you correlate failures with metric spikes. For example, a sudden cost increase often signals an infinite loop.
- Source: 10-ai-agents-production/README.md, lines 41-58
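One simple way to turn the cost signal into a check is to compare each run against a baseline. The sketch below is an assumption-laden illustration rather than anything from the repository: `flag_cost_spikes`, the 3x-median threshold, and the sample costs are all hypothetical, and a real setup would read costs from your observability backend.

```python
from statistics import median
from typing import Dict, List


def flag_cost_spikes(run_costs: Dict[str, float], factor: float = 3.0) -> List[str]:
    """Return IDs of runs whose cost exceeds `factor` times the median cost."""
    if not run_costs:
        return []
    baseline = median(run_costs.values())
    return [run_id for run_id, cost in run_costs.items() if cost > factor * baseline]


# Illustrative per-run costs in USD (made-up numbers).
costs = {"run-a": 0.012, "run-b": 0.010, "run-c": 0.011, "run-d": 0.190}
suspicious = flag_cost_spikes(costs)
print(suspicious)
```

Runs flagged this way are good candidates for opening the full trace and looking for repeated LLM or tool calls.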
Hands-On Debugging: Code Examples
These patterns, drawn directly from the repository, give you immediately runnable starting points.
Basic OpenTelemetry Instrumentation
Every agent in the course starts with this pattern from 10-ai-agents-production/README.md (lines 71-80):
from agent_framework.observability import get_tracer, get_meter

tracer = get_tracer()
meter = get_meter()

with tracer.start_as_current_span("agent_run"):
    # All MAF internal calls (LLM, tool use) are automatically traced here
    # Your agent logic goes here
    result = agent.execute(query)
What you get: A top-level trace named agent_run with nested child spans for each LLM call, tool invocation, and internal function.
Adding Custom Metadata to Spans
For business-context debugging, extend spans with identifiers from 10-ai-agents-production/README.md (lines 84-96):
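from langfuse import get_client

langfuse = get_client()

# user, session, criteria, and flight_api come from your own application code
span = langfuse.start_span(
    name="flight_search",
    metadata={
        "user_id": user.id,
        "session_id": session.id,
        "search_criteria": str(criteria),
    },
)
# ... run flight-search tool ...
results = flight_api.search(criteria)
# Attach the output, then close the span (the v3 Python SDK separates these steps)
span.update(output=results)
span.end()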
Why this matters: The metadata dictionary appears in your trace UI, enabling you to filter spans by user, group by session, or audit specific search patterns.
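To make the filtering and grouping idea concrete, here is a small offline sketch that groups span records by `session_id`. It assumes spans have already been exported as plain dictionaries; the record shape, the `group_by_session` helper, and the sample values are hypothetical and will differ from what your trace backend actually returns.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical exported span records; real exports carry many more fields.
exported_spans: List[Dict[str, Any]] = [
    {"name": "flight_search", "metadata": {"user_id": "u1", "session_id": "s1"}},
    {"name": "hotel_search", "metadata": {"user_id": "u1", "session_id": "s1"}},
    {"name": "flight_search", "metadata": {"user_id": "u2", "session_id": "s2"}},
]


def group_by_session(records: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each session_id to the ordered list of span names it produced."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        # Spans without the metadata key are collected under "unknown".
        session_id = record.get("metadata", {}).get("session_id", "unknown")
        grouped[session_id].append(record["name"])
    return dict(grouped)


sessions = group_by_session(exported_spans)
for session_id, names in sessions.items():
    print(session_id, names)
```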
Structured Logging for Action Tracking
When you need searchable plain-text logs, use this wrapper pattern derived from 08-multi-agent/README.md:
import json
import logging
from datetime import datetime, timezone
from typing import Optional

# Without a handler at INFO level, logger.info() output is silently dropped
logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("agent_debug")
logger.setLevel(logging.INFO)

def log_action(agent_name: str, action: str, payload: Optional[dict] = None, outcome: Optional[str] = None):
    """Emit structured log entry for every agent action."""
    entry = {
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent_name,
        "action": action,
        "payload": payload or {},
        "outcome": outcome,
    }
    logger.info(json.dumps(entry))

# Usage in your agent (flight_api is your own client object)
def search_flights(agent, criteria):
    log_action(agent.name, "flight_search_start", {"criteria": criteria})
    try:
        results = flight_api.search(criteria)
        log_action(agent.name, "flight_search_complete", {"results_count": len(results)}, "success")
        return results
    except Exception as e:
        log_action(agent.name, "flight_search_failed", {"error": str(e)}, "error")
        raise
Integration: These logs feed directly into Azure Monitor, Elastic, or any aggregator that accepts JSON-structured logs.
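Before wiring up an aggregator, the same JSON lines can be inspected locally. The following sketch counts error outcomes per agent from a list of log lines; `count_failures` and the sample lines are hypothetical, and the field names simply mirror the `agent` and `outcome` keys used in the logging pattern in this section.

```python
import json
from collections import Counter
from typing import Iterable


def count_failures(log_lines: Iterable[str]) -> Counter:
    """Count entries with outcome == "error", keyed by agent name."""
    failures: Counter = Counter()
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # Ignore lines that are not JSON (e.g. library output).
        if isinstance(entry, dict) and entry.get("outcome") == "error":
            failures[entry.get("agent", "unknown")] += 1
    return failures


# Made-up log lines in the same shape as the structured entries.
sample_lines = [
    '{"agent": "booking", "action": "flight_search_start", "payload": {}, "outcome": null}',
    '{"agent": "booking", "action": "flight_search_failed", "payload": {"error": "timeout"}, "outcome": "error"}',
    "plain text line from another library",
    '{"agent": "billing", "action": "charge", "payload": {}, "outcome": "success"}',
]

failure_counts = count_failures(sample_lines)
print(dict(failure_counts))
```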
Visualizing Agent Interaction with Langfuse
For multi-agent coordination bugs, render interaction graphs using patterns from 10-ai-agents-production/code_samples/10-expense-claim-demo.ipynb:
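# In a notebook after running a trace
from langfuse import get_client

client = get_client()
# Replace the placeholder with a trace ID from your own run.
# Fetch call shown for the v3 Python SDK; other SDK versions expose different methods.
trace = client.api.trace.get("<your-trace-id>")
# List the recorded spans; the interactive graph itself is viewed in the Langfuse UI
for observation in trace.observations:
    print(observation.name)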
What the UI reveals: Arrows between spans show hand-off timing, retry loops, and orphaned tool calls that don't return to their parent agent.
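The two patterns mentioned here, orphaned calls and retry loops, can also be checked programmatically once spans are available as plain records. The sketch below is a hypothetical offline check, not a Langfuse feature: the record shape, the helper names, and the threshold of three repeats are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

SpanRecord = Dict[str, Optional[str]]


def find_orphans(spans: List[SpanRecord]) -> List[str]:
    """Return IDs of spans whose parent_id points to a span missing from the trace."""
    known_ids = {s["id"] for s in spans}
    return [
        s["id"] for s in spans
        if s["parent_id"] is not None and s["parent_id"] not in known_ids
    ]


def find_repeats(spans: List[SpanRecord], threshold: int = 3) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (parent_id, name) pairs that occur at least `threshold` times."""
    counts = Counter((s["parent_id"], s["name"]) for s in spans)
    return [key for key, n in counts.items() if n >= threshold]


# Made-up trace: three identical tool calls under one parent, plus one orphan.
sample_spans: List[SpanRecord] = [
    {"id": "a", "parent_id": None, "name": "agent_run"},
    {"id": "b", "parent_id": "a", "name": "tool:search"},
    {"id": "c", "parent_id": "a", "name": "tool:search"},
    {"id": "d", "parent_id": "a", "name": "tool:search"},
    {"id": "e", "parent_id": "zzz", "name": "tool:book"},
]

orphans = find_orphans(sample_spans)
repeats = find_repeats(sample_spans)
print("orphans:", orphans)
print("possible retry loops:", repeats)
```

A repeated (parent, name) pair is only a hint; legitimate fan-out can look the same, so confirm against the span inputs and outputs before treating it as a loop.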
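Key Debugging Files in the Repository
- 10-ai-agents-production/README.md: traces and spans (lines 20-26), evaluation metrics (lines 41-58), OpenTelemetry instrumentation (lines 71-80), span metadata (lines 84-96)
- 08-multi-agent/README.md: logging, monitoring, and visualization (lines 73-78)
- 10-ai-agents-production/code_samples/10-expense-claim-demo.ipynb: Langfuse visualization patterns
- 08-multi-agent/code_samples/workflows-agent-framework/python/01.python-agent-framework-workflow-ghmodel-basic.ipynb: quick-start notebook that emits traces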
Summary
Debugging AI agents in ai-agents-for-beginners succeeds through five integrated practices:
- Instrument with OpenTelemetry to capture automatic spans for every LLM call and tool invocation
- Emit structured logs alongside traces for searchable, business-context records
- Add custom metadata to spans for user/session filtering and audit trails
- Visualize with Langfuse or Azure dashboards to expose coordination failures and loops
- Monitor evaluation metrics (latency, cost, error rate) to correlate failures with performance spikes
The repository's production-ready examples in 10-ai-agents-production and multi-agent patterns in 08-multi-agent provide immediately runnable debugging scaffolding.
Frequently Asked Questions
What is the fastest way to start debugging an agent in ai-agents-for-beginners?
Run the notebook at 08-multi-agent/code_samples/workflows-agent-framework/python/01.python-agent-framework-workflow-ghmodel-basic.ipynb. It emits traces automatically without configuration. Inspect the output in your terminal or connect to Langfuse for visualization.
How do I trace a specific user session across multiple agent calls?
Add user_id and session_id to span metadata using the pattern in 10-ai-agents-production/README.md lines 84-96. Pass these identifiers when starting each span, then filter your trace UI by these fields.
What should I check first when an agent produces incorrect outputs?
Open the trace tree and locate the span where the output diverged from expectations. Check three things: the LLM prompt (input to the span), the raw LLM response (output), and any tool calls made between decision points. The logs will show the agent's reasoning if structured logging is enabled.
Can I use Azure Monitor instead of Langfuse for visualization?
Yes. The OpenTelemetry spans emitted by the Microsoft Agent Framework are backend-agnostic. Configure your TracerProvider to export to Azure Monitor's OpenTelemetry endpoint instead of Langfuse. The 10-ai-agents-production README mentions Azure AI Foundry as the production deployment target.