How to Build Production-Ready LLM Applications with Guardrails: A Layered Architecture Guide

Production-ready LLM applications require a defense-in-depth architecture combining input/output guardrails, API gateways for traffic mediation, and observability pipelines to ensure safety, reliability, and compliance before exposing AI systems to real users.

The rohitg00/ai-engineering-from-scratch repository provides a blueprint for production-ready LLM applications that separates safety concerns into distinct architectural layers. By keeping validation, routing, and monitoring in independent systems, you can catch policy violations while maintaining high throughput and observability.

The Four-Layer Production Architecture

Modern LLM services require a defense-in-depth strategy that handles concerns at different levels of the stack. The curriculum defines four critical layers that work together to create a robust production environment.

Layer 1: Guardrails

The guardrails layer serves as both the first and the last line of defense: it validates every request before it reaches the model and every response before it reaches the user. According to the OpenAI Agents SDK documentation in phases/14-agent-engineering/16-openai-agents-sdk/docs/en.md, this layer implements three specific primitives:

  • InputGuardrail – Runs on the first user prompt to redact PII and detect policy violations
  • OutputGuardrail – Runs on the final LLM reply to strip leaked sensitive data and enforce content policies
  • ToolGuardrail – Validates arguments for each function tool to prevent unauthorized file access or network calls

These components are meant to keep raw sensitive data away from the model and policy-breaking content away from end users.

Layer 2: Gateway & Routing

The gateway layer mediates traffic to provider APIs and adds a second-line safety net. As detailed in phases/17-infrastructure-and-production/19-ai-gateways/docs/en.md, production deployments should use:

  • Portkey – A control-plane proxy that adds PII redaction and jailbreak detection
  • Kong – A high-throughput router designed for enterprise workloads
  • LiteLLM – A lightweight development proxy (not recommended for production-scale deployments)

These gateways enforce rate limits, add redundant safety checks, and provide centralized control over LLM provider access.
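As a rough illustration of the rate-limiting role described above, here is a minimal token-bucket sketch. The token-bucket algorithm is an assumption chosen for illustration; the source does not say how Portkey, Kong, or LiteLLM implement their limits, and the TokenBucket class and its parameters are hypothetical names.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical per-client limiter: refills `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)   # start with a full bucket
        self.clock = clock              # injectable so tests can control time
        self.updated = clock()

    def allow(self) -> bool:
        """Return True and spend one token if the request may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A gateway would typically keep one bucket per API key or tenant and answer with HTTP 429 when allow() returns False.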

Layer 3: Observability & Feedback

Observability pipelines capture traces, metrics, and evaluation results to feed back into guardrail policies. The implementation in phases/19-capstone-projects/11-llm-observability-dashboard/docs/en.md recommends:

  • Langfuse – For prompt management and session replay
  • Opik – For automated prompt optimization and guardrail integration

These platforms ingest OpenTelemetry GenAI spans, run automated evaluations (RAGAS, LLM-as-judge), and trigger alerts when guardrails trip or latency SLOs degrade.
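The alerting idea can be sketched without any vendor SDK. The example below is a standard-library approximation, not the Langfuse or Opik API: the SloMonitor class, its thresholds, and the window size are all hypothetical, and a real pipeline would derive these values from exported spans rather than direct calls.

```python
from collections import deque
from statistics import quantiles


class SloMonitor:
    """Hypothetical rolling-window monitor for latency and guardrail trips."""

    def __init__(self, p95_slo_ms: float, max_trip_rate: float, window: int = 100):
        self.p95_slo_ms = p95_slo_ms
        self.max_trip_rate = max_trip_rate
        self.latencies = deque(maxlen=window)   # most recent latencies in ms
        self.trips = deque(maxlen=window)       # 1 if a guardrail tripped, else 0

    def record(self, latency_ms: float, guardrail_tripped: bool) -> None:
        self.latencies.append(latency_ms)
        self.trips.append(1 if guardrail_tripped else 0)

    def alerts(self) -> list[str]:
        """Return human-readable alerts for the current window."""
        if len(self.latencies) < 2:
            return []   # too few samples to estimate a percentile
        found = []
        p95 = quantiles(self.latencies, n=20)[-1]   # last of 19 cut points = 95th percentile
        if p95 > self.p95_slo_ms:
            found.append(f"latency p95 {p95:.0f} ms exceeds SLO {self.p95_slo_ms:.0f} ms")
        trip_rate = sum(self.trips) / len(self.trips)
        if trip_rate > self.max_trip_rate:
            found.append(f"guardrail trip rate {trip_rate:.0%} exceeds {self.max_trip_rate:.0%}")
        return found
```

Calling record() once per request and polling alerts() on a schedule is one simple way to wire this into a pager or dashboard.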

Layer 4: Core Application Logic

The core logic handles domain-specific retrieval, synthesis, and post-processing. The production RAG chatbot documented in phases/19-capstone-projects/08-production-rag-chatbot/docs/en.md demonstrates this layer through document ingestion, hybrid retrieval (dense + BM25), re-ranking, and LLM synthesis with prompt caching.
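One common way to merge a dense ranking with a BM25 ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF purely for illustration; the source does not state which fusion method the capstone uses, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked highly by more than one retriever rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken by ID to keep the order deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]   # hypothetical IDs from the vector index
bm25_hits = ["b", "d", "a"]    # hypothetical IDs from the keyword index
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

The fused list would then go to the re-ranker, which only has to score the small candidate set rather than the whole corpus.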

Implementing Guardrails with OpenAI Agents SDK

The phases/14-agent-engineering/16-openai-agents-sdk/outputs/skill-agents-sdk-scaffold.md file provides the scaffold for implementing the three guardrail types that intercept data at different stages of the agent lifecycle.

Input Validation

InputGuardrail runs before any LLM invocation. It receives the raw user prompt and can transform or reject it based on policy rules. Typical implementations include regex-based PII redaction and prompt injection detection.
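The transform-or-reject behavior can be sketched in plain Python, independent of any SDK. The injection phrases below are hypothetical examples, and the InputCheck type and check_input function are illustrative names that do not come from the curriculum; a production system would likely rely on a maintained classifier rather than a short regex list.

```python
import re
from dataclasses import dataclass

EMAIL_RE = re.compile(r"[A-Za-z0-9_.+-]+@[A-Za-z0-9-]+\.[A-Za-z0-9.-]+")

# Hypothetical injection phrases, for illustration only.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]


@dataclass
class InputCheck:
    prompt: str        # sanitized prompt to forward to the model
    rejected: bool     # True when the request should not reach the model
    reason: str = ""


def check_input(raw_prompt: str) -> InputCheck:
    """Reject likely injection attempts, otherwise redact email addresses."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(raw_prompt):
            return InputCheck(prompt="", rejected=True, reason="possible prompt injection")
    return InputCheck(prompt=EMAIL_RE.sub("<redacted>", raw_prompt), rejected=False)
```

Returning a result object instead of raising lets the caller decide whether a rejection becomes an error, a fallback message, or a logged event.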

Output Filtering

OutputGuardrail executes after the LLM generates a response but before it reaches the user. This component checks for policy violations, ensures citation format compliance, and strips any accidentally leaked sensitive information.
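A minimal sketch of such a filter follows. It assumes citations look like [1] and that leaked secrets resemble keys starting with sk-; both patterns, along with the filter_output and OutputPolicyViolation names, are assumptions made for illustration rather than details taken from the curriculum.

```python
import re

SECRET_RE = re.compile(r"\bsk-[A-Za-z0-9]{20,}\b")   # assumed shape of a leaked API key
CITATION_RE = re.compile(r"\[\d+\]")                  # assumed citation format: [1], [2], ...


class OutputPolicyViolation(Exception):
    """Raised when a response must not be shown to the user."""


def filter_output(response: str, require_citation: bool = True) -> str:
    """Strip secret-looking tokens, then enforce the citation format."""
    cleaned = SECRET_RE.sub("<redacted>", response)
    if require_citation and not CITATION_RE.search(cleaned):
        raise OutputPolicyViolation("response is missing a citation marker")
    return cleaned
```

Redacting before validating means the text returned to the caller is already safe to log alongside the verdict.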

Tool Safety

ToolGuardrail validates the arguments passed to function tools. When an agent attempts to call external APIs, databases, or file systems, this guardrail verifies that the parameters meet security constraints (e.g., checking allow_network flags before HTTP requests).
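The argument checks might look like the following standard-library sketch. The host allow-list, the sandbox directory, and the check_tool_args name are hypothetical; only the allow_network flag comes from the text above.

```python
from pathlib import Path
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.example.com"}           # hypothetical allow-list
SANDBOX_ROOT = Path("/srv/agent-sandbox")     # hypothetical working directory


class ToolArgsRejected(Exception):
    """Raised when tool arguments violate a security constraint."""


def check_tool_args(args: dict) -> dict:
    """Validate network and file arguments before a tool is executed."""
    url = args.get("url")
    if url is not None:
        if not args.get("allow_network"):
            raise ToolArgsRejected("network access is not enabled for this call")
        parsed = urlparse(url)
        if parsed.scheme != "https" or parsed.hostname not in ALLOWED_HOSTS:
            raise ToolArgsRejected(f"URL is not on the allow-list: {url}")

    path = args.get("path")
    if path is not None:
        root = SANDBOX_ROOT.resolve()
        resolved = (root / path).resolve()    # normalizes ".." segments
        if not resolved.is_relative_to(root): # requires Python 3.9+
            raise ToolArgsRejected(f"path escapes the sandbox: {path}")
    return args
```

Resolving the path before comparing it catches traversal attempts such as ../../etc/passwd that a simple prefix check on the raw string would miss.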

Production Code Example

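Below is a minimal skeleton from the curriculum that sketches the complete architecture. Treat it as illustrative rather than copy-paste runnable: the class-based Guardrail, Tool, and agent.handle interfaces shown here are simplified, whereas the published OpenAI Agents SDK ships as the openai-agents package (imported as agents) and exposes guardrails through decorators such as input_guardrail and output_guardrail, so adapt the imports and signatures before running it.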


# app.py – minimal production‑ready Agents SDK app

from openai import OpenAI, Agent, Guardrail, Tool
from langfuse import Langfuse  # observability client

# ----------------------------------------------------------------------

# 1️⃣ Observability – start a Langfuse trace

lf = Langfuse(public_key="LF_PUBLIC", secret_key="LF_SECRET")
trace = lf.trace(name="production-agent-session")

# ----------------------------------------------------------------------

# 2️⃣ Guardrails

class InputGuardrail(Guardrail):
    def run(self, prompt: str) -> str:
        # Example PII redaction – replace emails with <redacted>

        import re
        return re.sub(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+",
                      "<redacted>", prompt)

class OutputGuardrail(Guardrail):
    def run(self, response: str) -> str:
        # Block policy‑violating content (simple keyword check)

        forbidden = ["weapon", "exploit", "illegal"]
        if any(word in response.lower() for word in forbidden):
            raise ValueError("GuardrailTrip: policy violation")
        return response

class ToolGuardrail(Guardrail):
    def run(self, args: dict) -> dict:
        # Disallow network‑access tools unless explicitly allowed

        if args.get("url") and not args.get("allow_network"):
            raise ValueError("GuardrailTrip: unauthorized network call")
        return args

# ----------------------------------------------------------------------

# 3️⃣ Tools (example: a safe web fetch)

@Tool
def safe_fetch(url: str, allow_network: bool = False) -> str:
    # Guardrail will reject if allow_network is False

    import httpx
    resp = httpx.get(url, timeout=5.0)
    return resp.text[:500]   # truncate for safety

# ----------------------------------------------------------------------

# 4️⃣ Agent definition – triage + handoff

agent = Agent(
    name="triage",
    model="gpt-4o-mini",
    input_guardrail=InputGuardrail(),
    output_guardrail=OutputGuardrail(),
    tool_guardrail=ToolGuardrail(),
    tools=[safe_fetch],
)

@agent.handle
def route_user_request(message: str):
    """Simple routing: if the user mentions billing → hand off, else answer."""
    if "billing" in message.lower():
        raise agent.Handoff("billing")
    return f"Answering your query: {message[:200]}"

# ----------------------------------------------------------------------

# 5️⃣ Run loop (e.g. FastAPI endpoint)

def handle_request(user_prompt: str):
    # Note: this records the raw prompt; redact it first if traces must not hold PII
    trace.span(name="user_prompt", input=user_prompt)   # record prompt
    try:
        response = agent.run(user_prompt)
        trace.span(name="llm_response", output=response)  # record LLM output
        return response
    except Exception as e:
        # Guardrail trip (or any other failure) → record it, then return a safe fallback
        trace.span(name="request_failed", output=str(e))
        return "Sorry, I can’t help with that request. Let me point you to support."

This implementation ensures that InputGuardrail runs before the LLM sees data, OutputGuardrail provides a second safety net after generation, and ToolGuardrail protects external function calls from abuse.

Key Files and Implementation References

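The following files in the rohitg00/ai-engineering-from-scratch repository, each cited above, form the canonical reference for this architecture:

  • phases/14-agent-engineering/16-openai-agents-sdk/docs/en.md – guardrail primitives
  • phases/14-agent-engineering/16-openai-agents-sdk/outputs/skill-agents-sdk-scaffold.md – guardrail scaffold
  • phases/17-infrastructure-and-production/19-ai-gateways/docs/en.md – gateway and routing options
  • phases/19-capstone-projects/11-llm-observability-dashboard/docs/en.md – observability and evaluation
  • phases/19-capstone-projects/08-production-rag-chatbot/docs/en.md – core RAG application logic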

Summary

  • Guardrails operate at three checkpoints: input validation, output filtering, and tool argument verification to create a defense-in-depth safety perimeter
  • API gateways like Portkey and Kong provide traffic management, rate limiting, and secondary PII redaction at the infrastructure layer
  • Observability platforms (Langfuse, Opik) capture OpenTelemetry spans and enable feedback loops for continuous guardrail improvement
  • The OpenAI Agents SDK provides native primitives (InputGuardrail, OutputGuardrail, ToolGuardrail) that integrate directly with agent orchestration logic
  • Production architectures separate concerns across four distinct layers: guardrails, gateway/routing, observability, and core application logic

Frequently Asked Questions

What is the difference between input and output guardrails?

Input guardrails validate and sanitize user prompts before they reach the LLM, preventing PII exposure and prompt injection attacks. Output guardrails examine generated content before it reaches the user, ensuring compliance with content policies and removing accidentally leaked sensitive data. According to the documentation in phases/14-agent-engineering/16-openai-agents-sdk/docs/en.md, both implement the same Guardrail interface but execute at different stages of the request lifecycle.

Which AI gateway should you use in production?

Portkey is recommended for production environments requiring advanced safety features, as it provides a control-plane proxy with built-in PII redaction and jailbreak detection. Kong serves high-throughput enterprise workloads needing robust traffic management. LiteLLM functions well as a lightweight development proxy but lacks the enterprise features and resilience required for production-scale deployments handling real user traffic.

How do tool guardrails prevent security vulnerabilities?

Tool guardrails intercept every function call to validate arguments against security policies before execution. For example, before allowing a web fetch tool to execute, the guardrail checks for required authorization flags (like allow_network) and validates URL patterns. This prevents agents from making unauthorized network calls, accessing restricted file paths, or executing database operations without proper clearance, effectively sandboxing the agent's capabilities.

How should observability integrate with guardrail policies?

Observability platforms like Langfuse and Opik should ingest OpenTelemetry GenAI spans that include guardrail trip events, latency metrics, and evaluation scores. This data feeds back into policy refinement by identifying common failure patterns and measuring guardrail effectiveness. The implementation in phases/19-capstone-projects/11-llm-observability-dashboard/docs/en.md demonstrates how automated evals (RAGAS, LLM-as-judge) can trigger alerts when guardrails detect anomalies, so that safety policies can be revised quickly and with less risk to production traffic.
