How to Implement Input and Output Guardrails for Content Filtering and Validation in openai-agents-python
The openai-agents-python SDK provides InputGuardrail and OutputGuardrail classes for validating user inputs and model outputs, plus specialized ToolInputGuardrail and ToolOutputGuardrail classes for function tool validation, all configurable via decorators that return GuardrailFunctionOutput or ToolGuardrailFunctionOutput objects.
The openai-agents-python framework includes a comprehensive guardrail system for content filtering and validation that intercepts data at three critical points: before the agent processes user input, after the model generates output, and immediately before or after function tool execution. According to the source code in src/agents/guardrail.py, guardrails are Python callables that receive contextual data and return output objects containing a tripwire_triggered boolean that determines whether execution should abort or continue.
Understanding the Guardrail Architecture
The guardrail system distinguishes between four distinct guardrail types, each serving a specific validation purpose in the agent lifecycle:
- InputGuardrail – Validates raw user input (strings or TResponseInputItem lists) before the agent processes the request. Defined in src/agents/guardrail.py, it receives the RunContextWrapper, Agent instance, and user input.
- OutputGuardrail – Validates the final output object before it returns to the caller, running after the model completes generation but before the result leaves the agent boundary.
- ToolInputGuardrail – Executes immediately before a function tool is invoked in src/agents/run_internal/tool_execution.py, validating pre-parsed arguments.
- ToolOutputGuardrail – Executes after a function tool returns, allowing post-processing or redaction of sensitive data before the result reaches the model.
All guardrail types support both synchronous and asynchronous implementations. When multiple guardrails are configured, the runner in src/agents/run_internal/guardrails.py executes sequential guardrails first, then launches parallel guardrails concurrently.
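One way a runner could accept both plain and async guardrail functions is to call the function and await the result only when it is awaitable. The sketch below is stdlib-only and illustrative: call_guardrail, sync_check, and async_check are hypothetical names, and this is not a copy of the SDK's internal logic.

```python
import asyncio
import inspect
from typing import Any, Callable


async def call_guardrail(func: Callable[..., Any], *args: Any) -> Any:
    """Call a guardrail function, awaiting the result only if it is awaitable."""
    result = func(*args)
    if inspect.isawaitable(result):
        result = await result
    return result


def sync_check(text: str) -> bool:
    # Hypothetical synchronous check: trips on blank input.
    return not text.strip()


async def async_check(text: str) -> bool:
    # Hypothetical asynchronous check: trips on very long input.
    await asyncio.sleep(0)  # stands in for real async work, e.g. a model call
    return len(text) > 1000


async def main() -> list[bool]:
    # Both kinds of function go through the same call path.
    return [
        await call_guardrail(sync_check, "hello"),
        await call_guardrail(async_check, "hello"),
    ]


results = asyncio.run(main())
print(results)
```

With this shape, the author of a guardrail chooses sync or async based on what the check needs, and the calling code does not have to care which one it received.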
Implementing Input Guardrails for Content Filtering
Input guardrails act as the first line of defense against malicious or inappropriate user content. The following synchronous implementation demonstrates a profanity filter using the @input_guardrail decorator factory:
from agents import GuardrailFunctionOutput, InputGuardrailTripwireTriggered, input_guardrail

PROFANITY = {"badword", "offensive"}

@input_guardrail
def profanity_filter(
    context,  # RunContextWrapper[TContext]
    agent,  # Agent[Any]
    user_input: str | list,
) -> GuardrailFunctionOutput:
    """Reject any input containing a profane token."""
    # Flatten list-style input into a single string before matching.
    text = user_input if isinstance(user_input, str) else " ".join(str(i) for i in user_input)
    # Report the words actually found rather than a hard-coded value.
    offending = sorted(PROFANITY.intersection(text.lower().split()))
    if offending:
        return GuardrailFunctionOutput(
            output_info={"offending_words": offending},
            tripwire_triggered=True,
        )
    return GuardrailFunctionOutput(output_info=None, tripwire_triggered=False)
Attach the guardrail to an agent via the input_guardrails parameter in src/agents/agent.py:
from agents import Agent

my_agent = Agent(
    name="chatbot",
    model=my_model,
    input_guardrails=[profanity_filter],
)
When tripwire_triggered=True, the runner raises InputGuardrailTripwireTriggered immediately and aborts the run. The trace span created in src/agents/tracing.py records triggered=True for observability.
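Since a guardrail is an ordinary callable, the matching logic can be kept in a plain function and unit-tested without running an agent. The sketch below assumes a hypothetical helper named find_profanity, which is not part of the SDK. It uses regex tokenization so that punctuation attached to a word does not hide a match, which a plain whitespace split would miss.

```python
import re

PROFANITY = {"badword", "offensive"}  # same placeholder word list as the example above


def find_profanity(text: str) -> list[str]:
    """Return the blocked words present in text, sorted alphabetically."""
    # \w+ tokenization drops punctuation, so "badword," still matches.
    tokens = set(re.findall(r"\w+", text.lower()))
    return sorted(PROFANITY & tokens)


print(find_profanity("That is a BADWORD, frankly."))   # ['badword']
print(find_profanity("A perfectly polite request."))   # []
```

A decorated guardrail could then call this helper and set tripwire_triggered based on whether the returned list is empty.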
Implementing Output Guardrails for Validation
Output guardrails validate model-generated content before it reaches the user. The following asynchronous example enforces a length limit with the @output_guardrail decorator, counting whitespace-separated words as a rough proxy for tokens:
from agents import GuardrailFunctionOutput, output_guardrail

MAX_WORDS = 300

@output_guardrail
async def length_limiter(
    context,
    agent,
    output,
) -> GuardrailFunctionOutput:
    """Abort if the generated text exceeds a word budget."""
    # Whitespace splitting counts words, not model tokens; use a tokenizer for exact limits.
    word_count = len(str(output).split())
    if word_count > MAX_WORDS:
        return GuardrailFunctionOutput(
            output_info={"word_count": word_count},
            tripwire_triggered=True,
        )
    return GuardrailFunctionOutput(output_info=None, tripwire_triggered=False)
Register this guardrail on the agent:
my_agent = Agent(
    name="summarizer",
    model=my_model,
    output_guardrails=[length_limiter],
)
The run_output_guardrails function in src/agents/run_internal/guardrails.py awaits completion of all output guardrails. If any return tripwire_triggered=True, the runner raises OutputGuardrailTripwireTriggered and halts execution.
Tool-Level Guardrails for Pre and Post Execution Validation
Tool guardrails provide granular control over function tool execution, with distinct behaviors defined in src/agents/tool_guardrails.py: allow, reject_content (replace output with a message), or raise_exception (abort the run).
Tool Input Guardrails
The following example validates JSON arguments before tool execution:
import json

from agents import ToolGuardrailFunctionOutput, tool_input_guardrail

@tool_input_guardrail
def validate_json(data):
    """Reject malformed JSON payloads for a function tool."""
    try:
        # The tool context exposes the raw JSON argument string as tool_arguments.
        json.loads(data.context.tool_arguments)
    except (TypeError, json.JSONDecodeError) as exc:
        return ToolGuardrailFunctionOutput.reject_content(
            message="The tool arguments must be valid JSON.",
            output_info={"error": str(exc)},
        )
    return ToolGuardrailFunctionOutput.allow()
Attach to a FunctionTool:
from agents import FunctionTool

my_tool = FunctionTool(
    name="search",
    description="Search a knowledge base.",
    params_json_schema={...},
    on_invoke_tool=my_search_impl,
    tool_input_guardrails=[validate_json],
)
Tool Output Guardrails
Post-execution guardrails sanitize sensitive data before it reaches the model:
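import json

from agents import ToolGuardrailFunctionOutput, tool_output_guardrail

SENSITIVE_KEYS = {"ssn", "credit_card"}

@tool_output_guardrail
def scrub_sensitive(data):
    """Remove sensitive fields from a tool's JSON output."""
    result = json.loads(data.output)  # assumes the tool returns a JSON object as a string
    removed = sorted(SENSITIVE_KEYS & result.keys())
    if not removed:
        return ToolGuardrailFunctionOutput.allow()
    for key in removed:
        result.pop(key)
    # allow() passes the original output through unchanged, so the scrubbed
    # payload is returned via reject_content(), which replaces the tool output.
    return ToolGuardrailFunctionOutput.reject_content(
        message=json.dumps(result),
        output_info={"sanitized": True, "removed_keys": removed},
    )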
The _execute_tool_output_guardrails function in src/agents/run_internal/tool_execution.py processes these guardrails. Unlike agent-level guardrails, tool guardrails can modify content via reject_content without necessarily aborting the entire run.
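The three tool guardrail behaviors (allow, reject_content, raise_exception) can be pictured as a small dispatch over the guardrail's decision. The following is a stdlib-only sketch under that assumption: Decision, ToolGuardrailTripped, and apply_decision are hypothetical stand-ins, not the SDK's classes or its actual control flow.

```python
from dataclasses import dataclass
from typing import Any, Optional


class ToolGuardrailTripped(Exception):
    """Stand-in for the SDK's tripwire exception (hypothetical name)."""


@dataclass
class Decision:
    """Stand-in for a tool guardrail result (hypothetical, not the SDK class)."""
    behavior: str  # "allow", "reject_content", or "raise_exception"
    message: Optional[str] = None


def apply_decision(tool_output: Any, decision: Decision) -> Any:
    """Return what the model would see once the guardrail decision is applied."""
    if decision.behavior == "allow":
        return tool_output  # original output passes through unchanged
    if decision.behavior == "reject_content":
        return decision.message  # message replaces the output; the run continues
    if decision.behavior == "raise_exception":
        raise ToolGuardrailTripped("tool guardrail aborted the run")
    raise ValueError(f"unknown behavior: {decision.behavior!r}")


print(apply_decision("raw result", Decision("allow")))
print(apply_decision("raw result", Decision("reject_content", "Output withheld.")))
```

Only the raise_exception branch stops the run; the other two hand a value back to the model, which is why tool guardrails can redact or substitute content and still let the conversation continue.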
Execution Flow and Error Handling
The guardrail execution pipeline follows a strict orchestration pattern defined in src/agents/run_internal/guardrails.py:
- Collection – The runner gathers guardrails from the agent definition and any runtime configuration.
- Sequential Execution – Guardrails marked as sequential run first, in order.
- Parallel Execution – Remaining guardrails execute concurrently via asyncio.gather.
- Tripwire Evaluation – If any guardrail returns tripwire_triggered=True, the system immediately raises the appropriate exception (InputGuardrailTripwireTriggered, OutputGuardrailTripwireTriggered, or ToolGuardrailTripwireTriggered).
- Tracing – Each guardrail execution creates a span in src/agents/tracing.py annotated with triggered=True when violations occur.
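The ordering described in this list can be sketched with plain asyncio. This is a simplified model under stated assumptions, not the SDK's runner: TripwireTriggered, Result, make_check, and run_guardrails are hypothetical names, and collection and tracing are omitted.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


class TripwireTriggered(Exception):
    """Stand-in for the SDK's tripwire exceptions (hypothetical name)."""


@dataclass
class Result:
    name: str
    tripwire_triggered: bool


Check = Callable[[str], Awaitable[Result]]


def make_check(name: str, banned: str) -> Check:
    """Build a hypothetical async check that trips when `banned` appears in the text."""
    async def check(text: str) -> Result:
        await asyncio.sleep(0)  # stands in for real async work
        return Result(name, banned in text)
    return check


async def run_guardrails(text: str, sequential: list[Check], parallel: list[Check]) -> list[Result]:
    results: list[Result] = []
    # 1. Sequential guardrails run first, in order, stopping at the first trip.
    for check in sequential:
        result = await check(text)
        results.append(result)
        if result.tripwire_triggered:
            raise TripwireTriggered(result.name)
    # 2. Remaining guardrails run concurrently; gather preserves input order.
    gathered = await asyncio.gather(*(check(text) for check in parallel))
    results.extend(gathered)
    # 3. Any triggered tripwire aborts the run.
    for result in gathered:
        if result.tripwire_triggered:
            raise TripwireTriggered(result.name)
    return results


checks_seq = [make_check("no_secrets", "password")]
checks_par = [make_check("no_spam", "buy now"), make_check("no_links", "http://")]
ok = asyncio.run(run_guardrails("hello there", checks_seq, checks_par))
print([r.name for r in ok])  # ['no_secrets', 'no_spam', 'no_links']
```

Running the cheap or critical checks sequentially first means a clear violation can stop the run before any concurrent, possibly more expensive, checks are started.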
For streaming scenarios, run_input_guardrails_with_queue manages guardrail execution against queued input items.
Summary
- Input and output guardrails in openai-agents-python validate content at the agent boundary using InputGuardrail and OutputGuardrail classes from src/agents/guardrail.py.
- Tool guardrails provide pre- and post-execution validation via ToolInputGuardrail and ToolOutputGuardrail in src/agents/tool_guardrails.py, supporting content modification without aborting the run.
- Tripwire mechanism – Returning GuardrailFunctionOutput(tripwire_triggered=True) aborts execution and raises specific exceptions captured in traces.
- Async support – All guardrail types support both sync and async implementations, with parallel execution handled by src/agents/run_internal/guardrails.py.
- Registration – Attach guardrails via decorator factories (@input_guardrail, @output_guardrail) and list parameters on Agent or FunctionTool instances.
Frequently Asked Questions
What is the difference between InputGuardrail and ToolInputGuardrail?
InputGuardrail validates the raw user input before the agent begins processing, operating at the conversation level in src/agents/run.py. ToolInputGuardrail executes immediately before a specific function tool is called, validating pre-parsed arguments in src/agents/run_internal/tool_execution.py. Tool guardrails return ToolGuardrailFunctionOutput which supports content rejection (replacing output with a message) in addition to the binary allow/block behavior of standard guardrails.
How do I handle asynchronous guardrail functions?
Decorate async functions with the same decorators (@input_guardrail, @output_guardrail, @tool_input_guardrail, @tool_output_guardrail). The runner in src/agents/run_internal/guardrails.py automatically detects coroutines and awaits them using asyncio.gather for parallel guardrails. Both sync and async implementations return the same GuardrailFunctionOutput or ToolGuardrailFunctionOutput objects.
What happens when a guardrail triggers a tripwire?
When any guardrail returns tripwire_triggered=True in its output object, the runner immediately raises InputGuardrailTripwireTriggered or OutputGuardrailTripwireTriggered (or ToolGuardrailTripwireTriggered for tool guardrails). The agent stops execution, and the trace system records a guardrail span with triggered=True for observability, as implemented in src/agents/tracing.py.
Can guardrails modify content instead of blocking it?
Standard input and output guardrails operate on a binary allow/block model and cannot modify content. However, tool guardrails support content modification via ToolGuardrailFunctionOutput.reject_content(), which replaces the tool output with a custom message that gets sent back to the model instead of the original result. This allows the conversation to continue with corrected or sanitized information without aborting the entire agent run.