This is Part 3 of this exploration series, covering the nuances of guardrails 🎢 in the OpenAI Agents SDK. Please refer to Part 1 and Part 2 below.
In this part, I’m going to document what I have learnt about `InputGuardrail` and `OutputGuardrail`. Simply put, these are essentially hooks placed before and after the main LLM call that the agent makes to perform its task.
The purpose of these hooks is to have better control over the LLM’s input and output, and, needless to say, the use cases vary. Typical examples are:
For an input guardrail, you might want to:
inspect the incoming user input to filter out harmful or unrelated data, so that the application can refuse or take mitigation steps even before the LLM is invoked.
transform or map the user input to different data before sending it to the LLM for inference.
For an output guardrail, the same pattern follows: you might want to make sure the LLM’s response is unbiased or relevant before sending it back to the user.
Guardrails?
Guardrails and handoffs were touted to be some of the USPs¹ of the OpenAI Agents SDK. Even though the class names sound as though something magical is happening, in reality these are just function callbacks that get called before and after the agent’s LLM invocation.
Now, what you do within those functions is completely use-case specific. For example, you could have a procedural, imperative sequence of steps that performs the input/output parsing, or it could be another LLM call, either direct or via other tools/agents that invoke an LLM call within that context to perform the task. The latter is also known as the LLM-as-Judge pattern, i.e. using an LLM to critique another LLM’s response².
There are different ways in which these hooks or callbacks can be implemented, and each agent SDK chooses its own. When it comes to the OpenAI Agents SDK, the way these callbacks are implemented is kinda bloated, in my opinion. You will see what I mean shortly!
Input Guardrails Example
Let’s start with an example of using `InputGuardrail`. As a reminder, input guardrails are generally used to catch the input before it is sent to the agent’s LLM invocation. Let’s go over the code line by line, as usual.
💁🏻♂️ For full, end-to-end runnable code, please refer to the openai-local-agents repo.
1️⃣ First, let’s start off by importing the `InputGuardrail`, `GuardrailFunctionOutput`, and `InputGuardrailTripwireTriggered` classes from the `agents` package. `InputGuardrail` is simply a class that wraps any function returning a `GuardrailFunctionOutput` object. Simply put, during one of those input/output hooks, the `Agent` class calls back the function wrapped under `InputGuardrail`, which is expected to return an object of type `GuardrailFunctionOutput` when called.
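Roughly, the imports for this walkthrough look like this (a minimal sketch; I’m also pulling in `Agent`, `Runner`, `function_tool`, and Pydantic’s `BaseModel` here since the later steps use them):

```python
# Minimal sketch of the imports used in this walkthrough.
from pydantic import BaseModel

from agents import (
    Agent,
    GuardrailFunctionOutput,
    InputGuardrail,
    InputGuardrailTripwireTriggered,
    Runner,
    function_tool,
)
```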
2️⃣ For this example, I’m also performing RAG as the main agent’s task. Hence, I created a utility class that performs RAG using Ollama embeddings.
✅ Please follow the number emojis (like 1️⃣) on the code snippets for easy reference!
💡 Also, Substack’s code highlighting is pathetic. If you like navigating the code in a more intuitive way, please consider reading this post on my website instead.
Before we proceed further, I would like to paint the picture of the use case (Figure 1) we are trying to achieve in this example. The main agent, named `rag_agent`, is tasked with performing RAG on PydanticAI’s documentation. I simply took a dump of this file and turned it into a knowledge base located under the `./kb/` path, which the `rag_agent` will use to perform the RAG, via a function tool named `get_answer_about_pydanticai`.
But before the input is passed to the `rag_agent`, we insert a filter to sample³ the input and do something, anything, with it. In this example, we are going to use another agent named `input_guardrail_agent` to act as an LLM-as-Judge⁴ and determine whether the input query is relevant to the PydanticAI topic. If it is not, the guardrail agent’s output will be used to respond to the user; if it is, the control flow will be routed back to the rag agent to perform its task.
Now that we understand the intent, let’s proceed with our code walkthrough.
3️⃣ Using the imported `LocalRAGProvider` class, we initialize an object with `nomic-embed-text:latest` as the local embedding model, and we load the KB to initialize the RAG structure.
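As a rough sketch (note that `LocalRAGProvider` is the utility class from the repo, so the module path, constructor arguments, and loader method below are assumptions based on the description above, not the repo’s exact code):

```python
# Indicative only: LocalRAGProvider is the repo's utility class; the names below are assumptions.
from rag_provider import LocalRAGProvider  # hypothetical module path

ollama_rag = LocalRAGProvider(
    embedding_model="nomic-embed-text:latest",  # local Ollama embedding model
    kb_path="./kb/",                            # knowledge base built from the PydanticAI docs
)
ollama_rag.load_kb()  # hypothetical loader that builds the RAG index from the KB
```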
4️⃣ Then, we define a class named `InputGuardrailResponse`, inheriting from Pydantic’s `BaseModel`, to represent the response we expect back from the guardrail agent. We basically want to force the guardrail agent to return its response in this class format (a sketch follows the note below).
⚠️ Note: This `InputGuardrailResponse` is NOT an SDK class. It is created for this use case and contains 3 members:
`isValidQuestion`: a boolean that tells us whether or not to route control back to the main rag agent
`reasoning`: why the guardrail agent thinks the input question is valid or not
`polite_decline_response`: the string response that will be shown to the user if `isValidQuestion` is set to `false`
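Based on those three members, the model looks roughly like this (field types are inferred from the descriptions above; the repo’s actual definition may differ slightly):

```python
from pydantic import BaseModel


class InputGuardrailResponse(BaseModel):
    """Structured verdict returned by the guardrail agent. Not an SDK class."""

    isValidQuestion: bool          # should control be routed back to the main rag agent?
    reasoning: str                 # why the judge considers the question valid or not
    polite_decline_response: str   # shown to the user when isValidQuestion is False
```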
Great! So now we proceed to define the main rag agent, along with its tools.
5️⃣ We create the `rag_agent` as an instance of `Agent`, and it has access to a tool named `get_answer_about_pydanticai`.
6️⃣ This `get_answer_about_pydanticai` is nothing but a `function_tool` that simply calls `ollama_rag.query` to perform the RAG and return the response.
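A sketch of steps 5️⃣ and 6️⃣ (the `instructions` string is my own paraphrase, and `ollama_rag.query(...)` assumes the `LocalRAGProvider` object from step 3️⃣):

```python
@function_tool
def get_answer_about_pydanticai(question: str) -> str:
    """Answer questions about PydanticAI using the local knowledge base."""
    # Delegate to the RAG provider initialized in step 3️⃣ (query signature assumed).
    return ollama_rag.query(question)


rag_agent = Agent(
    name="rag_agent",
    instructions="Answer questions about PydanticAI using the provided tool.",  # paraphrased
    tools=[get_answer_about_pydanticai],
)
```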
But hold on, where is the `InputGuardrail` filter? Don’t we need to insert this filter before the `rag_agent` gets the input? Where is the association?
I hear you. For sure, we need to make that association in the `rag_agent`’s declaration itself. But before we do that, we need to declare and define what the `InputGuardrail` should be. Let us do that below.
7️⃣ So first, we create the `input_guardrail_agent` from the `Agent` class; it will simply act as the LLM-as-Judge to validate whether the input query is related to PydanticAI. Nothing fancy! But…
8️⃣ I implore you to look at the line where the `output_type` is set to the `InputGuardrailResponse` type. Remember, we defined this in step 4️⃣. This is the way of forcing the LLM to respond in a structured format. So if we simply inspect the value of `isValidQuestion`, we know whether to send a polite refusal or reroute the flow back to the `rag_agent`.
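A sketch of steps 7️⃣ and 8️⃣ (again, the `instructions` are paraphrased, not the repo’s exact prompt):

```python
input_guardrail_agent = Agent(
    name="input_guardrail_agent",
    instructions=(
        "Judge whether the user's question is about PydanticAI. "
        "If it is not, craft a polite decline response."
    ),  # paraphrased prompt
    output_type=InputGuardrailResponse,  # step 8️⃣: force the structured verdict from step 4️⃣
)
```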
Perfect 💁🏻♂️! Now that we have defined the `input_guardrail_agent`, it is time to associate this agent as the `InputGuardrail` for the `rag_agent`. How do we do it? Simply add it as a property. See below!
9️⃣ We first wrap the call to this `input_guardrail_agent` inside an `async` function named `pydanticai_input_guardrail`.
🔟 Within this function, we cast the `RunResult` of the agent to `InputGuardrailResponse` and return an instance of `GuardrailFunctionOutput`.
1️⃣1️⃣ The `GuardrailFunctionOutput` class contains 2 key properties: `output_info`, which carries the refusal output from the guardrail agent, and `tripwire_triggered`, which is set to `not final_result.isValidQuestion`.
1️⃣2️⃣ We then finally associate this `pydanticai_input_guardrail` as the `guardrail_function` hook of the `rag_agent` by adding an `InputGuardrail` instance to its `input_guardrails` property.
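Here is a sketch of steps 9️⃣ through 1️⃣2️⃣. The `(ctx, agent, input)` parameter shape is what the SDK expects for a guardrail function; the rest follows the walkthrough above, so treat it as indicative rather than the repo’s exact code:

```python
async def pydanticai_input_guardrail(ctx, agent, input) -> GuardrailFunctionOutput:
    # Step 9️⃣: run the judge agent on the incoming input.
    result = await Runner.run(input_guardrail_agent, input, context=ctx.context)
    # Step 🔟: cast the RunResult's final output to our Pydantic model.
    final_result = result.final_output_as(InputGuardrailResponse)
    # Step 1️⃣1️⃣: output_info carries the judge's verdict (including the refusal text),
    # and the tripwire fires when the question is NOT valid.
    return GuardrailFunctionOutput(
        output_info=final_result,
        tripwire_triggered=not final_result.isValidQuestion,
    )


# Step 1️⃣2️⃣: re-declare (or update) the rag_agent from step 5️⃣ with the guardrail attached.
rag_agent = Agent(
    name="rag_agent",
    instructions="Answer questions about PydanticAI using the provided tool.",  # paraphrased
    tools=[get_answer_about_pydanticai],
    input_guardrails=[InputGuardrail(guardrail_function=pydanticai_input_guardrail)],
)
```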
Okay! So that’s it, right? We made the necessary definitions and hooks, and we can just fire `Runner.run()` on the `rag_agent` like below, right?
```python
async def main():
    result = await Runner.run(rag_agent, "How do I install pytorch?")
    print(f"Result: {result.final_output}")
```
Well, not quite!!!
The real deal of `tripwire_triggered`
Well, you see, this `tripwire_triggered` flag is actually there to throw an exception, specifically the `InputGuardrailTripwireTriggered` exception!
What this means is: if the guardrail triggers this tripwire, i.e. if the `input_guardrail_agent` sets `isValidQuestion` to `false`, then the tripwire is triggered and an `InputGuardrailTripwireTriggered` exception is thrown.
🚨 Caution: What is not documented clearly or shown in the OpenAI SDK documentation is that the usage of `InputGuardrail` or `OutputGuardrail` involves exception handling in your coding logic. It doesn’t behave like a `handoff`, where the next agent is marked as the guardrail agent if the guardrail condition is met.
I’m NOT saying this is a bad practice, but it certainly didn’t meet the expectation I had⁵. So in reality, the application developer should handle this exception to either halt the execution or take whatever mitigation is necessary for the use case. In this example, we are simply going to use the `polite_decline_response` property of the output to send the refusal response to the user.
1️⃣3️⃣ We simply catch the `InputGuardrailTripwireTriggered` exception, and
1️⃣4️⃣ use the `guardrail_result` attached to it as we see fit.
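A sketch of steps 1️⃣3️⃣ and 1️⃣4️⃣. Digging the verdict out of the exception goes through `guardrail_result`; the exact attribute chain below assumes the guardrail function above returned our `InputGuardrailResponse` as its `output_info`:

```python
async def main():
    try:
        result = await Runner.run(rag_agent, "How do I install pytorch?")
        print(f"Result: {result.final_output}")
    except InputGuardrailTripwireTriggered as e:
        # Step 1️⃣4️⃣: reuse the guardrail's output as we see fit.
        verdict = e.guardrail_result.output.output_info  # our InputGuardrailResponse
        print(f"Refusal: {verdict.polite_decline_response}")
```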
Too much abstraction!
So, we went over an example of using input guardrails. The same pattern can be extended to `OutputGuardrail` as well; instead of filtering the input to the agent, we would be filtering the output from the agent. That’s the only difference.
What could have been a simple callback hook is pretty convoluted, in my opinion. If we take a step back and zoom out, this is what is happening (Figure 2).
The main agent `rag_agent` is set up with an `InputGuardrail` object, which is nothing but a wrapper around a `guardrail_function` to call back when the input hook triggers. This function returns an instance of `GuardrailFunctionOutput`.
If this `GuardrailFunctionOutput` object’s `tripwire_triggered` property is set, an `InputGuardrailTripwireTriggered` exception is thrown. This exception is then used to decide the next course of action.
👀 A simple callback is implemented as a convoluted 7-step process.
This is what I termed as bloated in the beginning. And not only that, these guardrails can also be chained together, since the spec says `input_guardrails` is a list. So multiple input guardrails clubbed with multiple output guardrails will definitely eat up more resources and make things much more difficult to debug and maintain in the long run. It is no wonder OpenAI advertises its tracing system alongside their Agents SDK!
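For illustration, chaining could look like this (the extra length check below is a hypothetical second guardrail, purely to show the list accepting more than one entry):

```python
async def length_check_guardrail(ctx, agent, input) -> GuardrailFunctionOutput:
    # Hypothetical second guardrail: trip when the raw input is suspiciously long.
    text = input if isinstance(input, str) else str(input)
    return GuardrailFunctionOutput(output_info=None, tripwire_triggered=len(text) > 2000)


rag_agent_chained = Agent(
    name="rag_agent",
    instructions="Answer questions about PydanticAI using the provided tool.",  # paraphrased
    tools=[get_answer_about_pydanticai],
    input_guardrails=[
        InputGuardrail(guardrail_function=length_check_guardrail),
        InputGuardrail(guardrail_function=pydanticai_input_guardrail),
    ],
)
```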
And look at how many objects we need to cross to get to the `polite_decline_response`!!! 🤪🤪😆
Wrap up!
Before I wrap up, I do want to caution you that much of this guardrail behaviour depends on the implementation of the `guardrail_function` (in our case, `pydanticai_input_guardrail`). If you resort to an LLM-as-Judge implementation, like we did in our example, the decision to either refuse or reroute to the main agent depends solely on the LLM judge’s performance.
When I experimented with this code, 8 out of 10 times the LLM judge model failed to provide output in the expected format. Obviously, the OpenAI Agents SDK is not the culprit; the underlying model is. Since I’m using a severely quantized model compared to the frontier models, this is a known limitation. But hey, it’s free!
So don’t fret too much if you aren’t able to reproduce this experiment in one go. Try multiple times with multiple models and choose the one that is reliable.
This concludes Part 3 of my journey. If you are with me so far, then you will definitely enjoy the final part of this series, where I plan to cover Contexts, MCP, and parallel agents.
See you soon! à bientôt!
1. Unique Selling Point
2. It could also be the same model that gets used under the hood; the judge and the candidate LLM won’t share context.
3. I just wanted to sound scientific; in reality, the entire input, along with the context and the agent, is passed to this filter.
4. I understand the LLM-as-Judge terminology is typically used when the input to be validated is an LLM’s output. But I’m using it here in a broader sense, wherein I use an LLM not to answer the actual user query, but to decide whether or not to answer it.
5. Based on the handoff pattern.