This is Part 3 of this exploration series, covering the nuances of guardrails 🎢 in the OpenAI Agents SDK. Please refer to Part 1 and Part 2 below.
In this part, I’m going to document what I have learnt about `InputGuardrail` and `OutputGuardrail`. Simply put, these are essentially hooks placed before and after the main LLM call that the agent makes to perform its task.
The purpose of these hooks is to have better control over the LLM’s input and output, and, needless to say, the use cases vary. Typical examples are:
For an input guardrail, you might want to:
inspect the incoming user input to filter out harmful or unrelated data, so that the application can refuse or take mitigation steps even before the LLM is invoked.
transform or map the user input to different data before sending it to the LLM for inference.
For an output guardrail, the same pattern follows: you might want to make sure the LLM’s response is unbiased or relevant before sending it back to the user.
Guardrails?
Guardrails and handoffs were touted to be some of the USPs¹ of the OpenAI Agents SDK. Even though the class names sound as though something magical is happening, in reality these are just function callbacks that get called before and after the agent’s LLM invocation.
Now, what you do within those functions is completely use-case specific. For example, you could have a procedural, imperative sequence of steps that performs the input/output parsing, or it could be another LLM call, either direct or via other tools/agents that invoke an LLM call within that context to perform the task. The latter is also known as the LLM-as-Judge pattern, i.e. using an LLM to critique another LLM’s response².
There are different ways in which these hooks or callbacks can be implemented, and each agent SDK chooses its own. When it comes to the OpenAI Agents SDK, the way these callbacks are implemented is kinda bloated, in my opinion. You will see what I mean shortly!
Input Guardrails Example
Let’s start with an example of using `InputGuardrail`. As a reminder, input guardrails are generally used to catch the input before it is sent to the agent’s LLM invocation. Let’s go over the code line by line, as usual.
💁🏻♂️ For full, end-to-end runnable code, please refer to the openai-local-agents repo.
1️⃣ First, let’s start off by importing the `InputGuardrail`, `GuardrailFunctionOutput`, and `InputGuardrailTripwireTriggered` classes from the `agents` package. `InputGuardrail` is simply a class that wraps any function returning a `GuardrailFunctionOutput` object. Simply put, during one of those input/output hooks, the `Agent` class calls back the function wrapped under `InputGuardrail`, which is expected to return an object of type `GuardrailFunctionOutput` when called.
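Roughly, the imports for this walkthrough look like this (a minimal sketch; I’m also pulling in `Agent`, `Runner`, `function_tool`, and Pydantic’s `BaseModel` here since the later steps use them):

```python
# Minimal sketch of the imports used in this walkthrough.
from pydantic import BaseModel

from agents import (
    Agent,
    GuardrailFunctionOutput,
    InputGuardrail,
    InputGuardrailTripwireTriggered,
    Runner,
    function_tool,
)
```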
2️⃣ For this example, I’m also performing RAG as the main agent’s task. Hence, I created a utility class that performs RAG using Ollama embeddings.
✅ Please follow the number emojis (like 1️⃣) on the code snippets for easy reference!
💡 Also, Substack’s code highlighting is pathetic. If you like navigating the code in a more intuitive way, please consider reading this post on my website instead.
Before we proceed further, I would like to paint the picture of the use case (Figure 1) we are trying to achieve in this example. The main agent, named `rag_agent`, is tasked with performing RAG on PydanticAI’s documentation. I simply took a dump of this file and turned it into a knowledge base located under the `./kb/` path, which the `rag_agent` will use to perform the RAG, via a function tool named `get_answer_about_pydanticai`.
But before the input is passed to the `rag_agent`, we insert a filter to sample³ the input and do something, anything, with it. In this example, we are going to use another agent named `input_guardrail_agent` to act as an LLM-as-Judge⁴ and determine whether the input query is relevant to the PydanticAI topic. If it is not, the guardrail agent’s output will be used to respond to the user; if it is, the control flow will be routed back to the rag agent to perform its task.
Now that we understand the intent, let’s proceed with our code walkthrough.
3️⃣ Using the imported `LocalRAGProvider` class, we initialize an object with `nomic-embed-text:latest` as the local embedding model, and we load the KB to initialize the RAG structure.
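As a rough sketch (note that `LocalRAGProvider` is the utility class from the repo, so the module path, constructor arguments, and loader method below are assumptions based on the description above, not the repo’s exact code):

```python
# Indicative only: LocalRAGProvider is the repo's utility class; the names below are assumptions.
from rag_provider import LocalRAGProvider  # hypothetical module path

ollama_rag = LocalRAGProvider(
    embedding_model="nomic-embed-text:latest",  # local Ollama embedding model
    kb_path="./kb/",                            # knowledge base built from the PydanticAI docs
)
ollama_rag.load_kb()  # hypothetical loader that builds the RAG index from the KB
```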
4️⃣ Then, we define a class named `InputGuardrailResponse`, inheriting from Pydantic’s `BaseModel`, to represent the response we expect back from the guardrail agent. We basically want to force the guardrail agent to return its response in this class format (a sketch follows the note below).
⚠️ Note: This `InputGuardrailResponse` is NOT an SDK class. It is created for this use case and contains 3 members:
`isValidQuestion`: a boolean that tells us whether or not to route control back to the main rag agent
`reasoning`: why the guardrail agent thinks the input question is valid or not
`polite_decline_response`: the string response that will be shown to the user if `isValidQuestion` is set to `false`
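Based on those three members, the model looks roughly like this (field types are inferred from the descriptions above; the repo’s actual definition may differ slightly):

```python
from pydantic import BaseModel


class InputGuardrailResponse(BaseModel):
    """Structured verdict returned by the guardrail agent. Not an SDK class."""

    isValidQuestion: bool          # should control be routed back to the main rag agent?
    reasoning: str                 # why the judge considers the question valid or not
    polite_decline_response: str   # shown to the user when isValidQuestion is False
```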
Great! So now we proceed to define the main rag agent, along with its tools.
5️⃣ We create the `rag_agent` as an instance of `Agent`, and it has access to a tool named `get_answer_about_pydanticai`.
6️⃣ This `get_answer_about_pydanticai` is nothing but a `function_tool` that simply calls `ollama_rag.query` to perform the RAG and return the response.
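A sketch of steps 5️⃣ and 6️⃣ (the `instructions` string is my own paraphrase, and `ollama_rag.query(...)` assumes the `LocalRAGProvider` object from step 3️⃣):

```python
@function_tool
def get_answer_about_pydanticai(question: str) -> str:
    """Answer questions about PydanticAI using the local knowledge base."""
    # Delegate to the RAG provider initialized in step 3️⃣ (query signature assumed).
    return ollama_rag.query(question)


rag_agent = Agent(
    name="rag_agent",
    instructions="Answer questions about PydanticAI using the provided tool.",  # paraphrased
    tools=[get_answer_about_pydanticai],
)
```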
But hold on, where is the `InputGuardrail` filter? Don’t we need to insert this filter before the `rag_agent` gets the input? Where is the association?
I hear you. For sure, we need to make that association in the `rag_agent`’s declaration itself. But before we do that, we need to declare and define what the `InputGuardrail` should be. Let us do that below.
7️⃣ So first, we create the `input_guardrail_agent` from the `Agent` class; it will simply act as the LLM-as-Judge to validate whether the input query is related to PydanticAI. Nothing fancy! But…
8️⃣ I implore you to look at the line where the `output_type` is set to the `InputGuardrailResponse` type. Remember, we defined this in step 4️⃣. This is the way of forcing the LLM to respond in a structured format. So if we simply inspect the value of `isValidQuestion`, we know whether to send a polite refusal or reroute the flow back to the `rag_agent`.
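A sketch of steps 7️⃣ and 8️⃣ (again, the `instructions` are paraphrased, not the repo’s exact prompt):

```python
input_guardrail_agent = Agent(
    name="input_guardrail_agent",
    instructions=(
        "Judge whether the user's question is about PydanticAI. "
        "If it is not, craft a polite decline response."
    ),  # paraphrased prompt
    output_type=InputGuardrailResponse,  # step 8️⃣: force the structured verdict from step 4️⃣
)
```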
Perfect 💁🏻♂️! Now that we have defined the `input_guardrail_agent`, it is time to associate this agent as the `InputGuardrail` for the `rag_agent`. How do we do it? Simply add it as a property. See below!
9️⃣ We first wrap the call to this `input_guardrail_agent` inside an `async` function named `pydanticai_input_guardrail`.
🔟 Within this function, we cast the `RunResult` of the agent to `InputGuardrailResponse` and return an instance of `GuardrailFunctionOutput`.
1️⃣1️⃣ The `GuardrailFunctionOutput` class contains 2 key properties: `output_info`, which carries the refusal output from the guardrail agent, and `tripwire_triggered`, which is set to `not final_result.isValidQuestion`.
1️⃣2️⃣ We then finally associate this `pydanticai_input_guardrail` as the `guardrail_function` hook of the `rag_agent` by adding an `InputGuardrail` instance to its `input_guardrails` property.
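Here is a sketch of steps 9️⃣ through 1️⃣2️⃣. The `(ctx, agent, input)` parameter shape is what the SDK expects for a guardrail function; the rest follows the walkthrough above, so treat it as indicative rather than the repo’s exact code:

```python
async def pydanticai_input_guardrail(ctx, agent, input) -> GuardrailFunctionOutput:
    # Step 9️⃣: run the judge agent on the incoming input.
    result = await Runner.run(input_guardrail_agent, input, context=ctx.context)
    # Step 🔟: cast the RunResult's final output to our Pydantic model.
    final_result = result.final_output_as(InputGuardrailResponse)
    # Step 1️⃣1️⃣: output_info carries the judge's verdict (including the refusal text),
    # and the tripwire fires when the question is NOT valid.
    return GuardrailFunctionOutput(
        output_info=final_result,
        tripwire_triggered=not final_result.isValidQuestion,
    )


# Step 1️⃣2️⃣: re-declare (or update) the rag_agent from step 5️⃣ with the guardrail attached.
rag_agent = Agent(
    name="rag_agent",
    instructions="Answer questions about PydanticAI using the provided tool.",  # paraphrased
    tools=[get_answer_about_pydanticai],
    input_guardrails=[InputGuardrail(guardrail_function=pydanticai_input_guardrail)],
)
```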
Okay! So that’s it, right? We made the necessary definitions and hooks, and we can just fire `Runner.run()` on the `rag_agent` like below, right?
```python
async def main():
    result = await Runner.run(rag_agent, "How do I install pytorch?")
    print(f"Result: {result.final_output}")
```
Well, not quite!!!
The real deal of `tripwire_triggered`
Well, you see, this `tripwire_triggered` flag is actually there to throw an exception, specifically the `InputGuardrailTripwireTriggered` exception!
What this means is: if the guardrail triggers this tripwire, i.e. if the `input_guardrail_agent` sets `isValidQuestion` to `false`, then the tripwire is triggered and an `InputGuardrailTripwireTriggered` exception is thrown.
🚨 Caution: What is not documented clearly or shown in the OpenAI SDK documentation is that the usage of `InputGuardrail` or `OutputGuardrail` involves exception handling in your coding logic. It doesn’t behave like a `handoff`, where the next agent is marked as the guardrail agent if the guardrail condition is met.
I’m NOT saying this is a bad practice, but it certainly didn’t meet the expectation I had⁵. So in reality, the application developer should handle this exception to either halt the execution or take whatever mitigation is necessary for the use case. In this example, we are simply going to use the `polite_decline_response` property of the output to send the refusal response to the user.
1️⃣3️⃣ We simply catch the `InputGuardrailTripwireTriggered` exception, and
1️⃣4️⃣ use the `guardrail_result` attached to it as we see fit.
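A sketch of steps 1️⃣3️⃣ and 1️⃣4️⃣. Digging the verdict out of the exception goes through `guardrail_result`; the exact attribute chain below assumes the guardrail function above returned our `InputGuardrailResponse` as its `output_info`:

```python
async def main():
    try:
        result = await Runner.run(rag_agent, "How do I install pytorch?")
        print(f"Result: {result.final_output}")
    except InputGuardrailTripwireTriggered as e:
        # Step 1️⃣4️⃣: reuse the guardrail's output as we see fit.
        verdict = e.guardrail_result.output.output_info  # our InputGuardrailResponse
        print(f"Refusal: {verdict.polite_decline_response}")
```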
Too much abstraction!
So, we went over an example of using input guardrails. The same pattern can be extended to `OutputGuardrail` as well; instead of filtering the input to the agent, we would be filtering the output from the agent. That’s the only difference.
What could have been a simple callback hook is pretty convoluted, in my opinion. If we take a step back and zoom out, this is what is happening (Figure 2).
The main agent `rag_agent` is set up with an `InputGuardrail` object, which is nothing but a wrapper around a `guardrail_function` to call back when the input hook triggers. This function returns an instance of `GuardrailFunctionOutput`.
If this `GuardrailFunctionOutput` object’s `tripwire_triggered` property is set, an `InputGuardrailTripwireTriggered` exception is thrown. This exception is then used to decide the next course of action.
👀 A simple callback is implemented as a convoluted 7-step process.
This is what I termed as bloated in the beginning. And not only that, these guardrails can also be chained together, since the spec says `input_guardrails` is a list. So multiple input guardrails clubbed with multiple output guardrails will definitely eat up more resources and make things much more difficult to debug and maintain in the long run. It is no wonder OpenAI advertises its tracing system alongside their Agents SDK!
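For illustration, chaining could look like this (the extra length check below is a hypothetical second guardrail, purely to show the list accepting more than one entry):

```python
async def length_check_guardrail(ctx, agent, input) -> GuardrailFunctionOutput:
    # Hypothetical second guardrail: trip when the raw input is suspiciously long.
    text = input if isinstance(input, str) else str(input)
    return GuardrailFunctionOutput(output_info=None, tripwire_triggered=len(text) > 2000)


rag_agent_chained = Agent(
    name="rag_agent",
    instructions="Answer questions about PydanticAI using the provided tool.",  # paraphrased
    tools=[get_answer_about_pydanticai],
    input_guardrails=[
        InputGuardrail(guardrail_function=length_check_guardrail),
        InputGuardrail(guardrail_function=pydanticai_input_guardrail),
    ],
)
```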
And look at how many objects we need to cross to get to the `polite_decline_response`!!! 🤪🤪😆
Wrap up!
Before I wrap up, I do want to caution you that much of this guardrail behaviour depends on the implementation of the `guardrail_function` (in our case, `pydanticai_input_guardrail`). If you resort to an LLM-as-Judge implementation, like we did in our example, the decision to either refuse or reroute to the main agent depends solely on the LLM judge’s performance.
When I experimented with this code, 8 out of 10 times the LLM judge model failed to provide output in the expected format. Obviously, the OpenAI Agents SDK is not the culprit; the underlying model is. Since I’m using a severely quantized model compared to the frontier models, this is a known limitation. But hey, it’s free!
So don’t fret too much if you aren’t able to reproduce this experiment in one go. Try multiple times with multiple models and choose the one that is reliable.
This concludes Part 3 of my journey. If you are with me so far, then you will definitely enjoy the final part of this series, where I plan to cover Contexts, MCP, and parallel agents.
See you soon! à bientôt!
1. Unique Selling Point
2. It could also be the same model that gets used under the hood; the judge and the candidate LLM won’t share context.
3. I just wanted to sound scientific; in reality, the entire input, along with the context and the agent, is passed to this filter.
4. I understand the LLM-as-Judge terminology is typically used when the input to be validated is an LLM’s output. But I’m using it here in a broader sense, wherein I use an LLM not to answer the actual user query, but to decide whether or not to answer it.
5. Based on the handoff pattern.