Part 2 - Exploring OpenAI Agent SDK with Local Models (Ollama)
Exploring multi-agent handoffs

This is Part 2 of this exploration series, covering the nuances of multi-agent handoffs using the OpenAI Agent SDK. Please refer to Part 1 below
Recap
Welcome back! If you don’t get why I’m welcoming you back, you should go and read Part 1 of this series.
So, continuing our journey, now it's time to bring more agents into the mix and see how the SDK facilitates multi-agent coordination.
Multi Agents
Creating more than one agent is as simple as creating multiple instances of the Agent class. Let's see some examples.
1️⃣ As you may remember from Part 1, we learnt how to create a ModelProvider instance that returns an OpenAIChatCompletionsModel built on top of Ollama. I simply moved this piece into a sub-module so that I can reuse it in all of my further explorations.
2️⃣ We now create our first agent, named Accounts Agent.
3️⃣ We fill in the handoff_description, which will be used by some other higher-level agent (we will see this later) to decide whether or not to hand off control to this agent.
4️⃣ Then, we provide some system instructions via instructions.
5️⃣ Finally, we set the model to OllamaProviderAsync().get_model() to force the agent to use our local model, as shown in the sketch below.
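Here is a minimal sketch of what this looks like. The OllamaProviderAsync sub-module comes from Part 1; the base URL, model name and the agent's wording below are my own assumptions for a local Ollama instance, not the post's verbatim code, so adapt them to your setup.

# ollama_provider.py - a minimal sketch of the Part 1 sub-module (base URL and model name assumed)
from openai import AsyncOpenAI
from agents import OpenAIChatCompletionsModel

class OllamaProviderAsync:
    """Wraps a local Ollama endpoint behind the SDK's chat-completions model."""

    def __init__(self, base_url: str = "http://localhost:11434/v1", model_name: str = "qwen3"):
        # Ollama exposes an OpenAI-compatible endpoint; the API key is ignored but required by the client.
        self._client = AsyncOpenAI(base_url=base_url, api_key="ollama")
        self._model_name = model_name

    def get_model(self) -> OpenAIChatCompletionsModel:
        return OpenAIChatCompletionsModel(model=self._model_name, openai_client=self._client)


# Our first agent: Accounts Agent, forced onto the local model
from agents import Agent

accounts_agent = Agent(
    name="Accounts Agent",
    handoff_description="Handles queries about bank accounts, balances and statements.",  # assumed wording
    instructions="You are a helpful banking assistant. Answer questions about the user's accounts.",  # assumed wording
    model=OllamaProviderAsync().get_model(),
)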
Please note that, even though the OpenAI Agent SDK does provide the sync version of the run() method, i.e. Runner.run_sync(), unfortunately it is not available for non-OpenAI models. This is because OpenAIChatCompletionsModel contains only the AsyncOpenAI client instead of the OpenAI client (which is the sync version), as evident here.
Great! Now let's create a few other agents and see how they can communicate with each other.
1️⃣ We create a credit_card_agent with its relevant handoff_description and instructions.
2️⃣ We then create a wire_transfer_agent with its relevant handoff_description and instructions.
3️⃣ For the wire_transfer_agent, we also provide access to the get_wire_transfer_status tool,
4️⃣ which is nothing but a plain function returning a dummy status, decorated with the function_tool decorator. The function name and docstring are sufficient for the agent to select this tool if warranted.
5️⃣ Finally, we create the main_agent, a.k.a. the routing agent, called Operator Agent, and set handoffs to the list of allowed sub-agents. A sketch of all of this follows below.
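Here is a hedged sketch of these agents. It reuses the OllamaProviderAsync helper from above, and the handoff_description and instructions strings are placeholders I made up, not the original wording.

from agents import Agent, function_tool

@function_tool
def get_wire_transfer_status(transfer_id: str) -> str:
    """Returns the status of the wire transfer with the given id."""
    # Dummy status; a real implementation would call a backend service.
    return f"Wire transfer {transfer_id} has been completed successfully."

credit_card_agent = Agent(
    name="Credit Card Agent",
    handoff_description="Handles queries about credit cards, billing and rewards.",  # assumed wording
    instructions="You answer questions about the user's credit cards.",  # assumed wording
    model=OllamaProviderAsync().get_model(),
)

wire_transfer_agent = Agent(
    name="Wire Transfer Agent",
    handoff_description="Handles queries about wire transfers and their status.",  # assumed wording
    instructions="You answer questions about wire transfers, using the tools available to you.",  # assumed wording
    tools=[get_wire_transfer_status],
    model=OllamaProviderAsync().get_model(),
)

main_agent = Agent(
    name="Operator Agent",
    instructions="You are an operator. Route the user's query to the most relevant specialist agent.",  # assumed wording
    handoffs=[accounts_agent, credit_card_agent, wire_transfer_agent],
    model=OllamaProviderAsync().get_model(),
)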
Here is a graphical view of the orchestration we made so far:
Substack’s code highlighting is pathetic. If you like navigating the code in a more intuitive way, please consider visiting my website blog instead.
Then, we simply feed the main_agent with the user's input to see the routing happen. At least, that's what the OpenAI documentation claims 😳⁉️⚠️. But if you run the code below…
async def main():
    result: RunResult = await Runner.run(main_agent, "What is the status of my wire transfer with id 1234")
    print(result.final_output)

asyncio.run(main())

…you will end up with:
{"type":"function\",
\"name\":\"transfer_to_wire_transfer_agent\",
\"parameters\":{}}Well, What happened ??
Well, OpenAI's documentation is not that clear on this. It took me ~30 minutes to figure out that handoffs doesn't automatically hand control off to the right agent. It simply sets the probable next agent in the result.last_agent property.
In other words, the OpenAI Agent SDK simply records the next agent to be called, according to the main_agent's LLM output, i.e. main_agent has determined which agent control should be handed off to, but the handoff hasn't happened yet. To make it happen, you need to call Runner.run() on that next agent. So the modified code would be:
1️⃣ We simply inspect the value of last_agent from the result, and if it is one of the agents in the available handoffs list,
2️⃣ we call run() on that last_agent and pass the user_input again.
3️⃣ Finally, we print the final_output of sub_result. A sketch of this manual routing is shown below.
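A minimal sketch of this manual routing, assuming main_agent and its handoffs are the plain Agent instances defined earlier (so a simple membership check on main_agent.handoffs works):

import asyncio
from agents import Runner, RunResult

async def main():
    user_input = "What is the status of my wire transfer with id 1234"
    result: RunResult = await Runner.run(main_agent, user_input)

    # main_agent only *selects* the next agent; it does not run it.
    if result.last_agent in main_agent.handoffs:
        # Hand off manually by running the selected agent with the same input.
        sub_result: RunResult = await Runner.run(result.last_agent, user_input)
        print(sub_result.final_output)
    else:
        # No handoff was selected; main_agent's own answer is the final one.
        print(result.final_output)

asyncio.run(main())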
Here is the final, complete example with everything we have learnt so far:
Tool response as final response
Now, consider a case where you don't want the LLM to use the tool response (get_wire_transfer_status in the case of wire_transfer_agent) to prepare the final answer, but instead want to return the tool response itself as the final answer. Well, the Agent SDK has made this very simple!
1️⃣ Simply set tool_use_behavior="stop_on_first_tool" on the respective agent, as sketched below.
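A minimal sketch of flipping this switch on wire_transfer_agent (the other fields are the same assumed placeholders as above):

wire_transfer_agent = Agent(
    name="Wire Transfer Agent",
    handoff_description="Handles queries about wire transfers and their status.",  # assumed wording
    instructions="You answer questions about wire transfers.",  # assumed wording
    tools=[get_wire_transfer_status],
    tool_use_behavior="stop_on_first_tool",  # the first tool's output becomes final_output, with no extra LLM pass
    model=OllamaProviderAsync().get_model(),
)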
This would change the output as follows:
Routing back to main_agent
Now, what if the query doesn't pertain to any sub-agent and you want the main agent to answer it itself? Well, this works only in the case of thinking models. With the qwen3 model, for the setup below:
For an input like "Who is the president of USA?", I'm seeing:
But if I use normal (non-thinking) LLMs like llama3.2, I'm seeing:
As you can see, the agent is making up a tool to look up this fact, even though no such tool is listed in the main_agent. [1]
Wrapping up
Great! So in this post, I covered how to use the OpenAI Agent SDK's handoffs to orchestrate multi-agent collaboration. By and large the SDK is very intuitive and works as advertised, with some teething issues that I covered. Given that OpenAI not only pioneered bringing ChatGPT to the masses but was also the first to standardize API inferencing, which pushed almost every other model provider to adopt the OpenAI API spec, I hope they will do everything in their power to make this SDK more robust.
Coming back to my exploration, I still consider my feet only wet, with deeper dives in upcoming posts. I would say stay tuned, but we are not in the analog world anymore, so I'm gonna say: stay subscribed, and catch you again in my Part 3 post, very soon!
[1] I have logged this issue in OpenAI's GitHub.












