TIL - Structured RAG using typeagent-py
Exploring typeagent-py package for Structured RAG
TIL is a new series I’m starting, to quickly document things I explore
(inspired by Simon’s blog; he also has a presence here on Substack).
I accidentally followed Guido van Rossum1 on X and bookmarked his "structured RAG is better than RAG" talk from the PyBay25 workshop. The following are my notes based on those slides.
The talk starts with the pros and cons of traditional / classic RAG versus structured RAG.
Traditional / Classic RAG
KB sources are chunked, converted into embeddings2 and stored in a vector database.
When the user asks a query, the query is also turned into an embedding (usually using the same embedding model / LLM), and a similarity search is performed between the query embedding and the document embeddings. There are multiple similarity measures, but the most common one I have encountered is cosine similarity.
Based on the similarity scores, the top-K chunks are retrieved and converted back to text, which is used as additional context for the LLM to answer. A typical instruction to the LLM would be:
Based on the retrieved context {retrieved text}, answer the user query {query}.
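The retrieval step above can be sketched in a few lines. This is a toy illustration, not typeagent-py code: the vectors are hand-made stand-ins for what a real embedding model would produce.

```python
# Minimal sketch of classic RAG retrieval: cosine similarity + top-K.
# The embeddings below are toy stand-ins for a real embedding model's output.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_emb, chunks, k=2):
    """Rank stored (embedding, text) pairs by similarity to the query."""
    ranked = sorted(chunks,
                    key=lambda c: cosine_similarity(query_emb, c[0]),
                    reverse=True)
    return [text for _, text in ranked[:k]]

# Toy vector store: (embedding, original text) pairs.
store = [
    ([0.9, 0.1, 0.0], "Q3 sales report for product A"),
    ([0.1, 0.9, 0.0], "Office lunch menu for Friday"),
    ([0.8, 0.2, 0.1], "Regional profit breakdown, Q3"),
]

query_emb = [0.85, 0.15, 0.05]  # embed(query) in a real system
context = "\n".join(top_k(query_emb, store))
prompt = f"Based on the retrieved context {context}, answer the user query {{query}}."
```

With this query embedding, the two Q3-related chunks win and the lunch menu is filtered out; the assembled context then slots into the prompt template above.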
The talk claims that traditional RAG falls short on more contextual queries, e.g. queries like "who ate that?" or "Find the most sold product in Q3 and summarize its profit by region".
This is because retrieval is based purely on embedding similarity, which captures fuzzy relatedness but not the structured semantics (who, what, when) needed to resolve such queries.
Structured RAG
In structured RAG, the ingestion pipeline is a little more nuanced. Instead of simply ingesting embeddings of the KB chunks, an LLM is employed to extract knowledge nuggets using NER3 (Named Entity Recognition) techniques. LLMs are apparently quite good at this.
These knowledge chunks are much richer in semantics and can be stored, along with further metadata, in more traditional datastores like relational DBs. They argue this then becomes a DB indexing problem, which can be optimized with existing, proven techniques. It is now a computer science problem, they say!
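A rough sketch of what that ingestion might look like, using SQLite as the indexed store. The table layout is my own guess, and `extract_entities` is a hardcoded stand-in for the LLM/NER step:

```python
# Sketch of structured-RAG ingestion: extract entities from each chunk
# and store them, with metadata, in an indexed relational table.
# extract_entities() is a hand-rolled stand-in for the LLM/NER step.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE nuggets (
    id INTEGER PRIMARY KEY,
    chunk TEXT,
    entity TEXT,
    entity_type TEXT,
    speaker TEXT,
    timestamp TEXT)""")
# Indexing the entity column is what makes this "a DB indexing problem".
conn.execute("CREATE INDEX idx_entity ON nuggets(entity)")

def extract_entities(chunk: str) -> list[tuple[str, str]]:
    """Stand-in for LLM-based NER; returns (entity, type) pairs."""
    known = {"Alice": "PERSON", "Q3": "DATE", "WidgetCo": "ORG"}
    words = (w.strip(".,") for w in chunk.split())
    return [(w, known[w]) for w in words if w in known]

transcript = [
    ("Alice", "2024-07-01T10:00", "Alice presented the Q3 numbers for WidgetCo."),
]
for speaker, ts, chunk in transcript:
    for entity, etype in extract_entities(chunk):
        conn.execute(
            "INSERT INTO nuggets (chunk, entity, entity_type, speaker, timestamp) "
            "VALUES (?, ?, ?, ?, ?)",
            (chunk, entity, etype, speaker, ts))
conn.commit()
```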
In the querying stage, NER is again employed to convert the user’s query into abstract entities, which are then used to query the pre-populated DB and retrieve the context. The final step is the same as in classic RAG.
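The query side of that loop might look like this. Again, this is my own toy illustration of the idea, not the typeagent-py implementation; the table and the entity extractor are assumptions:

```python
# Query-side sketch: extract entities from the user's question, look up
# matching pre-indexed nuggets, and build the usual RAG prompt context.
import sqlite3

# A tiny pre-populated store standing in for the ingestion step's output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nuggets (chunk TEXT, entity TEXT)")
conn.executemany("INSERT INTO nuggets VALUES (?, ?)", [
    ("Alice presented the Q3 numbers.", "Alice"),
    ("Alice presented the Q3 numbers.", "Q3"),
    ("Bob reviewed the hiring plan.", "Bob"),
])

def extract_entities(text: str) -> list[str]:
    """Stand-in for the LLM/NER step, applied to the query this time."""
    known = {"Alice", "Bob", "Q3"}
    return [w.strip("?.,") for w in text.split() if w.strip("?.,") in known]

query = "What did Alice say about Q3?"
entities = extract_entities(query)
placeholders = ",".join("?" * len(entities))
rows = conn.execute(
    f"SELECT DISTINCT chunk FROM nuggets WHERE entity IN ({placeholders})",
    entities).fetchall()
context = "\n".join(r[0] for r in rows)
# `context` now feeds the same final LLM prompt as in classic RAG.
```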
Things I’m not sure about!
Even though the theory portion makes sense, I couldn’t find the corresponding implementation in MS’s typeagent-py repo. Based on my brief reading, I still find references to classic RAG where embeddings are used. Need to dig further. Pplx is not helping much here.
As per the README, it is simply a Python port of TypeAgent KnowPro, which is a TypeScript implementation of the structured RAG concept by MS. The TypeAgent architecture document provides more insight into this framework.
The info I distilled from these docs is simply the following:
Structured RAG is being positioned as a better framework for conversational use cases, where indirect references are recalled better than with classic RAG.
It works by eliciting metadata, such as timestamp, speaker, etc., from conversations / podcast transcripts (which are in turn conversations) and putting it in an indexed, DB-backed data store.
An LLM is then used simply to translate natural-language input into structured query input and vice versa.
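That last point, the LLM as a translator rather than an embedder, might be sketched as a prompt like the one below. The schema and wording are entirely my own assumptions, not typeagent-py’s:

```python
# Hedged sketch of the "LLM as NL-to-SQL translator" idea: the model is
# prompted to emit a structured query over a known schema, instead of
# being used to produce embeddings. Schema and prompt are hypothetical.
SCHEMA = "nuggets(chunk TEXT, entity TEXT, speaker TEXT, timestamp TEXT)"

def translation_prompt(question: str) -> str:
    """Build a prompt asking the LLM to translate a question into SQL."""
    return (
        f"You translate questions into SQL over this table: {SCHEMA}.\n"
        "Return only the SQL.\n"
        f"Question: {question}\n"
        "SQL:"
    )

prompt = translation_prompt("Who spoke about Q3?")
# In a real system this prompt goes to an LLM; the returned SQL is run
# against the indexed store, and the resulting rows become the context.
```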
In a nutshell, instead of using the LLM as an embedding translator, they are using it as an NL-to-SQL translator??
My gut feeling: probably over-engineered, but still a good project to explore. Definitely on my watchlist, but not in my toolbelt4.
The inventor of Python
Either by an embedding model or by an LLM


