RAG: The reports of my death are greatly exaggerated

A colleague at work sent me a blog post that trended on Hacker News: The RAG Obituary: Killed by Agents, Buried by Context Windows, by Nicolas Bustamante.

The post argues that Retrieval-Augmented Generation (RAG) is in decline, suggesting that the rise of agent-based techniques, like using Claude Code, is making RAG obsolete. He starts with a great explanation of what RAG was originally meant to solve: how to get LLMs to work with large documents, given their limited context windows (their short-term memory).

Consider the numbers: A single SEC 10-K filing contains approximately 51,000 tokens (130+ pages).

With [an LLM with a context window of] 8,192 tokens, you could see less than 16% of a 10-K filing. It’s like reading a financial report through a keyhole!

The most interesting part of the post is the deep dive into RAG itself: how mixing embedding techniques (which I explored here and here) with traditional keyword search yields better results than either technique in isolation.

In the past I've found it hard to get usable search results by converting sentences to vectors and comparing them with cosine similarity, the cornerstone of RAG. But mixing that with trad-search sounds promising.
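To make the hybrid idea concrete, here is a minimal sketch of combining the two signals. Everything in it is illustrative: the "embeddings" are hand-written 2-D vectors standing in for real model output, the keyword scorer is a naive term-overlap count standing in for BM25, and Reciprocal Rank Fusion (RRF) is one common way to merge the two ranked lists; the post doesn't prescribe a specific fusion method.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, doc):
    """Naive term-overlap count (a crude stand-in for BM25)."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    return sum(min(q[t], d[t]) for t in q)

def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: each list contributes 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy corpus with hypothetical, hand-made embedding vectors.
docs = {
    "10k": "annual revenue and profit figures",
    "memo": "revenue grew in the fiscal year",
    "cat": "the cat sat on the mat",
}
embeddings = {"10k": [0.8, 0.3], "memo": [0.9, 0.1], "cat": [0.1, 0.9]}
query, query_vec = "revenue figures", [1.0, 0.0]

vector_rank = sorted(docs, key=lambda d: cosine(query_vec, embeddings[d]), reverse=True)
keyword_rank = sorted(docs, key=lambda d: keyword_score(query, docs[d]), reverse=True)
fused = rrf([vector_rank, keyword_rank])
print(fused)  # the irrelevant "cat" doc lands last under both signals
```

The appeal of RRF is that it works on ranks rather than raw scores, so you never have to reconcile cosine similarities with BM25 scores on incompatible scales.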

I'm on the fence about the demise of RAG. I haven't used it successfully myself, and I know a lot of people who are struggling with RAG implementations, but I also hear success stories. In the Y Combinator interview with LegalTech startup Legora, for instance, it sounds like RAG is working very well for them.

For a lively discussion about RAG, its demise or usefulness, check out the Hacker News thread: The RAG Obituary: Killed by Agents, Buried by Context Windows.

The jury is still out on RAG. I'm going to try it again, mixed with trad-search, and report back.

RAG is dead. Long live RAG!