When people first get into RAG, their attention often goes straight to the vector database. That is understandable. It is usually the part that sounds the most new, the most technical, and the most likely to impress someone in a meeting.

But once you actually build a system, one thing becomes hard to miss:

the vector database is often only one leg of the machine. RAG is the whole machine.

This article is meant to lay out that map properly. What problem RAG is really solving, where the boundary sits between RAG and a vector database, what a typical pipeline actually contains, and why so many apparent “model issues” turn out to be evidence, context, or evaluation problems in disguise.

What problem RAG is really solving

If you think of a large language model as a colleague who writes quickly, summarises well, and speaks with great confidence, its main weakness is usually not that it cannot talk. It is that it often does not know what it does not know.

In real work, that tends to show up in ways like these:

  • the documents have changed, but the model is still leaning on stale knowledge
  • the answer you need lives in internal SOPs, tickets, contracts, or notes rather than on the open web
  • you need something that can justify itself, but the model is perfectly happy sounding certain without much basis

That is where RAG earns its keep. Not by making the model more eloquent, but by giving it retrievable evidence before it starts generating.

The core claim

If I had to start with one line, it would be this:

RAG is not a vector database. A vector database is usually one retrieval component inside a broader system that includes retrieval, context building, constrained generation, and evaluation.

That distinction matters. Because if you begin by treating RAG as “split the text, embed it, store it in a vector database, then ask the model a question”, you tend to walk straight into a whole category of avoidable failures.

A stricter definition of RAG

The short version is:

RAG = retrieve first, generate second.

But if you are trying to build something reliable, I think the more useful definition is:

a RAG system = retrieval + context building + grounded generation + evaluation / observability

If any one of those is weak, the system may still appear to function. It just won’t be especially trustworthy.

  • retrieval decides which evidence comes back
  • context building decides what actually goes into the prompt and in what shape
  • grounded generation keeps the model tied to evidence rather than letting it improvise too freely
  • evaluation / observability tells you whether the system is improving and where it fails

Why RAG is not the same thing as a vector database

A vector database solves a fairly specific problem:

how do I store vectors and search for nearby vectors efficiently?

RAG solves a broader one:

how do I make a model answer from evidence rather than from vague parametric memory and fluent guesswork?

Those two things overlap, but they are not the same.

You can absolutely use a vector database inside a RAG system.
But you can also involve:

  • BM25 or inverted indexes
  • hybrid retrieval
  • structured queries such as SQL or graph lookups
  • reranking stages after recall

So the cleaner mental model is:

  • RAG is a method and a system
  • a vector database is one possible implementation choice inside the retrieval layer

What a typical RAG pipeline looks like

A practical RAG pipeline normally has two broad halves:

1. Offline flow: ingestion and indexing

  • collect documents
  • clean and normalise them
  • chunk them
  • embed them
  • build indexes
  • retain metadata such as source IDs, sections, versions, ACLs, and dates

2. Online flow: retrieval and generation

  • interpret the query
  • recall top-k candidates
  • rerank or fuse results
  • build the evidence pack
  • ask for a grounded answer
  • attach citations
  • log the outcome and feedback

The vector database only occupies part of that picture.
If the rest is weak, a very fast retrieval engine will not save the system.

Retrieval is only the first gate. Context building is where many systems quietly start going wrong

A lot of teams treat “did we retrieve the right chunks?” as the entire game. It matters, of course, but it is not the end of the story.

Because once chunks come back, another practical question appears:

which evidence should actually go into the prompt, in what order, with which source markers, and after how much deduplication?

That is context building. It is not especially glamorous, but it has an outsized effect on output stability.

If the evidence pack is bloated, the model loses the thread.
If it is too fragmentary, the model starts inventing glue.
Neither of those problems is solved by the vector database alone.

The job of generation is to stay tied to evidence

In a RAG setup, generation should not mean “the model saw some evidence and now gets to be creative”.

A healthier posture is usually to require things like:

  • citations
  • explicit acknowledgement when evidence is insufficient
  • restraint around cross-chunk inference
  • output constraints for high-risk use cases

That is one reason production RAG prompts often end up looking more like review instructions than open-ended writing prompts.

Evaluation and observability are not optional extras

This is the part that gets delayed most often and then reappears later as an unavoidable problem.

Without evaluation and observability, it is very difficult to answer questions such as:

  • did the chunking change actually improve anything?
  • did raising top-k improve recall or simply add noise?
  • when an answer failed, was retrieval at fault or generation?
  • after a document update, did the answer change because the source changed or because the system drifted?

That is why the difference between demo RAG and production RAG is often less about model size than about whether the team has started taking evidence, citation, metrics, and logs seriously.

When you probably do not need full RAG yet

There is an honest boundary worth stating here.

Not every use case deserves a full RAG stack on day one.
If your situation involves:

  • a small document set
  • infrequent updates
  • low risk
  • users who will click the source anyway
  • strongly structured data

then plain search plus summarisation plus explicit source linking may be enough.

RAG tends to make more sense when:

  • documents are numerous
  • documents change often
  • queries vary a lot
  • grounded answers matter
  • traceability matters

The main rules this article wants to leave behind

If I had to compress the whole piece into a few working rules, they would be these:

  1. RAG is an evidence system before it is an LLM system
  2. the vector database matters, but it is not the system by itself
  3. retrieval, context building, generation, and evaluation all matter
  4. many apparent model failures are really evidence-boundary failures
  5. production RAG problems tend to look more like data and systems engineering than prompt cleverness

What comes next

The next article takes the camera one level lower and answers a related question:

if RAG is not the same thing as a vector database, where does the vector database actually sit inside the wider database landscape?

That means moving from relational systems and document stores through to search systems, vector-native products, and vector-capable databases.