RAG Engineering in Practice 01 – RAG Is More Than a Vector Database: Start with the Full System Map

When people first get into RAG, their attention often goes straight to the vector database. That is understandable. It is usually the part that sounds the most new, the most technical, and the most likely to impress someone in a meeting.

But once you actually build a system, one thing becomes hard to miss:

the vector database is often only one leg of the machine. RAG is the whole machine.

This article is meant to lay out that map properly. What problem RAG is really solving, where the boundary sits between RAG and a vector database, what a typical pipeline actually contains, and why so many apparent “model issues” turn out to be evidence, context, or evaluation problems in disguise.

What problem RAG is really solving

If you think of a large language model as a colleague who writes quickly, summarises well, and speaks with great confidence, its main weakness is usually not that it cannot talk. It is that it often does not know what it does not know.

In real work, that tends to show up in ways like these:

the documents have changed, but the model is still leaning on stale knowledge
the answer you need lives in internal SOPs, tickets, contracts, or notes rather than on the open web
you need something that can justify itself, but the model is perfectly happy sounding certain without much basis

That is where RAG earns its keep. Not by making the model more eloquent, but by giving it retrievable evidence before it starts generating.

The core claim

If I had to start with one line, it would be this:

RAG is not a vector database. A vector database is usually one retrieval component inside a broader system that includes retrieval, context building, constrained generation, and evaluation.

That distinction matters. Because if you begin by treating RAG as “split the text, embed it, store it in a vector database, then ask the model a question”, you tend to walk straight into a whole category of avoidable failures.

A stricter definition of RAG

The short version is:

RAG = retrieve first, generate second.

But if you are trying to build something reliable, I think the more useful definition is:

a RAG system = retrieval + context building + grounded generation + evaluation / observability

If any one of those is weak, the system may still appear to function. It just won’t be especially trustworthy.

retrieval decides which evidence comes back
context building decides what actually goes into the prompt and in what shape
grounded generation keeps the model tied to evidence rather than letting it improvise too freely
evaluation / observability tells you whether the system is improving and where it fails

Why RAG is not the same thing as a vector database

A vector database solves a fairly specific problem:

how do I store vectors and search for nearby vectors efficiently?

RAG solves a broader one:

how do I make a model answer from evidence rather than from vague parametric memory and fluent guesswork?

Those two things overlap, but they are not the same.

You can absolutely use a vector database inside a RAG system.
But you can also involve:

BM25 or inverted indexes
hybrid retrieval
structured queries such as SQL or graph lookups
reranking stages after recall

So the cleaner mental model is:

RAG is a method and a system
a vector database is one possible implementation choice inside the retrieval layer

What a typical RAG pipeline looks like

A practical RAG pipeline normally has two broad halves:

1. Offline flow: ingestion and indexing

collect documents
clean and normalise them
chunk them
embed them
build indexes
retain metadata such as source IDs, sections, versions, ACLs, and dates

2. Online flow: retrieval and generation

interpret the query
recall top-k candidates
rerank or fuse results
build the evidence pack
ask for a grounded answer
attach citations
log the outcome and feedback

The vector database only occupies part of that picture.
If the rest is weak, a very fast retrieval engine will not save the system.

Retrieval is only the first gate. Context building is where many systems quietly start going wrong

A lot of teams treat “did we retrieve the right chunks?” as the entire game. It matters, of course, but it is not the end of the story.

Because once chunks come back, another practical question appears:

which evidence should actually go into the prompt, in what order, with which source markers, and after how much deduplication?

That is context building. It is not especially glamorous, but it has an outsized effect on output stability.

If the evidence pack is bloated, the model loses the thread.
If it is too fragmentary, the model starts inventing glue.
Neither of those problems is solved by the vector database alone.

The job of generation is to stay tied to evidence

In a RAG setup, generation should not mean “the model saw some evidence and now gets to be creative”.

A healthier posture is usually to require things like:

citations
explicit acknowledgement when evidence is insufficient
restraint around cross-chunk inference
output constraints for high-risk use cases

That is one reason production RAG prompts often end up looking more like review instructions than open-ended writing prompts.

Evaluation and observability are not optional extras

This is the part that gets delayed most often and then reappears later as an unavoidable problem.

Without evaluation and observability, it is very difficult to answer questions such as:

did the chunking change actually improve anything?
did raising top-k improve recall or simply add noise?
when an answer failed, was retrieval at fault or generation?
after a document update, did the answer change because the source changed or because the system drifted?

That is why the difference between demo RAG and production RAG is often less about model size than about whether the team has started taking evidence, citation, metrics, and logs seriously.

When you probably do not need full RAG yet

There is an honest boundary worth stating here.

Not every use case deserves a full RAG stack on day one.
If your situation involves:

a small document set
infrequent updates
low risk
users who will click the source anyway
strongly structured data

then plain search plus summarisation plus explicit source linking may be enough.

RAG tends to make more sense when:

documents are numerous
documents change often
queries vary a lot
grounded answers matter
traceability matters

The main rules this article wants to leave behind

If I had to compress the whole piece into a few working rules, they would be these:

RAG is an evidence system before it is an LLM system
the vector database matters, but it is not the system by itself
retrieval, context building, generation, and evaluation all matter
many apparent model failures are really evidence-boundary failures
production RAG problems tend to look more like data and systems engineering than prompt cleverness

What comes next

The next article takes the camera one level lower and answers a related question:

if RAG is not the same thing as a vector database, where does the vector database actually sit inside the wider database landscape?

That means moving from relational systems and document stores through to search systems, vector-native products, and vector-capable databases.