RAG Day 2: How It Actually Works Under the Hood
RAG deep dive: Explore how Retrieval-Augmented Generation works. Learn about embeddings, vector databases & chunking for smarter search.
Day 2 of Syed Jafer K’s 40-day RAG series is done. If Day 1 was about why we need RAG, today was about how it actually works. Here’s what stuck with me in plain words.
The Core Idea: Search by Meaning, Not Keywords
Old-school search matches words. RAG searches meaning.
It does this with embeddings: turning text into long lists of numbers (vectors) that capture its meaning. Similar ideas end up close together in this number space, even if they use different words.
So “bank” in “river bank” lands far from “bank” the financial institution, while “car” and “automobile” sit right next to each other. That’s the trick that makes RAG smart.
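“Close together” is just vector math. Here’s a minimal sketch with toy, made-up 3-number “embeddings” (a real model produces hundreds of dimensions from the text itself) showing how cosine similarity scores closeness of meaning:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: near 1.0 means pointing the same way (similar
    # meaning), near 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors invented for illustration; a real embedding model
# would produce these from the text.
car        = [0.9, 0.1, 0.0]
automobile = [0.8, 0.2, 0.1]
river_bank = [0.0, 0.2, 0.9]

print(cosine_similarity(car, automobile))  # high: close in meaning
print(cosine_similarity(car, river_bank))  # low: far apart
```

Same idea, just scaled up: an embedding model gives every chunk of text a position in that space, and “similar meaning” becomes “small angle between vectors.”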
The RAG Flow
Here’s the pipeline in five simple steps:
- User asks a question
- The question gets turned into an embedding
- A vector database finds the most similar chunks from your documents
- Those chunks get stuffed into the LLM’s prompt as context
- The LLM generates an answer based on real information
No retraining. No fine-tuning. Just retrieve, then generate.
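The five steps above fit in a few lines. This is a toy sketch, not a real system: the “embedding” is a bag-of-words count over a tiny invented vocabulary (a stand-in for a real embedding model), the “vector database” is a plain list, and step 5 stops at building the prompt instead of calling an LLM.

```python
import math

# Stand-in "embedding": word counts over a tiny fixed vocabulary.
# A real pipeline would call an embedding model here.
VOCAB = ["rag", "retrieval", "embedding", "vector", "database",
         "chunk", "llm", "prompt", "fine-tuning", "generation"]

def embed(text):
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# A "vector database" is, at minimum, one embedding per chunk
# plus similarity search over them.
chunks = [
    "rag combines retrieval with generation",
    "a vector database stores one embedding per chunk",
    "fine-tuning retrains the llm on new data",
]
index = [(embed(c), c) for c in chunks]

def build_prompt(question, top_k=1):
    q = embed(question)                                # steps 1-2: embed the question
    ranked = sorted(index, key=lambda e: cosine(q, e[0]), reverse=True)
    context = "\n".join(c for _, c in ranked[:top_k])  # step 3: retrieve similar chunks
    # step 4: stuff the chunks into the prompt; step 5 would send this to the LLM
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("what does a vector database store"))
```

The question about vector databases retrieves the vector-database chunk, not the fine-tuning one, purely by similarity. Swap in a real embedding model and a real vector store and the shape of the code stays the same.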
What I Learned
- Chunking is an art. Too big and you get a noisy context. Too small and you lose meaning. How you split your documents directly affects the quality of the answers.
- Vector databases make it fast. Tools like FAISS, Chroma, and Pinecone can search millions of chunks in milliseconds.
- External knowledge keeps things light. The LLM stays general; your data stays separate and easy to update.
- Semantics beat keywords. This is what cuts down hallucinations — the model sees relevant context, not random matches.
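The chunking trade-off is easy to see in code. Here’s a minimal fixed-size word chunker with overlap (one strategy among many; real pipelines often split on sentences or paragraphs instead, and the size and overlap numbers here are just illustrative):

```python
def chunk_words(text, chunk_size=20, overlap=5):
    """Split text into chunks of chunk_size words, overlapping by
    `overlap` words so a sentence cut at a boundary still appears
    whole in at least one chunk."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk already covers the end of the text
    return chunks

doc = "word " * 50  # stand-in for a real document
for i, c in enumerate(chunk_words(doc)):
    print(i, len(c.split()))
```

Tuning `chunk_size` is exactly the art above: crank it up and each chunk drags in off-topic noise; shrink it and a single chunk no longer carries a complete thought.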
My Takeaway
RAG isn’t just a technique — it’s a mindset. Instead of forcing a giant model to memorize everything, you give it the right information at the right time.
Day 1 was motivation. Day 2 made me want to build. Next up: loading a PDF, generating embeddings, and running my first query.
Day 2 done.
Question for you: Which RAG concept feels most confusing right now: embeddings, chunking, or vector databases? Let’s talk about it.