RAG Day 2: How It Actually Works Under the Hood
RAG deep dive: Explore how Retrieval-Augmented Generation works. Learn about embeddings, vector databases & chunking for smarter search.
Day 2 of Syed Jafer K’s 40-day RAG series is done. If Day 1 was about why we need RAG, today was about how it actually works. Here’s what stuck with me in plain words.
The Core Idea: Search by Meaning, Not Keywords
Old-school search matches words. RAG searches meaning.
It does this with embeddings: turning text into long lists of numbers (vectors) that capture its meaning. Similar ideas end up close together in this number space, even if they use different words.
So “bank” in “river bank” lands far from “bank” the financial institution, while “car” and “automobile” sit right next to each other. That’s the trick that makes RAG smart.
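“Close together” is just vector math. Here’s a minimal sketch with toy, made-up 3-number “embeddings” (a real model produces hundreds of dimensions from the text itself) showing how cosine similarity scores closeness of meaning:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: near 1.0 means pointing the same way (similar
    # meaning), near 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors invented for illustration; a real embedding model
# would produce these from the text.
car        = [0.9, 0.1, 0.0]
automobile = [0.8, 0.2, 0.1]
river_bank = [0.0, 0.2, 0.9]

print(cosine_similarity(car, automobile))  # high: close in meaning
print(cosine_similarity(car, river_bank))  # low: far apart
```

Same idea, just scaled up: an embedding model gives every chunk of text a position in that space, and “similar meaning” becomes “small angle between vectors.”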
The RAG Flow
Here’s the pipeline in five simple steps:
- User asks a question
- The question gets turned into an embedding
- A vector database finds the most similar chunks from your documents
- Those chunks get stuffed into the LLM’s prompt as context
- The LLM generates an answer based on real information
No retraining. No fine-tuning. Just retrieve, then generate.
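The five steps above fit in a few lines. This is a toy sketch, not a real system: the “embedding” is a bag-of-words count over a tiny invented vocabulary (a stand-in for a real embedding model), the “vector database” is a plain list, and step 5 stops at building the prompt instead of calling an LLM.

```python
import math

# Stand-in "embedding": word counts over a tiny fixed vocabulary.
# A real pipeline would call an embedding model here.
VOCAB = ["rag", "retrieval", "embedding", "vector", "database",
         "chunk", "llm", "prompt", "fine-tuning", "generation"]

def embed(text):
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# A "vector database" is, at minimum, one embedding per chunk
# plus similarity search over them.
chunks = [
    "rag combines retrieval with generation",
    "a vector database stores one embedding per chunk",
    "fine-tuning retrains the llm on new data",
]
index = [(embed(c), c) for c in chunks]

def build_prompt(question, top_k=1):
    q = embed(question)                                # steps 1-2: embed the question
    ranked = sorted(index, key=lambda e: cosine(q, e[0]), reverse=True)
    context = "\n".join(c for _, c in ranked[:top_k])  # step 3: retrieve similar chunks
    # step 4: stuff the chunks into the prompt; step 5 would send this to the LLM
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("what does a vector database store"))
```

The question about vector databases retrieves the vector-database chunk, not the fine-tuning one, purely by similarity. Swap in a real embedding model and a real vector store and the shape of the code stays the same.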
What I Learned
- Chunking is an art. Too big and you get a noisy context. Too small and you lose meaning. How you split your documents directly affects the quality of the answers.
- Vector databases make it fast. Tools like FAISS, Chroma, and Pinecone can search millions of chunks in milliseconds.
- External knowledge keeps things light. The LLM stays general; your data stays separate and easy to update.
- Semantics beat keywords. This is what cuts down hallucinations — the model sees relevant context, not random matches.
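The chunking trade-off is easy to see in code. Here’s a minimal fixed-size word chunker with overlap (one strategy among many; real pipelines often split on sentences or paragraphs instead, and the size and overlap numbers here are just illustrative):

```python
def chunk_words(text, chunk_size=20, overlap=5):
    """Split text into chunks of chunk_size words, overlapping by
    `overlap` words so a sentence cut at a boundary still appears
    whole in at least one chunk."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk already covers the end of the text
    return chunks

doc = "word " * 50  # stand-in for a real document
for i, c in enumerate(chunk_words(doc)):
    print(i, len(c.split()))
```

Tuning `chunk_size` is exactly the art above: crank it up and each chunk drags in off-topic noise; shrink it and a single chunk no longer carries a complete thought.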
My Takeaway
RAG isn’t just a technique — it’s a mindset. Instead of forcing a giant model to memorize everything, you give it the right information at the right time.
Day 1 was motivation. Day 2 made me want to build. Next up: loading a PDF, generating embeddings, and running my first query.
Day 2 done.
Question for you: Which RAG concept feels most confusing right now: embeddings, chunking, or vector databases? Let’s talk about it.