Day 1: What is RAG and Why Do We Need It?

RAG enhances LLMs by providing relevant data during question answering, overcoming limitations like hallucinations and lack of private data access.

Parathan Thiyagalingam
May 7, 2026 · 3 min read

This blog post is a daily learning summary of my 40-day RAG class from Syed Jaffer of Parotta Salna.

Try asking ChatGPT:

  1. "What were my company's Q3 sales numbers?"
  2. "Summarize the contract I uploaded last week."

You'll get either a confidently wrong answer (a hallucination) or "I don't have access to that." That gap is exactly what RAG (Retrieval-Augmented Generation) was built to close.

First, What is an LLM?

A Large Language Model (GPT, Gemini, Claude, Llama) is, at its core, a next-word predictor.

It was trained on huge amounts of internet text and learned one skill very well: given some words, guess what comes next.

It feels like the model "knows things" because it has seen so much text that its patterns are usually right. But it isn't looking up facts; it's predicting words.

Hold onto that idea. Every weakness below comes straight from it.

Why Plain LLMs Fall Short

  1. Hallucinations: Generates confident-sounding but invented facts
  2. Stale knowledge: Only knows up to its training cutoff date
  3. No private data: Has never seen your company docs, notes, or PDFs
  4. No source citations: Can't tell you where an answer came from

For anything serious like legal, medical, research, or business, these are deal-breakers.

Why Not Just Retrain the Model?

The natural question: "Can't we just train the model on our data?"

You can; it's called fine-tuning. But it's the wrong tool here: it's expensive, slow, has to be redone every time your data changes, and it's better at teaching style than teaching facts. Fine-tuned models still hallucinate.

There's a smarter idea.

The Big Insight Behind RAG

Instead of stuffing facts into the model, hand the model the right facts as it answers.

That's the whole trick.

Plain LLM = closed-book exam (relies on memory)
RAG = open-book exam (looks things up, then answers)

The model doesn't memorize your documents. It just reads the right pages when a question comes in.
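The open-book idea can be shown in a few lines. Below is a toy sketch of the trick: take the user's question, paste a retrieved passage in front of it, and send that combined prompt to the model instead. The company name, passage, and prompt wording here are all made up for illustration.

```python
# Closed-book: the question goes to the model alone.
closed_book_prompt = "What were Acme Corp's Q3 sales numbers?"

# Open-book: a passage retrieved from your own documents is
# pasted into the prompt so the model can read it while answering.
retrieved_passage = (
    "Acme Corp internal report: Q3 revenue was $4.2M, "
    "up 12% from Q2."
)

open_book_prompt = (
    "Answer the question using ONLY the context below. "
    "If the context does not contain the answer, say so.\n\n"
    f"Context:\n{retrieved_passage}\n\n"
    f"Question: {closed_book_prompt}"
)

print(open_book_prompt)
```

Same model, same question; the only difference is what the model gets to read before it answers.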

How RAG Works (The Big Picture)

There are two phases. We'll go deep on each one in the coming days.

  1. Build phase (once): split your documents into chunks, embed each chunk into numbers, and store them in a vector database.
  2. Query phase (per question): embed the user's question, retrieve the most relevant chunks, and hand them to the LLM alongside the question.

Set up the build phase once. Every user question then flows through the query phase in seconds.
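Here is a deliberately tiny sketch of the two phases end to end. To keep it self-contained it uses a naive word-overlap retriever instead of embeddings and a vector database (those arrive on Day 2); the corpus and question are invented for illustration.

```python
# --- Build phase (run once): split documents into chunks ---
# A real pipeline would embed these and store them in a vector DB.
chunks = [
    "Refund policy: customers may return items within 30 days.",
    "Shipping: standard delivery takes 3-5 business days.",
    "Q3 sales reached $4.2M, a 12% increase over Q2.",
]

def score(question: str, chunk: str) -> int:
    """Toy relevance score: count words shared by question and chunk."""
    return len(set(question.lower().split()) & set(chunk.lower().split()))

# --- Query phase (run per question): retrieve, then generate ---
def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k chunks with the highest word overlap."""
    return sorted(chunks, key=lambda c: score(question, c), reverse=True)[:k]

def build_prompt(question: str) -> str:
    """Stuff the retrieved chunks into the prompt sent to the LLM."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"

# This prompt would normally go to an LLM; we just print it here.
print(build_prompt("What is the refund policy for returns?"))
```

Swap the word-overlap scorer for embedding similarity and the `chunks` list for a vector database, and you have the real architecture.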

When RAG is the Right Tool

Great for: customer support bots, internal Q&A over wikis, legal/medical assistants, personal knowledge tools, anything that needs private or fresh information with citations.

Not great for: pure reasoning, creative writing, math, or real-time data that changes every second.

Rule of thumb: if the question can only be answered by looking at specific documents, RAG is your friend.

Day 1 Takeaway

Plain LLMs hallucinate, go stale, and can't see your private data. RAG fixes this by giving the model the right information at the right time.

Coming Up on Day 2

We said the model "embeds" text into "numbers" and stores them in a "vector database." Tomorrow we make that concrete:

  1. What is an embedding, really?
  2. How can numbers capture meaning?
  3. Why a vector database and not just SQL?

See you on Day 2.