Day 3: Chunking — The Make-or-Break Decision in RAG
Effective RAG systems rely on optimal text chunking. Explore strategies like fixed-size, overlapping, and semantic methods for improved results.

Today, we zoom in on the step that happens before embedding: chunking. It quietly decides whether your RAG system is amazing or unusable.
You can have the best embedding model and the fanciest vector DB. Bad chunking will still ruin your RAG.
Why Chunk At All?
Why not just embed the entire document as a single giant vector? Three reasons:
- Context window limits — LLMs can't read 200-page PDFs in a single prompt.
- Retrieval precision — A "refund policy" question needs one paragraph, not the whole handbook.
- Embedding quality — Embedding a whole book averages it into vague mush. Smaller pieces = sharper meaning.
So we split. The question is how.
Every chunking strategy fights the same tradeoff:
- Too small: Loses context → pronouns lose their referents
- Too big: Multiple ideas dilute each other → noisy retrieval
- Just right: One coherent idea per chunk
There's no universal "correct" size, but there are sensible defaults.
Strategy 1: Fixed-Size Chunking
The simplest approach. Just split text into equal pieces (e.g., 500 tokens each).
- Pro: Dead simple
- Con: Cuts mid-sentence, mid-word, mid-thought
Tip: Always count by tokens, not characters. Tokens are how LLMs and embedding models actually perceive text.
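To make this concrete, here's a minimal sketch of fixed-size chunking by token count. It assumes OpenAI's tiktoken tokenizer and the cl100k_base encoding as an example; use whatever tokenizer matches your embedding model.

```python
import tiktoken

def fixed_size_chunks(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into equal-sized pieces of `chunk_size` tokens each."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    for start in range(0, len(tokens), chunk_size):
        # Decode each token slice back to text; note this can cut mid-sentence.
        chunks.append(enc.decode(tokens[start:start + chunk_size]))
    return chunks
```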
Strategy 2: Overlapping Chunks (Sliding Window)
Same as fixed-size, but each chunk overlaps the next.
If an important sentence falls right at a boundary, it still appears fully in at least one chunk. Safety net.
Typical settings: 500-token chunks with 50–100 token overlap.
- Pro: Preserves context across boundaries
- Pro: The best beginner default
- Con: Slightly more storage (the overlap is embedded twice)
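The sliding window is a small change to the fixed-size sketch: step forward by less than a full chunk so consecutive chunks share tokens. Again, tiktoken is just an assumed tokenizer for illustration.

```python
import tiktoken

def overlapping_chunks(text: str, chunk_size: int = 500, overlap: int = 75) -> list[str]:
    """Sliding-window chunking: each chunk shares `overlap` tokens with the previous one."""
    assert overlap < chunk_size
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = chunk_size - overlap  # advance by less than a full chunk
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(enc.decode(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # avoid a trailing chunk that's fully contained in the previous one
    return chunks
```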
Strategy 3: Semantic Chunking
Split where the meaning changes, not at arbitrary intervals.
How: break the doc into sentences, embed each one, and start a new chunk whenever similarity to the previous sentence drops sharply (a topic shift).
- Pro: Highest-quality, most coherent chunks
- Con: Slower and more expensive (you're embedding every sentence just to decide where to split)
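Here's one rough way to implement the idea, assuming the sentence-transformers library, a small embedding model (all-MiniLM-L6-v2 as an example), and a hand-picked similarity threshold. Real implementations tune the threshold or adapt it to the document.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

def semantic_chunks(sentences: list[str], threshold: float = 0.6) -> list[str]:
    """Start a new chunk whenever similarity between adjacent sentences drops sharply."""
    if not sentences:
        return []
    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb = model.encode(sentences, normalize_embeddings=True)  # unit vectors
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        sim = float(np.dot(emb[i - 1], emb[i]))  # cosine similarity of adjacent sentences
        if sim < threshold:
            # Topic shift detected: close the current chunk and start a new one
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```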
One Bonus You Should Know: Recursive Splitting
The default in libraries like LangChain. It tries to split on natural separators in order of preference: paragraphs → sentences → words → characters.
It's almost as fast as fixed-size and far smarter. Try this before reaching for semantic chunking. It's a sweet spot for most projects.
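For reference, this is roughly what it looks like with LangChain's recursive splitter. The import path assumes the langchain-text-splitters package (older versions expose the same class under langchain.text_splitter), and long_document_text is a placeholder for your own raw text.

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Count by tokens rather than characters by building the splitter from a tiktoken encoder
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",
    chunk_size=500,
    chunk_overlap=75,
)
chunks = splitter.split_text(long_document_text)  # long_document_text: your raw document string
```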
A Practical Starter Recipe
If you're building your first RAG system tomorrow:
- Use recursive splitting (or fixed-size + overlap).
- Target ~500-token chunks with 50–100 tokens of overlap.
- Count in tokens, not characters.
- Print a handful of chunks and read them before embedding anything.
Iterate from there based on actual retrieval results.
Common Beginner Mistakes
- Splitting by character count instead of tokens. Token counts are what actually matter to the model.
- No overlap on dense technical content. Adjacent sentences reference each other constantly. Without overlap, those connections vanish.
- Never inspecting actual chunks. Always print samples before embedding (one way is shown in the snippet below). You'll catch most issues in 30 seconds.
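A quick sanity check along those lines, assuming a chunks list produced by any of the sketches above and the same tiktoken tokenizer:

```python
import tiktoken

def preview_chunks(chunks: list[str], n: int = 5) -> None:
    """Print the first few chunks with their token counts for a quick eyeball check."""
    enc = tiktoken.get_encoding("cl100k_base")
    for i, chunk in enumerate(chunks[:n]):
        print(f"--- chunk {i}: {len(enc.encode(chunk))} tokens ---")
        print(chunk[:300])
        print()
```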
Notes:
Why do we chunk documents?
3 reasons:
- LLM context windows are limited
- Smaller chunks improve retrieval precision
- Embeddings degrade in quality on very long text.
Fixed-size vs overlapping vs semantic chunking?
- Fixed-size: equal pieces, fast but breaks sentences.
- Overlapping: fixed-size with overlap, preserves cross-boundary context. Best beginner default.
- Semantic: splits at meaning shifts, highest quality, and more expensive.
What's a sensible default? Recursive splitting (or fixed-size + overlap), ~500 tokens with ~50–100 token overlap.
Why use overlap? So info that lands near a chunk boundary still appears fully in at least one chunk.
Why count tokens instead of characters? Tokens are how LLMs and embedding models actually count text; character counts can wildly mislead.
Day 3 Takeaway
Chunking is where data prep meets retrieval quality. Start simple — fixed-size + overlap or recursive splitting — inspect your actual chunks, and only get fancier if results demand it.
Coming Up on Day 4
In the next session, we'll tackle PDF processing.
See you on Day 4.
