Why Language Models Hallucinate: A summary of a new research paper from OpenAI
AI hallucinations: OpenAI research reveals why language models confidently invent facts. Training & testing methods reward guessing, not accuracy.

In this post, I would like to share my understanding and summary of the latest research paper published by OpenAI, “Why Language Models Hallucinate”.
AI language models confidently make up facts that sound believable but are completely wrong. They might claim Einstein was born in 1880 (he wasn’t) or invent fake research papers with convincing titles.
The paper explains two main reasons:
1. Training Makes Hallucination Inevitable
The Math Problem: Creating correct text is much harder than just recognizing if text is correct.
- Think of it like writing a perfect essay vs. grading someone else’s essay
- The researchers proved a lower bound: if an AI can only tell correct answers from incorrect ones 90% of the time, it will generate wrong answers at least roughly 20% of the time (see the sketch just after this list)
- This happens even with perfect training data
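A minimal sketch of that relationship in Python (the function name and the simplified “two times” form are my own shorthand; the paper’s full bound includes additional correction terms):

```python
def generation_error_lower_bound(classification_error: float) -> float:
    """Simplified form of the paper's result: a model that misjudges whether
    an answer is correct at rate `classification_error` will generate wrong
    answers at a rate of at least roughly twice that.
    (Hypothetical helper; the full bound in the paper has extra terms.)
    """
    return 2 * classification_error

# Misclassifying 10% of answers as correct/incorrect...
print(generation_error_lower_bound(0.10))  # ...implies >= ~20% generation errors.
```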
What Makes It Worse:
- Rare facts: Birthdays, phone numbers, or obscure details that appear only once in the training data are almost impossible to learn reliably; the paper calls the fraction of such one-off facts the “singleton rate” (see the sketch after this list)
- Wrong tool for the job: Sometimes the model family simply can’t represent the task well, like using a calculator designed for addition to do complex geometry
- Bad training data: Errors in the original training material get copied by the AI
- Hard problems: Some questions are just mathematically difficult to solve
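To make the “rare facts” point concrete, here is a toy sketch of the singleton-rate idea; the corpus and helper function are hypothetical, and the claim (hallucination rate at least roughly the fraction of facts seen exactly once) is the paper’s:

```python
from collections import Counter

def singleton_rate(facts: list[str]) -> float:
    """Fraction of distinct facts that appear exactly once in the toy corpus.
    Per the paper, a base model is expected to hallucinate on at least
    roughly this fraction of such queries."""
    counts = Counter(facts)
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts)

# Hypothetical training snippets: two birthdays appear once, one appears three times.
corpus = [
    "Alice Smith was born on 1901-03-04",
    "Bob Jones was born on 1950-07-12",
    "Carol Lee was born on 1987-11-30",
    "Carol Lee was born on 1987-11-30",
    "Carol Lee was born on 1987-11-30",
]
print(singleton_rate(corpus))  # ~0.67: expect errors on roughly 2 of these 3 birthdays
```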
2. Our Testing Methods Reward Guessing
The Exam Problem: Most AI tests work like multiple choice exams where:
- Correct answer = 1 point
- Wrong answer = 0 points
- “I don’t know” = 0 points
Under these rules, guessing never scores worse than abstaining in expectation, and scores better whenever there is any chance of being right. The researchers found that major AI benchmarks like GPQA, MMLU, and SWE-bench all give zero credit for saying “I don’t know,” so a model that always guesses will score higher than one that honestly admits when it’s uncertain.
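A quick expected-score calculation (my own illustrative helper, not from the paper) makes this concrete: under 1/0/0 grading, any guess has a positive expected score, while abstaining is stuck at zero.

```python
def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected points for answering when a correct answer earns 1 point and
    a wrong answer costs `wrong_penalty` points (0 under today's benchmarks)."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

# Under binary grading, even a 1-in-4 guess beats "I don't know" (0 points)...
print(expected_score(0.25))  # 0.25
# ...and so does an almost-certainly-wrong guess.
print(expected_score(0.01))  # 0.01
```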
A proposed method:
Change how we score AI systems. Instead of binary right/wrong, use explicit confidence thresholds:
Example: “Only answer if you’re more than 75% confident. Wrong answers lose 3 points, correct answers gain 1 point, and ‘I don’t know’ gets 0 points.” The paper also outlines other threshold-and-penalty combinations along the same lines.
This makes the AI ask itself, “Am I confident enough to risk the penalty?”, much like human exams that penalize wrong answers to discourage wild guessing.
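Using the example’s numbers (+1 for a correct answer, -3 for a wrong one, 0 for abstaining), answering only pays off above 75% confidence, which matches the stated threshold. Here is a small illustrative decision rule (the helper name is mine, not the paper’s):

```python
def should_answer(confidence: float, reward: float = 1.0, penalty: float = 3.0) -> bool:
    """Answer only when the expected score of answering beats abstaining (0 points).
    Break-even: confidence * reward - (1 - confidence) * penalty > 0,
    i.e. confidence > penalty / (reward + penalty) = 0.75 with these numbers."""
    return confidence * reward - (1.0 - confidence) * penalty > 0

print(should_answer(0.80))  # True:  0.8*1 - 0.2*3 = +0.2, worth the risk
print(should_answer(0.60))  # False: 0.6*1 - 0.4*3 = -0.6, better to say "I don't know"
```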
Why This Matters
Current Reality:
- Medical AI that can’t say “see a specialist”
- Legal AI that invents fake court cases
- Educational AI that teaches wrong information confidently
Better Future:
- AI that admits uncertainty when appropriate
- More trustworthy responses overall
- Users can distinguish between confident knowledge and educated guesses
Hallucinations aren’t a mysterious AI failure – they’re a predictable result of how we train and test these systems. The solution isn’t just better AI models; it’s better evaluation methods that reward honesty over overconfidence.
We accidentally taught AI to be overconfident by rewarding guessing in our tests. We can fix this by changing how we keep score.