Overfitting occurs when a model performs well on training data but poorly on unseen data.
In other words, the model has memorized the training examples rather than learned patterns that generalize.
🔍 What is Generalization?
A good model:
- Learns patterns, not noise
- Performs well on new, real-world data
This ability is called generalization.
📉 Example
Suppose we train a model on 100 examples.
- It gets 98% accuracy on training data
- But only 70% accuracy on test data
This gap suggests overfitting.
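This gap can be measured directly in code. Below is a minimal sketch using scikit-learn: an unconstrained decision tree trained on a small synthetic dataset typically scores near 100% on its training split but noticeably lower on the held-out split. The dataset, model, and exact numbers are illustrative, not taken from the example above.

```python
# Minimal sketch of measuring a train/test accuracy gap (illustrative data and model).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small synthetic dataset: 100 examples, 20 features.
X, y = make_classification(n_samples=100, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# An unconstrained tree can memorize the 70 training examples.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # typically near 1.0
test_acc = model.score(X_test, y_test)     # noticeably lower
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

A large, persistent gap like this is the most direct symptom of overfitting.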
📊 Visual Intuition
- Underfitting: Too simple; performs poorly on both training and test data
- Good fit: Balanced performance on both
- Overfitting: Too complex; great on training data, bad on test data (see the sketch below)
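The same intuition can be reproduced numerically. The sketch below uses an assumed setup of noisy samples from a sine curve fit with NumPy polynomials, where degrees 1, 3, and 9 stand in for underfitting, a good fit, and overfitting; the high-degree fit usually drives training error toward zero while validation error grows.

```python
# Fit polynomials of increasing degree to noisy data and compare
# training error with error on held-out points (illustrative setup).
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 20))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 20)
x_train, y_train = x[::2], y[::2]   # every other point for training
x_val, y_val = x[1::2], y[1::2]     # the rest held out for validation

for degree in (1, 3, 9):            # underfit / reasonable fit / overfit
    coefs = np.polyfit(x_train, y_train, degree)
    mse_train = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    mse_val = np.mean((np.polyval(coefs, x_val) - y_val) ** 2)
    print(f"degree {degree}: train MSE {mse_train:.3f}, val MSE {mse_val:.3f}")
```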
🚨 Signs of Overfitting
- High training accuracy, low test accuracy
- Large gap between training and validation loss
- Model performance degrades on real inputs
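These signs can also be checked programmatically. The sketch below assumes a hypothetical `history` dictionary of per-epoch losses (the numbers are made up for illustration) and flags the pattern of validation loss rising while training loss keeps falling.

```python
# Hypothetical per-epoch losses, purely for illustration.
history = {
    "train_loss": [0.90, 0.55, 0.30, 0.18, 0.10, 0.06],
    "val_loss":   [0.92, 0.60, 0.45, 0.44, 0.48, 0.55],
}

# Gap between validation and training loss at each epoch.
gaps = [v - t for t, v in zip(history["train_loss"], history["val_loss"])]
val_rising = history["val_loss"][-1] > min(history["val_loss"])

if val_rising and gaps[-1] > gaps[0]:
    print("Validation loss is rising while training loss keeps falling: "
          "likely overfitting.")
```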
🛡️ How to Prevent Overfitting
| Technique       | Description                                  |
|-----------------|----------------------------------------------|
| More Data       | Helps model see more variation               |
| Regularization  | Penalize large weights (e.g. L2)             |
| Dropout         | Randomly disable neurons during training     |
| Early Stopping  | Stop training when validation loss worsens   |
| Simpler Models  | Avoid overly complex models                  |
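Several of these techniques can be combined in one model. The sketch below is an illustrative Keras setup, not a prescribed recipe: the layer sizes, L2 strength, dropout rate, and toy dataset are all assumptions, chosen only to show L2 regularization, dropout, and early stopping working together.

```python
# Illustrative Keras model combining L2 regularization, dropout, and early stopping.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Toy data so the snippet runs end to end; replace with a real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32").reshape(-1, 1)

model = tf.keras.Sequential([
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2: penalize large weights
    layers.Dropout(0.5),                                     # randomly disable neurons
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping: halt when validation loss stops improving for 3 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

model.fit(X, y, validation_split=0.2, epochs=100,
          callbacks=[early_stop], verbose=0)
```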
🧠 Summary
- Overfitting = memorizing, not learning
- Generalization = ability to perform well on new data
- Prevent with regularization, more data, and validation
✅ Self-Check
- How do you know a model is overfitting?
- What is the difference between underfitting and overfitting?
- How does dropout help reduce overfitting?