Overfitting and Generalization

Summary

Understand what overfitting means in machine learning and how to detect and prevent it.

Overfitting occurs when a model performs well on training data but poorly on unseen data.

It means the model has memorized the training examples rather than learned patterns that generalize.


🔍 What is Generalization?

A good model:

  • Learns patterns, not noise
  • Performs well on new, real-world data

This ability is called generalization.


📉 Example

Suppose we train a model on 100 examples.

  • It gets 98% accuracy on training data
  • But only 70% accuracy on test data

This large gap between training and test accuracy suggests overfitting.
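
A minimal sketch of how this gap shows up in practice, assuming scikit-learn; the synthetic dataset and the unconstrained decision tree are illustrative choices, not a prescription:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small, noisy dataset: easy for an unconstrained model to memorize.
X, y = make_moons(n_samples=100, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A depth-unlimited tree can fit the training set almost perfectly.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

print(f"train accuracy: {model.score(X_train, y_train):.2f}")  # typically ~1.00
print(f"test accuracy:  {model.score(X_test, y_test):.2f}")    # noticeably lower
# A large train/test gap like this is the classic overfitting signal.
```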


📊 Visual Intuition

  • Underfitting: Too simple, poor on both train/test
  • Good fit: Balanced performance
  • Overfitting: Too complex, great on train, bad on test (see the sketch below)
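
One way to see all three regimes at once is to vary model complexity on the same data, here via polynomial degree in a scikit-learn pipeline. A minimal sketch; the noisy-sine data and the degrees chosen are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=30)  # noisy sine
X_test = rng.uniform(-3, 3, size=(200, 1))
y_test = np.sin(X_test).ravel()

for degree in (1, 4, 15):  # too simple, balanced, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    train_err = mean_squared_error(y, model.predict(X))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
# degree 1 underfits (high error on both); degree 15 overfits
# (near-zero train error, larger test error); degree 4 balances the two.
```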

🚨 Signs of Overfitting

  • High training accuracy, low test accuracy
  • Large gap between training and validation loss (a simple check is sketched below)
  • Model performance degrades on real inputs
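
One lightweight way to watch for that gap is to record per-epoch losses and flag the point where validation loss pulls away. A minimal sketch; the `tolerance` threshold and the loss values are made up for illustration:

```python
def overfitting_gap(train_losses, val_losses, tolerance=0.1):
    """Return epochs where validation loss exceeds training loss by `tolerance`.

    Both arguments are per-epoch average losses recorded during training;
    `tolerance` is an illustrative threshold, not a standard value.
    """
    return [
        epoch
        for epoch, (tr, va) in enumerate(zip(train_losses, val_losses))
        if va - tr > tolerance
    ]

# Hypothetical loss curves: training keeps improving while validation
# bottoms out and climbs back up, the classic overfitting signature.
train = [0.90, 0.60, 0.40, 0.25, 0.15, 0.08]
val   = [0.92, 0.65, 0.50, 0.48, 0.55, 0.65]
print(overfitting_gap(train, val))  # -> [3, 4, 5]
```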

🛡️ How to Prevent Overfitting

| Technique      | Description                                 |
|----------------|---------------------------------------------|
| More Data      | Helps model see more variation              |
| Regularization | Penalize large weights (e.g. L2)            |
| Dropout        | Randomly disable neurons during training    |
| Early Stopping | Stop training when validation loss worsens  |
| Simpler Models | Avoid overly complex models                 |
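
Several of these techniques can live in one training loop. A minimal PyTorch sketch combining dropout, an L2 penalty (via `weight_decay`), and early stopping; the architecture, hyperparameters, and random stand-in data are all illustrative assumptions:

```python
import torch
import torch.nn as nn

# Illustrative random data standing in for a real dataset.
torch.manual_seed(0)
X_train, y_train = torch.randn(80, 20), torch.randint(0, 2, (80,))
X_val, y_val = torch.randn(40, 20), torch.randint(0, 2, (40,))

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # Dropout: randomly disables neurons during training
    nn.Linear(64, 2),
)
loss_fn = nn.CrossEntropyLoss()
# weight_decay applies an L2 penalty to the weights (regularization).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(200):
    model.train()  # training mode: dropout active
    optimizer.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    optimizer.step()

    model.eval()  # eval mode: dropout disabled
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()

    # Early stopping: quit once validation loss stops improving.
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"early stop at epoch {epoch}, best val loss {best_val:.3f}")
            break
```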


🧠 Summary

  • Overfitting = memorizing, not learning
  • Generalization = ability to perform well on new data
  • Prevent with regularization, more data, and validation

✅ Self-Check

  • How do you know a model is overfitting?
  • What is the difference between underfitting and overfitting?
  • How does dropout help reduce overfitting?