Gradient descent is the core optimization algorithm used to train most machine learning models.
It lets the model learn by repeatedly adjusting its parameters in the direction that reduces the loss function, step by step.
🧗 Intuition
Imagine a hiker trying to descend a mountain (loss function) in the fog:
- The height = error (loss)
- The direction = gradient (slope)
- Each step = model update
The hiker wants to reach the bottom — minimum error.
🧮 How It Works
At each step (sketched in code below):
1. Calculate the loss.
2. Compute the gradient (the slope of the loss curve).
3. Take a step in the opposite direction of the gradient.
4. Update the model's parameters.
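These four steps translate directly into a short loop. Here is a minimal Python sketch; the function name `gradient_descent`, the toy loss, and the finite-difference gradient are illustrative assumptions, not part of the original text:

```python
def gradient_descent(loss, theta, learning_rate=0.1, steps=100):
    """Run the four steps above on a single parameter theta."""
    eps = 1e-6  # tiny offset used to estimate the slope numerically
    for _ in range(steps):
        current_loss = loss(theta)                           # 1. calculate the loss
        gradient = (loss(theta + eps) - current_loss) / eps  # 2. estimate the gradient
        theta = theta - learning_rate * gradient             # 3. & 4. step against the gradient
    return theta

# Toy loss with its minimum at theta = 3 (a made-up example)
print(gradient_descent(lambda t: (t - 3) ** 2, theta=0.0))   # prints roughly 3.0
```

Real frameworks compute exact gradients automatically, but the loop structure is the same.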
🔧 Formula (Simplified)
Let θ be the model's parameter.

Update rule:

θ = θ - η * ∇L(θ)

Where:
- η is the learning rate
- ∇L(θ) is the gradient of the loss function with respect to θ
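To make the rule concrete, take the same illustrative toy loss as in the sketch above, L(θ) = (θ - 3)², whose gradient is ∇L(θ) = 2(θ - 3). Starting from θ = 5 with η = 0.1, one update gives:

θ = 5 - 0.1 * 2 * (5 - 3) = 5 - 0.4 = 4.6

The parameter moves from 5 toward the minimum at θ = 3; repeating the update keeps shrinking the gap.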
📉 Example
- Prediction too high? Decrease the weight.
- Prediction too low? Increase the weight.
Over time, the model “nudges” itself to better performance.
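The sign of the gradient handles both cases automatically. Here is a small Python sketch, assuming a hypothetical one-weight model `prediction = w * x` with a squared-error loss (the numbers are made up for illustration):

```python
# Hypothetical one-weight model: prediction = w * x, loss = (prediction - target)**2.
# The sign of the gradient decides whether the weight goes down or up.
x, target = 2.0, 4.0                 # the "ideal" weight here would be 2.0
for w in (3.0, 1.0):                 # one weight that is too high, one that is too low
    prediction = w * x
    gradient = 2 * (prediction - target) * x   # d/dw of (w * x - target)**2
    w_new = w - 0.1 * gradient                 # same update rule as above
    print(f"w = {w}: prediction = {prediction}, updated w = {w_new}")

# w = 3.0 -> prediction 6.0 (too high), gradient +8.0, updated w 2.2 (decreased)
# w = 1.0 -> prediction 2.0 (too low),  gradient -8.0, updated w 1.8 (increased)
```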
⚠️ Learning Rate Matters
- Too small → Slow learning
- Too big → Might overshoot the minimum, or even diverge
- Choose carefully!
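To see why, here is a small comparison on the same toy loss L(θ) = (θ - 3)²; the specific learning-rate values are illustrative, not recommendations:

```python
# Gradient of the toy loss L(theta) = (theta - 3)**2 is 2 * (theta - 3).
def run(learning_rate, steps=20, theta=0.0):
    for _ in range(steps):
        theta = theta - learning_rate * 2 * (theta - 3)
    return theta

print(run(0.001))  # too small: after 20 steps theta has barely moved toward 3
print(run(0.5))    # reasonable here: lands on 3 almost immediately
print(run(1.1))    # too big: each step overshoots further and theta diverges
```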
🧠 Summary
| Concept       | Meaning                           |
|---------------|-----------------------------------|
| Gradient      | Slope of the loss curve           |
| Descent       | Move in direction of lower error  |
| Learning Rate | Size of the step                  |
| Goal          | Minimize the loss                 |
✅ Self-Check
- What does gradient descent try to minimize?
- Why is the learning rate important?
- How do gradients help the model learn?