Epochs, Batches and Learning Rate
Training a model means adjusting its weights gradually, not all at once.
This happens over multiple epochs, on mini-batches of data, with a chosen learning rate.
⏱️ Epoch
An epoch is one full pass through the entire training dataset.
- We train for multiple epochs so the model keeps improving.
- One epoch doesn't mean the model has learned enough.
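A quick worked example (the dataset size and batch size below are made up for illustration): the number of weight-update steps in one epoch is simply the number of samples divided by the batch size.

```python
import math

num_samples = 50_000   # illustrative training-set size
batch_size = 64        # illustrative mini-batch size

# One epoch = one full pass over all samples,
# which corresponds to this many weight-update steps:
steps_per_epoch = math.ceil(num_samples / batch_size)
print(steps_per_epoch)  # 782
```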
📦 Batch
Instead of feeding all data at once, we split it into mini-batches.
Why?
- Each batch fits in memory
- Adds randomness → helps generalization
Batch Size affects:
- Speed
- Stability
- Accuracy
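Below is a minimal sketch of mini-batching done by hand with NumPy. The dataset is random toy data, and the shapes, batch size, and variable names are illustrative assumptions, not a prescribed recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 10))      # toy features: 1,000 samples, 10 inputs
y = rng.integers(0, 2, size=1_000)    # toy binary labels

batch_size = 32
indices = rng.permutation(len(X))     # shuffle once per epoch -> adds randomness

for start in range(0, len(X), batch_size):
    batch_idx = indices[start:start + batch_size]
    X_batch, y_batch = X[batch_idx], y[batch_idx]
    # ... forward pass, loss, backward pass, and weight update on this mini-batch
```

In practice a framework's data loader does the shuffling and slicing for you, but the idea is the same.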
🎛️ Learning Rate
The learning rate (Ξ±) determines how big the steps are during gradient descent.
- Too high → steps overshoot the minimum and training can diverge
- Too low → training is very slow or gets stuck
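To see the effect of α concretely, here is a tiny gradient-descent sketch on the one-dimensional function f(w) = (w − 3)². The function and the learning-rate values are chosen purely for illustration.

```python
# Minimize f(w) = (w - 3)^2 with plain gradient descent.
# Its gradient is f'(w) = 2 * (w - 3).
def grad(w):
    return 2 * (w - 3)

w = 0.0
alpha = 0.1                   # learning rate α
for step in range(50):
    w = w - alpha * grad(w)   # update rule: w <- w - α * f'(w)

print(w)  # close to the minimum at 3.0
```

With alpha = 0.1 the distance to the minimum shrinks by a factor of 0.8 each step. Set alpha to 1.5 and it doubles in magnitude instead: the updates overshoot the minimum and diverge.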
🔁 Training Loop
The full training loop typically looks like:
- Shuffle the dataset
- Loop through mini-batches:
  - Forward pass
  - Compute loss
  - Backward pass
  - Update weights
- Repeat for multiple epochs
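Putting these steps together, here is a compact training loop in PyTorch. The text does not name a framework, so this is one reasonable choice; the model, toy data, batch size, learning rate, and epoch count are all placeholder assumptions.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data: 1,000 samples, 10 features, binary labels (all illustrative)
X = torch.randn(1_000, 10)
y = torch.randint(0, 2, (1_000,)).float()

loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)  # reshuffles every epoch

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # lr is the learning rate α

for epoch in range(5):                      # repeat for multiple epochs
    for X_batch, y_batch in loader:         # loop through mini-batches
        logits = model(X_batch).squeeze(1)  # forward pass
        loss = loss_fn(logits, y_batch)     # compute loss
        optimizer.zero_grad()
        loss.backward()                     # backward pass
        optimizer.step()                    # update weights
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")
```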
⚖️ Trade-Offs
Concept | High Value | Low Value |
---|---|---|
Batch Size | Faster per epoch, smoother gradients, may generalize worse | Slower, noisier updates, often generalizes better |
Epochs | Better fit, but risk of overfitting | Underfitting risk |
Learning Rate | Fast convergence, but risk of overshooting | Slower learning, may get stuck |
Self-Check
- What is an epoch?
- Why use batches instead of the full dataset?
- What happens if the learning rate is too high?