Gradients and Derivatives: Backpropagation Deep Dive
To fully understand backpropagation, we need to trace how gradients flow through a network, which takes a little calculus.
🧮 Derivatives and the Chain Rule
The chain rule is key to computing gradients in neural networks:
∂L/∂x = ∂L/∂z × ∂z/∂x
where L is the loss, z is an intermediate variable, and x is a weight or activation.
In neural networks, we apply this recursively through layers.
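As a concrete illustration, here is a minimal chain-rule check on a single composed function; the function, the values of w, x, and y, and the variable names are purely illustrative, and the analytic gradient is compared against a finite-difference approximation.

```python
import math

# Illustrative composition: z = w * x, then L = (sigmoid(z) - y)^2
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, x, y = 0.8, 1.5, 0.0           # toy values, not from the article

z = w * x
s = sigmoid(z)
L = (s - y) ** 2

# Chain rule: dL/dx = dL/dz * dz/dx
dL_ds = 2 * (s - y)               # derivative of the squared error
ds_dz = s * (1 - s)               # derivative of the sigmoid
dL_dz = dL_ds * ds_dz             # dL/dz through the intermediate s
dz_dx = w                         # derivative of z = w*x with respect to x
dL_dx = dL_dz * dz_dx

# Sanity check against a finite-difference approximation
eps = 1e-6
L_plus = (sigmoid(w * (x + eps)) - y) ** 2
print(dL_dx, (L_plus - L) / eps)  # the two numbers should agree closely
```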
🔁 Example: Two-Layer Network
Let's say we have:
z₁ = w₁x + b₁ → a₁ = ReLU(z₁)
z₂ = w₂a₁ + b₂ → ŷ = sigmoid(z₂)
Loss: L = MSE(ŷ, y)
To compute the gradient of L w.r.t. w₁, multiply the local derivatives along this chain (a worked numeric sketch follows the list):
- ∂L/∂ŷ
- ∂ŷ/∂z₂
- ∂z₂/∂a₁
- ∂a₁/∂z₁
- ∂z₁/∂w₁
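Putting these factors together, the sketch below runs the forward pass with made-up scalar values for x, y, the weights, and the biases (all illustrative), multiplies the five local derivatives to obtain ∂L/∂w₁, and checks the result numerically.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w1, x, y, b1, w2, b2):
    # Forward pass of the two-layer network
    z1 = w1 * x + b1
    a1 = max(0.0, z1)            # ReLU
    z2 = w2 * a1 + b2
    y_hat = sigmoid(z2)
    return (y_hat - y) ** 2      # MSE for a single example

# Toy scalar values (illustrative only)
x, y = 2.0, 1.0
w1, b1 = 0.5, 0.1
w2, b2 = -0.3, 0.2

# Forward pass, keeping intermediates for the backward pass
z1 = w1 * x + b1
a1 = max(0.0, z1)
z2 = w2 * a1 + b2
y_hat = sigmoid(z2)

# The five local derivatives from the list above
dL_dyhat = 2 * (y_hat - y)
dyhat_dz2 = y_hat * (1 - y_hat)        # sigmoid'
dz2_da1 = w2
da1_dz1 = 1.0 if z1 > 0 else 0.0       # ReLU'
dz1_dw1 = x

dL_dw1 = dL_dyhat * dyhat_dz2 * dz2_da1 * da1_dz1 * dz1_dw1

# Finite-difference check
eps = 1e-6
numeric = (loss(w1 + eps, x, y, b1, w2, b2) - loss(w1, x, y, b1, w2, b2)) / eps
print(dL_dw1, numeric)                 # should agree closely
```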
🔢 Matrix Form
In vectorized networks, backpropagation uses matrix calculus.
∂L/∂W = ∂L/∂Z × ∂Z/∂W, which for a linear layer Z = WA + b works out to ∂L/∂W = (∂L/∂Z) Aᵀ.
Frameworks such as PyTorch and TensorFlow handle this automatically through reverse-mode automatic differentiation (autograd in PyTorch, GradientTape in TensorFlow).
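A minimal PyTorch sketch, assuming PyTorch is installed and using a toy linear layer with a sum loss (both illustrative), that compares autograd's gradient against the manual matrix-calculus result (∂L/∂Z) Aᵀ:

```python
import torch

torch.manual_seed(0)
A = torch.randn(4, 3)                       # layer input: 4 features, batch of 3 columns
W = torch.randn(2, 4, requires_grad=True)   # weight matrix
Z = W @ A                                   # linear layer, Z = W A (bias omitted)
L = Z.sum()                                 # simple scalar loss, so dL/dZ is all ones

L.backward()                                # autograd fills W.grad

dL_dZ = torch.ones_like(Z)                  # gradient of sum(Z) with respect to Z
manual = dL_dZ @ A.T                        # matrix form: dL/dW = (dL/dZ) Aᵀ
print(torch.allclose(W.grad, manual))       # True
```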
🧠 Intuition
Backprop tells us how a small change in a weight affects the final loss.
The goal of gradient descent is to follow the negative gradient to minimize loss.
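As a toy illustration of following the negative gradient, the sketch below minimizes a one-dimensional quadratic; the function, starting point, and learning rate are all arbitrary choices.

```python
# Minimize f(w) = (w - 3)^2 by repeatedly stepping against the gradient
w = 0.0                     # arbitrary starting point
lr = 0.1                    # learning rate (illustrative)
for step in range(50):
    grad = 2 * (w - 3)      # df/dw
    w -= lr * grad          # step in the direction of the negative gradient
print(w)                    # converges toward 3, the minimizer
```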
⚠️ Gradient Flow Issues
- Vanishing gradients: saturating activations like sigmoid and tanh have derivatives well below 1, so their product across many layers shrinks toward zero (a small demo follows this list)
- Exploding gradients: repeated multiplication by large weight matrices can make gradients grow uncontrollably in deep networks
- Solutions: ReLU activations, LayerNorm, and residual connections
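A tiny, deliberately simplified demo of the vanishing-gradient effect: it multiplies only the sigmoid's local derivative across 20 layers (weights are ignored), at the pre-activation where that derivative is largest. A ReLU's derivative is 1 in its active region, so the same product would stay at 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Multiply sigmoid's local derivative across 20 layers (weights ignored for simplicity).
# Even at z = 0, where sigmoid'(z) peaks at 0.25, the product shrinks geometrically.
grad = 1.0
z = 0.0
for layer in range(20):
    s = sigmoid(z)
    grad *= s * (1 - s)     # sigmoid'(z) = s * (1 - s) <= 0.25
print(grad)                 # about 0.25**20 ≈ 9e-13: the signal has effectively vanished
```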
📉 Visualization
A typical backprop diagram shows the error signal moving backward through the layers, accumulating each layer's local derivative along the way.
Self-Check
- What does the chain rule allow us to compute?
- What causes vanishing gradients?
- How do frameworks calculate gradients?