Backpropagation
The algorithm that powers deep learning: computing gradients through neural networks
The Chain Rule in Action
Forward Pass
Input data flows through the network layer by layer. Each neuron computes a weighted sum of inputs, applies an activation function, and passes the result to the next layer.
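A minimal NumPy sketch of this forward pass, assuming a tiny fully connected 2-3-1 network with sigmoid activations (the layer sizes, random weights, and example input are illustrative choices, not taken from the visualization):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative 2-3-1 network; in practice the weights are learned.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

def forward(x):
    """Propagate one input through the network, caching intermediates for the backward pass."""
    z1 = W1 @ x + b1      # weighted sum at the hidden layer
    a1 = sigmoid(z1)      # hidden-layer activation
    z2 = W2 @ a1 + b2     # weighted sum at the output layer
    a2 = sigmoid(z2)      # network prediction
    return x, z1, a1, z2, a2

x = np.array([0.5, -1.0])
*_, prediction = forward(x)
```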
Backward Pass
Gradients flow backward from the loss function. Using the chain rule, we compute how much each weight contributed to the error, enabling precise weight updates.
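A matching backward-pass sketch, continuing the forward-pass code above and assuming a squared-error loss Loss = ½(a2 − y)²:

```python
def backward(cache, y):
    """Compute dLoss/dW and dLoss/db for every layer via the chain rule."""
    x, z1, a1, z2, a2 = cache
    # Loss = 0.5 * (a2 - y)**2, so dLoss/da2 = (a2 - y);
    # the sigmoid derivative is sigmoid(z) * (1 - sigmoid(z)) = a * (1 - a).
    delta2 = (a2 - y) * a2 * (1 - a2)          # error signal at the output layer
    dW2 = np.outer(delta2, a1)                 # each output weight's contribution to the error
    db2 = delta2
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # chain rule: push the error back through W2
    dW1 = np.outer(delta1, x)
    db1 = delta1
    return dW1, db1, dW2, db2

grads = backward(forward(x), y=np.array([1.0]))
```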
Interactive Visualization
Watch how signals propagate forward through the network (blue) and how gradients flow backward (red). Click the buttons to run forward pass, backward pass, or a complete training step.
Algorithm Steps
1. Forward Propagation: Compute activations layer by layer from input to output.
2. Compute Loss: Calculate the error between prediction and target.
3. Backward Propagation: Compute gradients using the chain rule, layer by layer.
4. Update Weights: Adjust weights in the direction that reduces loss (all four steps are combined in the training-step sketch after this list).
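Putting these steps together, a complete training step could look like the following sketch, which reuses the forward and backward functions above and assumes a squared-error loss with a fixed learning rate of 0.1:

```python
def train_step(x, y, lr=0.1):
    """One complete training step: forward pass, loss, backward pass, weight update."""
    global W1, b1, W2, b2
    cache = forward(x)                        # 1. forward propagation
    a2 = cache[-1]
    loss = 0.5 * np.sum((a2 - y) ** 2)        # 2. compute loss (squared error)
    dW1, db1, dW2, db2 = backward(cache, y)   # 3. backward propagation
    W1 -= lr * dW1; b1 -= lr * db1            # 4. update weights against the gradient
    W2 -= lr * dW2; b2 -= lr * db2
    return loss

y = np.array([1.0])
for step in range(200):
    loss = train_step(x, y)   # the loss shrinks as the weights adapt
```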
Key Concepts
Gradient Descent
w = w - η · ∂Loss/∂w
Weights are updated in the opposite direction of the gradient
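For example, with η = 0.1, a weight w = 0.80, and ∂Loss/∂w = 0.50, the update gives w = 0.80 - 0.1 · 0.50 = 0.75.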
Learning Rate (η)
Controls the step size: too high causes overshooting; too low causes slow convergence.
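A toy illustration of this trade-off, assuming gradient descent on the one-dimensional loss Loss(w) = w² rather than a full network:

```python
def minimize(lr, steps=20, w=1.0):
    """Gradient descent on Loss(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w -= lr * 2 * w       # w = w - lr * dLoss/dw
    return w

print(minimize(lr=0.3))       # ~1e-8: converges quickly
print(minimize(lr=0.001))     # ~0.96: barely moved, far too slow
print(minimize(lr=1.1))       # ~38: each step overshoots the minimum and diverges
```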
Vanishing Gradients
Gradients can shrink exponentially as they are multiplied backward through many layers; common remedies include ReLU activations, residual connections, and normalization.
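A rough numerical sketch of the effect, assuming 20 sigmoid layers with pre-activations of 0 (the most favorable case, since the sigmoid derivative peaks at 0.25 there):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Each layer multiplies the gradient by its local sigmoid derivative,
# which is at most 0.25, so 20 layers shrink it by roughly 0.25**20.
grad = 1.0
for layer in range(20):
    s = sigmoid(0.0)          # pre-activation of 0 gives the largest derivative, 0.25
    grad *= s * (1 - s)       # multiply in one layer's local derivative
print(grad)                   # ~9e-13: effectively zero by the time it reaches early layers

# ReLU's derivative is exactly 1 for positive inputs, so the same product need not decay.
```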
Mathematical Foundation
Forward Pass Equations
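In standard notation, with W⁽ˡ⁾ and b⁽ˡ⁾ the weights and bias of layer l, σ the activation function, and a⁽⁰⁾ = x the input, each layer computes:

z⁽ˡ⁾ = W⁽ˡ⁾ · a⁽ˡ⁻¹⁾ + b⁽ˡ⁾
a⁽ˡ⁾ = σ(z⁽ˡ⁾)

The activation of the final layer, a⁽ᴸ⁾, is the network's prediction.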
Backward Pass Equations
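Defining the error signal of layer l as δ⁽ˡ⁾ = ∂Loss/∂z⁽ˡ⁾ and writing ⊙ for elementwise multiplication, the chain rule gives, starting from the output layer L:

δ⁽ᴸ⁾ = ∂Loss/∂a⁽ᴸ⁾ ⊙ σ′(z⁽ᴸ⁾)
δ⁽ˡ⁾ = (W⁽ˡ⁺¹⁾)ᵀ δ⁽ˡ⁺¹⁾ ⊙ σ′(z⁽ˡ⁾)
∂Loss/∂W⁽ˡ⁾ = δ⁽ˡ⁾ · (a⁽ˡ⁻¹⁾)ᵀ
∂Loss/∂b⁽ˡ⁾ = δ⁽ˡ⁾

These gradients feed directly into the gradient descent update rule from the Key Concepts section above.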
Why Backpropagation Matters
Backpropagation, formalized by Rumelhart, Hinton, and Williams in 1986, is the cornerstone of modern deep learning. It computes the gradient of the loss with respect to all n parameters at roughly the cost of a single forward pass, O(n), rather than the O(n²) cost of estimating each partial derivative with its own forward pass.
Without backpropagation, training neural networks with millions or billions of parameters would be computationally infeasible. It enables:
- Large Language Models like GPT and Claude
- Computer Vision networks like ResNet and Vision Transformers
- Generative AI including diffusion models and GANs
- Reinforcement Learning algorithms like PPO and SAC