Chapter 8: Training Deep Networks
Chapter Overview
Optimizers & schedules · Normalization & regularization · PyTorch training loops · Multi-GPU scaling · Debugging training
Sections
1. Optimization: Algorithms & Schedules
   SGD · Adam · AdamW · Cosine annealing · Warmup
2. Training Techniques: Normalization, Regularization & Gradients
   BatchNorm · LayerNorm · Dropout · Weight decay · Gradient clipping
3. PyTorch Training
   Autograd · Training loop · Dataset · nn.Module · Checkpoints
4. Frameworks: TensorFlow, Keras & JAX
   Keras fit() · JAX transforms · Framework comparison
5. Training Debugging & Stability
   Loss debugging · Sanity checks · Monitoring · Tools
6. Scaling & Efficiency
7. Nice to Know
   SWA · EMA · Distillation · Gradient checkpointing · torch.compile