Chapter 10: Sequence Models & Attention
Chapter Overview
RNN/LSTM/GRU · Seq2Seq & Attention · Transformer architecture · BERT & GPT · State Space Models · Efficient architectures
Sections
1. Recurrent Models: RNN, LSTM, GRU · Historical foundations
2. Seq2Seq & the Birth of Attention · The bottleneck problem
3. Self-Attention & the Transformer · The architecture that changed ML
4. BERT & GPT: The Two Paradigms · Encoder-only vs. decoder-only
5. State Space Models & Mamba · The emerging challenger
6. Interpretability & Efficient Architectures · Making it scale
7. Nice to Know · Interesting but not essential