Chapter 10: Sequence Models & Attention
Chapter Overview
RNN/LSTM/GRU · Seq2Seq & Attention · Transformer architecture · BERT & GPT · State Space Models · Efficient architectures
Sections
1. Recurrent Models: RNN, LSTM, GRU · Historical foundations
2. Seq2Seq & the Birth of Attention · The bottleneck problem
3. Self-Attention & the Transformer · The architecture that changed ML
4. BERT & GPT: The Two Paradigms · Encoder-only vs. decoder-only
5. State Space Models & Mamba · The emerging challenger
6. Interpretability & Efficient Architectures · Making it scale
7. Nice to Know · Interesting but not essential