Chapter 5: Supervised Learning
Supervised learning is the workhorse of applied ML — you have inputs, you have labels, and you want to learn the mapping. This chapter builds from the simplest linear models through decision trees and ensembles to SVMs and time-series models, giving you a practical toolkit for both regression and classification. The progression: start simple, add complexity only when the data demands it.
Which Algorithm When — Master Comparison
This is the single most useful reference in the chapter. Bookmark it. When you're starting a new problem, scan this table to pick your first model.
| Algorithm | Best For | Non-linearity? | Scaling Needed? | Interpretable? | Speed (Train / Predict) | Try First When… |
|---|---|---|---|---|---|---|
| Linear Regression | Regression with roughly linear relationships | No | Yes (for regularized) | ★★★ High | Fast / Fast | You need a regression baseline or interpretability is paramount |
| Logistic Regression | Binary/multiclass classification, probability estimates | No | Yes | ★★★ High | Fast / Fast | You need a classification baseline or want to understand feature effects |
| KNN | Small datasets, prototyping, non-parametric baselines | Yes (implicitly) | Yes (critical) | ★★☆ Medium | None / Slow | Dataset is small, you want a quick sanity check, or local patterns dominate |
| Naive Bayes | Text classification, high-dimensional sparse data | No | No | ★★☆ Medium | Very Fast / Very Fast | You have text data, need a fast baseline, or have very little training data |
| Decision Tree | Exploratory analysis, understanding feature interactions | Yes | No | ★★★ High | Fast / Fast | You need a fully interpretable model or want to visualize the decision logic |
| Random Forest | General-purpose tabular data, robust default | Yes | No | ★★☆ Medium | Medium / Medium | You want a strong model with minimal tuning — the "hard to mess up" choice |
| XGBoost / LightGBM | Tabular data competitions, maximum predictive performance | Yes | No | ★☆☆ Low | Medium / Fast | You need top accuracy on structured data and can tune hyperparameters |
| SVM | Small-to-medium datasets, clear margins, text classification | Yes (with kernels) | Yes (critical) | ★☆☆ Low | Slow / Medium | Dataset is small with clean separation, or you're doing text/image classification with kernel tricks |
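To put the table to work, a quick screening harness like the sketch below is often enough to pick a first model: scale-sensitive candidates get a `StandardScaler` in front (per the "Scaling Needed?" column), and everything is compared under the same cross-validation. This is a minimal scikit-learn sketch; the bundled breast-cancer dataset and all hyperparameter values are illustrative stand-ins for your own data and settings.

```python
# Quick screen of the table's candidates under identical 5-fold CV.
# Dataset and hyperparameters are illustrative placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    # Scale-sensitive models (see "Scaling Needed?") get a scaler in front.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "svm_rbf": make_pipeline(StandardScaler(), SVC()),
    # Tree ensembles and Naive Bayes don't need scaling.
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:>20}: {scores.mean():.3f} +/- {scores.std():.3f}")
```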
The Practical Decision Flowchart
When you're staring at a new supervised learning problem, here's the order of operations:
- Start with logistic regression (classification) or linear regression (regression) as your baseline. Seriously. It's fast, interpretable, and tells you how far a simple model gets. If it hits 90% of your target metric, you might not need anything fancier. (Steps 1 and 2 are sketched in code after this list.)
- If the baseline isn't enough, try Random Forest. It handles non-linearity, requires no feature scaling, is robust to outliers, and has good defaults out of the box. It's the "hard to mess up" model.
- If you need more performance on tabular data, graduate to LightGBM or XGBoost with early stopping. This is where Kaggle winners live. Expect roughly a 1–5% improvement over Random Forest with proper tuning, but budget time for hyperparameter search. (An early-stopping sketch follows the list.)
- SVMs for small datasets with clear margins or text classification. When you have fewer than ~10k samples and the data has nice geometric structure, SVMs with RBF or polynomial kernels can outperform tree methods. (A scaled SVM pipeline is sketched below.)
- KNN and Naive Bayes for specific niches. KNN for quick prototyping and sanity checks. Naive Bayes when you have text data and need something fast with minimal training data. Neither is likely your final production model, but both are valuable diagnostic tools. (A minimal Naive Bayes text baseline rounds out the sketches below.)
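Steps 1 and 2 translate directly to code. The sketch below (scikit-learn, with a bundled dataset standing in for your own data) fits a scaled logistic-regression baseline, reads off the largest coefficients as a rough feature-effect summary, and then fits a Random Forest to see what non-linearity buys. The 500-tree setting is an illustrative default, not a tuned value.

```python
# Steps 1 and 2: logistic-regression baseline first, Random Forest second.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=0
)

baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
baseline.fit(X_train, y_train)
print(f"baseline accuracy: {baseline.score(X_test, y_test):.3f}")

# With standardized inputs, coefficient magnitude is a rough guide to
# each feature's influence on the log-odds.
coefs = baseline.named_steps["logisticregression"].coef_[0]
top = sorted(zip(data.feature_names, coefs), key=lambda t: abs(t[1]), reverse=True)
for name, coef in top[:5]:
    print(f"{name:>25}: {coef:+.2f}")

# Step 2: worth trying only if the baseline falls short of your target.
forest = RandomForestClassifier(n_estimators=500, random_state=0)
forest.fit(X_train, y_train)
print(f"random forest accuracy: {forest.score(X_test, y_test):.3f}")
```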
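Step 3 is sketched here with LightGBM's scikit-learn wrapper, assuming a recent LightGBM (3.3+) where early stopping is passed as a callback. The learning rate, round budget, and patience are illustrative starting points rather than tuned values; XGBoost offers an analogous mechanism.

```python
# Step 3: gradient boosting with early stopping on a held-out validation set.
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Generous round budget; early stopping picks the effective number of trees.
model = lgb.LGBMClassifier(n_estimators=2000, learning_rate=0.05)
model.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],
    # Stop when the validation metric hasn't improved for 50 rounds.
    callbacks=[lgb.early_stopping(stopping_rounds=50)],
)
print(f"best iteration: {model.best_iteration_}")
print(f"validation accuracy: {model.score(X_valid, y_valid):.3f}")
```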
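Step 4 in code. Because scaling is critical for SVMs (see the table), the scaler belongs inside the pipeline so it is refit on each training fold during the search. The `C` and `gamma` grid below is a generic starting grid, not a recommendation for your data.

```python
# Step 4: RBF-kernel SVM with scaling and a small hyperparameter search.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scaler inside the pipeline: no leakage from validation folds.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
grid = GridSearchCV(
    pipe,
    param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]},
    cv=5,
)
grid.fit(X_train, y_train)
print(f"best params: {grid.best_params_}")
print(f"test accuracy: {grid.score(X_test, y_test):.3f}")
```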
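Finally, the Naive Bayes niche: a bag-of-words text baseline that trains almost instantly. The four-document corpus is a toy placeholder to keep the sketch self-contained; in practice you'd point the pipeline at your own documents and labels.

```python
# Step 5: a fast Naive Bayes text-classification baseline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = [
    "free prize claim now", "meeting at noon tomorrow",
    "win cash instantly", "project status update attached",
]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

text_clf = make_pipeline(CountVectorizer(), MultinomialNB())
text_clf.fit(docs, labels)
print(text_clf.predict(["claim your free cash"]))  # likely [1] (spam)
```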
Algorithm choice typically accounts for the last 5–10% of performance. Feature engineering, data quality, and proper validation strategy matter far more. A well-engineered logistic regression frequently beats a poorly tuned XGBoost.