Abstract
In this tutorial I will survey classical, mostly 20th-century, statistical learning theory, focusing on generalization by controlling capacity. We will discuss:
- Vapnik and Chervonenkis's Fundamental Theorem of Learning
- Scale-sensitive capacity control and margins
- Minimum Description Length / Occam's Rule / Structural Risk Minimization and PAC-Bayes
- Parallels with Stochastic Optimization
- Generalization and capacity control from optimization: online-to-batch, stochastic approximation, boosting, min norm and max margin.
We will ask how the classic theory fits with current interests, including interpolation learning, benign overfitting and implicit bias.