Abstract

How do overparameterized deep learning models avoid overfitting? Why do deep neural networks work better in practice than classical methods such as kernels and splines? This talk covers a recent line of research that examines DNNs from the viewpoint of classical non-parametric regression (or “curve fitting”). This viewpoint suggests that DNNs may work better because of their adaptivity: tuning their standard hyperparameters implicitly discovers hidden sparsity and low-dimensional structures. I will go over theory and examples to illustrate this point. The results provide new insights into overparameterization, representation learning, and how neural networks generalize (often adaptively and nearly optimally) through optimization-algorithm-induced implicit biases such as Edge-of-Stability and Minima Stability.