Overparametrization in machine learning: insights from linear models
Data Science Seminar
Andrea Montanari (Stanford University)
Abstract
Deep learning models are often trained in a regime that is forbidden by classical statistical learning theory. The model complexity can be larger than the sample size, and the training error does not concentrate around the test error. In fact, the model complexity can be so large that the network interpolates noisy training data. Despite this, the network behaves well on fresh test data, a phenomenon that has been dubbed "benign overfitting."
I will review recent progress towards a precise quantitative understanding of this phenomenon in linear models and kernel regression. In particular, I will present a recent characterization of ridge regression in Hilbert spaces that provides a unified understanding of several earlier results.
[Based on joint work with Chen Cheng]