Does the Data Induce Capacity Control in Deep Learning?

You may attend the talk either in person in Walter 402 or register via Zoom. Registration is required to access the Zoom webinar.

Accepted statistical wisdom suggests that the larger the model class, the more likely it is to overfit the training data. And yet, deep networks generalize extremely well: the larger the deep network, the better its accuracy on new data. This talk seeks to shed light on this apparent paradox.

We will argue that deep networks are successful because of a characteristic structure in the space of learning tasks. The input correlation matrix for typical tasks has a peculiar (“sloppy”) eigenspectrum where, in addition to a few large eigenvalues (salient features), there are a large number of small eigenvalues that are distributed uniformly over exponentially large ranges. This structure in the input data is strongly mirrored in the representation learned by the network. A number of other quantities, such as the Hessian, the Fisher Information Matrix, activation correlations, and Jacobians, are also sloppy. Even if the model class for deep networks is very large, only an exponentially small subset of models (exponentially small in the number of data points) fits such sloppy tasks. This talk will demonstrate the first analytical non-vacuous generalization bound for deep networks that does not use compression. We will also discuss an application of these concepts that develops new algorithms for semi-supervised learning.
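As a rough illustration of what a "sloppy" eigenspectrum looks like, the following sketch builds synthetic Gaussian inputs whose correlation matrix has eigenvalues spaced evenly on a logarithmic scale, then recovers that spectrum from samples. This is an illustrative toy under stated assumptions, not the analysis from the talk or the referenced papers: the function names, the dimension, the number of decades, and the use of Gaussian data are all hypothetical choices made for the example.

```python
import numpy as np


def sloppy_spectrum(dim=20, decades=6.0):
    """Hypothetical spectrum: eigenvalues evenly spaced in log10 from 1 to 10**-decades."""
    return 10.0 ** np.linspace(0.0, -decades, dim)


def sample_inputs(n_samples, eigenvalues, seed=0):
    """Draw zero-mean Gaussian inputs whose correlation matrix has the given eigenvalues."""
    rng = np.random.default_rng(seed)
    dim = eigenvalues.size
    # Random orthonormal basis from the QR factorization of a Gaussian matrix.
    basis, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    z = rng.standard_normal((n_samples, dim))
    # Scale each latent coordinate by sqrt(eigenvalue), then rotate into the basis.
    return (z * np.sqrt(eigenvalues)) @ basis.T


def correlation_eigenvalues(x):
    """Eigenvalues of the empirical input correlation matrix, largest first."""
    corr = x.T @ x / x.shape[0]
    return np.linalg.eigvalsh(corr)[::-1]


true_eigs = sloppy_spectrum()
x = sample_inputs(50_000, true_eigs)
est_eigs = correlation_eigenvalues(x)

# Gaps between consecutive eigenvalues on a log10 scale; roughly constant
# gaps mean the eigenvalues are spread uniformly over many orders of magnitude.
log_gaps = -np.diff(np.log10(est_eigs))
span_in_decades = np.log10(est_eigs[0] / est_eigs[-1])
```

In this toy setting, `log_gaps` stays close to a constant and `span_in_decades` is close to the chosen number of decades, which is the qualitative signature described above: a few large eigenvalues followed by many small ones spread evenly across a wide logarithmic range.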

References

Does the data induce capacity control in deep learning? Rubing Yang, Jialin Mao, and Pratik Chaudhari. [ICML '22] https://arxiv.org/abs/2110.14163
Deep Reference Priors: What is the best way to pretrain a model? Yansong Gao, Rahul Ramesh, and Pratik Chaudhari. [ICML '22] https://arxiv.org/abs/2202.00187

Pratik Chaudhari is an Assistant Professor in Electrical and Systems Engineering and Computer and Information Science at the University of Pennsylvania. He is a member of the GRASP Laboratory. From 2018 to 2019, he was a Senior Applied Scientist at Amazon Web Services and a Postdoctoral Scholar in Computing and Mathematical Sciences at Caltech. Pratik received his PhD (2018) in Computer Science from UCLA, and his Master's (2012) and Engineer's (2014) degrees in Aeronautics and Astronautics from MIT. He was a part of NuTonomy Inc. (now Hyundai-Aptiv Motional) from 2014 to 2016. He received the NSF CAREER award and the Intel Rising Star Faculty Award in 2022.

Photo: https://pratikac.github.io/img/photo.jpg
