Colloquium: On the Foundations of Deep Learning: Over-parameterization, Generalization, and Representation Learning

The computer science colloquium takes place on Mondays from 11:15 a.m. to 12:15 p.m.

This week's speaker, Wei Hu (Princeton University), will be giving a talk titled "On the Foundations of Deep Learning: Over-parameterization, Generalization, and Representation Learning".

Abstract

Despite the phenomenal empirical successes of deep learning in many application domains, its underlying mathematical mechanisms remain poorly understood. Mysteriously, deep neural networks in practice can often fit training data perfectly and generalize remarkably well to unseen test data, despite highly non-convex optimization landscapes and significant over-parameterization. Moreover, deep neural networks show an extraordinary ability to perform representation learning: feature representations extracted from a trained network can be useful for other related tasks.

In this talk, I will present our recent progress on the theoretical foundations of deep learning. First, I will show that gradient descent on deep linear neural networks induces an implicit regularization effect towards low rank, which explains the surprising generalization behavior of deep linear networks for the low-rank matrix completion problem. Next, turning to nonlinear deep neural networks, I will describe a line of studies on wide neural networks, where, by drawing a connection to neural tangent kernels, we can answer questions such as how the training loss is minimized, why the trained network can generalize, and why certain components of the network architecture are useful; we also use these theoretical insights to design a simple and effective new method for training on noisily labeled datasets. Finally, I will analyze the statistical aspects of representation learning and identify conditions that enable efficient use of training data, bypassing a known hurdle in the i.i.d. tasks setting.
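To make the first thread concrete, here is a minimal sketch of the deep linear network setup for matrix completion; it is not code from the talk, and all names and hyperparameters are illustrative assumptions. Gradient descent is run on a product of square factor matrices, fit only to the observed entries of a low-rank matrix; the implicit-regularization result suggests the recovered end-to-end matrix tends toward low rank.

```python
# Hypothetical sketch: deep matrix factorization for matrix completion.
# Hyperparameters are illustrative assumptions and may need tuning.
import numpy as np

rng = np.random.default_rng(0)
n, rank, depth, lr, steps = 20, 2, 3, 0.01, 20000

# Ground-truth low-rank matrix and a random observation mask.
M = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, n))
mask = rng.random((n, n)) < 0.3  # observe roughly 30% of the entries

# Depth-L linear network: the end-to-end map is W[L-1] @ ... @ W[0].
Ws = [0.1 * rng.standard_normal((n, n)) for _ in range(depth)]

def product(factors):
    P = factors[0]
    for W in factors[1:]:
        P = W @ P
    return P

for _ in range(steps):
    R = mask * (product(Ws) - M)  # residual on observed entries only
    # Gradient of 0.5 * ||mask * (W[L-1]...W[0] - M)||_F^2 in each factor.
    for i in range(depth):
        left = product(Ws[i + 1:]) if i + 1 < depth else np.eye(n)
        right = product(Ws[:i]) if i > 0 else np.eye(n)
        Ws[i] -= lr * left.T @ R @ right.T

P = product(Ws)
print("relative error on unobserved entries:",
      np.linalg.norm(~mask * (P - M)) / np.linalg.norm(~mask * M))
print("top singular values of P:", np.round(np.linalg.svd(P)[1][:5], 3))
```

Similarly, for the second thread, here is a rough illustration (again an assumption-laden sketch, not the talk's construction) of the neural tangent kernel: for a wide two-layer ReLU network at random initialization, the kernel formed from parameter gradients concentrates around a fixed kernel as the width grows.

```python
# Hypothetical empirical NTK of f(x) = a . relu(W x) / sqrt(m) at init.
import numpy as np

rng = np.random.default_rng(1)
d, m = 5, 10000  # input dimension, hidden width (large: the NTK regime)
W = rng.standard_normal((m, d))        # first-layer weights
a = rng.choice([-1.0, 1.0], size=m)    # second-layer signs, fixed scale

def empirical_ntk(X):
    Z = X @ W.T                        # pre-activations, shape (n, m)
    H = np.maximum(Z, 0.0)             # relu(W x)
    D = (Z > 0).astype(float)          # relu'(W x): the active-unit mask
    # Gradients w.r.t. W contribute a_j^2 * 1[...] * 1[...] * <x, x'>.
    K_W = ((D * a**2) @ D.T) * (X @ X.T) / m
    # Gradients w.r.t. a contribute relu(w_j . x) * relu(w_j . x').
    K_a = (H @ H.T) / m
    return K_W + K_a

X = rng.standard_normal((4, d)) / np.sqrt(d)
print(np.round(empirical_ntk(X), 3))   # stabilizes as m grows
```

In the NTK regime, training a sufficiently wide network with gradient descent behaves approximately like kernel regression with this kernel, which is the connection the abstract draws on to analyze optimization and generalization.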

Biography

Wei Hu is a PhD candidate in the Department of Computer Science at Princeton University, advised by Sanjeev Arora. Previously, he obtained his B.E. in Computer Science from Tsinghua University. He has also spent time as a research intern at the research labs of Google and Microsoft. His research interests lie broadly in the theoretical foundations of modern machine learning. In particular, his main focus is on obtaining a solid theoretical understanding of deep learning, as well as on using theoretical insights to design practical and principled machine learning methods.

Start date
Friday, Feb. 5, 2021, 11:15 a.m.
End date
Friday, Feb. 5, 2021, 12:15 p.m.
Location
Online - Zoom link
