Past events

Priority deadline for 2021 Grace Hopper Celebration tickets

Every year, the Department of Computer Science & Engineering has a major presence at the Grace Hopper Celebration (GHC), the world’s largest gathering of women technologists.

The 2021 GHC event will be held from September 27 to October 1. We invite current students to request a department-funded ticket to attend this year’s event.

Interested students should fill out the interest form by Friday, July 9 at noon. Keep in mind that filling out the form does not guarantee a ticket. Also, please note that students who are provided departmental tickets to the virtual Grace Hopper Celebration 2021 will still have the opportunity to request funding for a future in-person Grace Hopper Celebration.

Departmental staff will contact students who filled out the form about their ticket status sometime after July 9, 2021.

Please feel free to reach out to Allison Small if you have any questions.

MSSE Online Information Session

Have all your questions about the Master of Science in Software Engineering (MSSE) program answered by attending this online information session.

RSVP now to reserve your spot.

Attendees will be sent a link prior to the event.

UMN Machine Learning Seminar

The UMN Machine Learning Seminar Series brings together faculty, students, and local industrial partners who are interested in the theoretical, computational, and applied aspects of machine learning, to pose problems, exchange ideas, and foster collaborations. Talks are held every Thursday from 12 p.m. to 1 p.m. during the Summer 2021 semester.

This week’s speaker, Rohan Anil (Google Brain), will give a talk titled “Scalable Second-Order Optimization for Deep Learning.”

Abstract

Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, which involve second derivatives and/or second-order statistics of the data, are far less prevalent despite their strong theoretical properties, due to their prohibitive computation, memory, and communication costs. In an attempt to bridge this gap between theoretical and practical optimization, we present a scalable implementation of a second-order preconditioned method (concretely, a variant of full-matrix Adagrad) that, along with several critical algorithmic and numerical improvements, provides significant convergence and wall-clock time improvements compared to conventional first-order methods on state-of-the-art deep models. Our novel design effectively utilizes the prevalent heterogeneous hardware architecture for training deep models, consisting of a multicore CPU coupled with multiple accelerator units. We demonstrate superior performance compared to the state of the art on very large learning tasks such as machine translation with Transformers, language modeling with BERT, click-through rate prediction on Criteo, and image classification on ImageNet with ResNet-50.
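
As a rough illustration of what full-matrix Adagrad preconditioning means, here is a minimal NumPy sketch on a toy quadratic: accumulate outer products of the gradients and precondition each step with the inverse square root of that matrix. This is only a sketch of the basic idea, not the speaker's implementation (which adds the algorithmic and numerical improvements described above); the function name and test problem are made up for this example.

```python
import numpy as np

def full_matrix_adagrad(grad_fn, x0, lr=1.0, eps=1e-6, steps=100):
    """Toy full-matrix Adagrad: accumulate G_t as a sum of g g^T and
    precondition each step with G_t^{-1/2} (O(d^2) memory, O(d^3) per step)."""
    x = x0.astype(float).copy()
    d = x.size
    G = eps * np.eye(d)                      # accumulated second-moment matrix
    for _ in range(steps):
        g = grad_fn(x)
        G += np.outer(g, g)                  # full-matrix gradient statistics
        w, V = np.linalg.eigh(G)             # eigendecomposition of G
        precond = V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T
        x -= lr * precond @ g                # preconditioned update
    return x

# Ill-conditioned quadratic f(x) = 0.5 x^T A x, minimized at the origin.
A = np.diag([100.0, 1.0, 0.01])
print(full_matrix_adagrad(lambda x: A @ x, x0=np.ones(3)))  # should approach 0
```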

Biography

Rohan Anil is a Senior Staff Software Engineer at Google Research, Brain Team. Lately, he has been working on scalable and practical optimization techniques for efficient training of neural networks in various regimes.

University closed

The University of Minnesota will be closed in observance of Independence Day.

View the full schedule of University holidays.

UMN Machine Learning Seminar

The UMN Machine Learning Seminar Series brings together faculty, students, and local industrial partners who are interested in the theoretical, computational, and applied aspects of machine learning, to pose problems, exchange ideas, and foster collaborations. Talks are held every Thursday from 12 p.m. to 1 p.m. during the Summer 2021 semester.

This week’s speaker, Brian Kulis (Boston University), will give a talk titled “New Directions in Metric Learning.”

Abstract

Metric learning is a supervised machine learning problem concerned with learning a task-specific distance function from supervised data. It has found numerous applications in problems such as similarity search, clustering, and ranking. Much of the foundational work in this area focused on the class of so-called Mahalanobis metrics, which may be viewed as Euclidean distances after linear transformations of the data. This talk will describe two recent directions in metric learning: deep metric learning and divergence learning. The first replaces the linear transformations with the output of a neural network, while the second considers a broader class than Mahalanobis metrics. I will discuss some of my recent work along both of these fronts, as well as ongoing attempts to combine these approaches using a novel framework called deep divergences.
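
As a small illustration of the two directions mentioned above, the sketch below computes a Mahalanobis distance as a Euclidean distance after a linear map L (so the metric matrix is M = L^T L), and a "deep" distance that simply swaps L for an embedding function. It is a minimal NumPy sketch, not code from the talk: the transform is random rather than learned, and all names are illustrative.

```python
import numpy as np

def mahalanobis_distance(x, y, L):
    """d_M(x, y) = ||L x - L y||_2, i.e. a Euclidean distance after the
    linear map L; the corresponding metric matrix is M = L^T L."""
    return np.linalg.norm(L @ x - L @ y)

def deep_distance(x, y, embed):
    """Deep metric learning replaces the linear map with a learned
    embedding network: d(x, y) = ||f(x) - f(y)||_2."""
    return np.linalg.norm(embed(x) - embed(y))

rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)
L = rng.normal(size=(3, 5))              # stands in for a learned linear transform
toy_embed = lambda v: np.tanh(L @ v)     # stands in for a trained embedding network
print(mahalanobis_distance(x, y, L), deep_distance(x, y, toy_embed))
```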

Biography

Brian Kulis is an associate professor at Boston University, with appointments in the Department of Electrical and Computer Engineering, the Department of Computer Science, the Faculty of Computing and Data Sciences, and the Division of Systems Engineering. He is also an Amazon Scholar, working with the Alexa team. Previously he was the Peter J. Levine Career Development Assistant Professor at Boston University. Before joining Boston University, he was an assistant professor in Computer Science and in Statistics at Ohio State University, and prior to that was a postdoctoral fellow at UC Berkeley EECS. His research focuses on machine learning, statistics, computer vision, and large-scale optimization. He obtained his PhD in computer science from the University of Texas in 2008, and his BA in computer science and mathematics from Cornell University in 2003. For his research, he has won three best paper awards at top-tier conferences: two at the International Conference on Machine Learning (in 2005 and 2007) and one at the IEEE Conference on Computer Vision and Pattern Recognition (in 2008). He is also the recipient of an NSF CAREER Award in 2015, an MCD graduate fellowship from the University of Texas (2003-2007), and an Award of Excellence from the College of Natural Sciences at the University of Texas.

UMN Machine Learning Seminar

The UMN Machine Learning Seminar Series brings together faculty, students, and local industrial partners who are interested in the theoretical, computational, and applied aspects of machine learning, to pose problems, exchange ideas, and foster collaborations. Talks are held every Thursday from 12 p.m. to 1 p.m. during the Summer 2021 semester.

This week’s speaker, Michael Overton (Courant Institute of Mathematical Sciences, NYU), will give a talk titled “Nonsmooth, Nonconvex Optimization: Algorithms and Examples.”

Abstract

In many applications one wishes to minimize an objective function that is not convex and is not differentiable at its minimizers. We discuss two algorithms for the minimization of general nonsmooth, nonconvex functions. Gradient Sampling is a simple method that, although computationally intensive, has a nice convergence theory. The method is robust, and the convergence theory has been extended to constrained problems. BFGS is a well-known method, developed for smooth problems, but it is remarkably effective for nonsmooth problems too. Although our theoretical results in the nonsmooth case are quite limited, we have made extensive empirical observations and have had broad success with BFGS in nonsmooth applications. Limited-memory BFGS is a popular extension for large-scale problems, but we show that, in contrast to BFGS, it sometimes converges to non-optimal nonsmooth points. Throughout the talk we illustrate the ideas through examples, some very easy and some very challenging.
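
For readers unfamiliar with Gradient Sampling, the following is a minimal NumPy sketch of one simplified step on a small nonsmooth test function: sample gradients near the current point, take an approximate minimum-norm element of their convex hull as a descent direction, and backtrack. It illustrates the idea only; the full algorithm discussed in the talk also shrinks the sampling radius and comes with the convergence theory mentioned above.

```python
import numpy as np

def min_norm_in_hull(G, iters=200):
    """Approximate the minimum-norm point in the convex hull of the rows of G
    (a small simplex-constrained QP), solved here by Frank-Wolfe."""
    lam = np.full(G.shape[0], 1.0 / G.shape[0])
    for k in range(iters):
        grad = 2.0 * G @ (G.T @ lam)        # gradient of ||G^T lam||^2 in lam
        i = int(np.argmin(grad))            # best vertex of the simplex
        gamma = 2.0 / (k + 2.0)
        lam = (1.0 - gamma) * lam
        lam[i] += gamma
    return G.T @ lam

def gradient_sampling_step(f, grad, x, radius=0.1, n_samples=10, rng=None):
    """One simplified Gradient Sampling step: sample gradients near x, use the
    approximate minimum-norm element of their convex hull as a descent
    direction, then backtrack on f."""
    rng = np.random.default_rng() if rng is None else rng
    pts = x + radius * rng.uniform(-1, 1, size=(n_samples, x.size))
    G = np.vstack([grad(x)] + [grad(p) for p in pts])
    d = -min_norm_in_hull(G)
    t = 1.0
    while f(x + t * d) > f(x) - 1e-4 * t * np.dot(d, d) and t > 1e-12:
        t *= 0.5                            # backtracking line search
    return x + t * d

# Nonsmooth test function, not differentiable at its minimizer: f(x) = |x1| + 2|x2|.
f = lambda x: abs(x[0]) + 2 * abs(x[1])
grad = lambda x: np.array([np.sign(x[0]), 2 * np.sign(x[1])])
rng = np.random.default_rng(0)
x = np.array([1.0, -1.0])
for _ in range(30):
    x = gradient_sampling_step(f, grad, x, rng=rng)
print(x)  # should end up near the origin (within roughly the sampling radius)
```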

Biography

Michael L. Overton is Silver Professor of Computer Science and Mathematics at the Courant Institute of Mathematical Sciences, New York University. He received his B.Sc. in Computer Science from the University of British Columbia in 1974 and his Ph.D. in Computer Science from Stanford University in 1979. He is a Fellow of SIAM (Society for Industrial and Applied Mathematics) and of the IMA (Institute of Mathematics and its Applications, UK). He served on the Council and Board of Trustees of SIAM from 1991 to 2005, including a term as Chair of the Board from 2004 to 2005. He served as Editor-in-Chief of SIAM Journal on Optimization from 1995 to 1999 and of the IMA Journal of Numerical Analysis from 2007 to 2008, and was the Editor-in-Chief of the MPS (Mathematical Programming Society)-SIAM joint book series from 2003 to 2007. He is currently an editor of SIAM Journal on Matrix Analysis and Applications, IMA Journal of Numerical Analysis, Foundations of Computational Mathematics, and Numerische Mathematik. His research interests are at the interface of optimization and linear algebra, especially nonsmooth optimization problems involving eigenvalues, pseudospectra, stability and robust control. He is the author of Numerical Computing with IEEE Floating Point Arithmetic (SIAM, 2001).

MSSE Online Information Session

Have all your questions about the Master of Science in Software Engineering (MSSE) program answered by attending this online information session.

RSVP now to reserve your spot.

Attendees will be sent a link prior to the event.

UMN Machine Learning Seminar

The UMN Machine Learning Seminar Series brings together faculty, students, and local industrial partners who are interested in the theoretical, computational, and applied aspects of machine learning, to pose problems, exchange ideas, and foster collaborations. Talks are held every Thursday from 12 p.m. to 1 p.m. during the Summer 2021 semester.

This week’s speaker, Professor Mahdi Soltanolkotabi (USC), will give a talk titled “Overparameterized learning beyond the lazy regime.”

Abstract

Modern learning models are typically trained in an over-parameterized regime where the parameters of the model far exceed the size of the training data. Due to over-parameterization, these models in principle have the capacity to (over)fit any set of labels, including pure noise. Despite this high fitting capacity, somewhat paradoxically, these models trained via first-order methods continue to predict well on yet unseen test data. In this talk I aim to demystify this phenomenon in two different problems: (1) The first problem focuses on overparameterization in Generative Adversarial Networks (GANs). A large body of work in supervised learning has shown the importance of model overparameterization in the convergence of gradient descent (GD) to globally optimal solutions. In contrast, the unsupervised setting, and GANs in particular, involves nonconvex-concave minimax optimization problems that are often trained using Gradient Descent/Ascent (GDA). The role and benefits of model overparameterization in the convergence of GDA to a global saddle point in nonconvex-concave problems are far less understood. In this part of the talk I will present a comprehensive analysis of the importance of model overparameterization in GANs, both theoretically and empirically. (2) The second problem focuses on overparameterized learning in the context of low-rank reconstruction from a few measurements. For this problem I will show that, despite the presence of many global optima, gradient descent from small random initialization converges to a generalizable solution and finds the underlying low-rank matrix. Notably, this analysis is not in the “lazy” training regime and is based on an intriguing phenomenon uncovering the critical role of small random initialization: a few iterations of gradient descent behave akin to popular spectral methods. We also show that this implicit spectral bias from small random initialization, which is provably more prominent for overparameterized models, puts the gradient descent iterations on a particular trajectory toward solutions that are not only globally optimal but also generalize well.
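
The second part of the abstract concerns plain gradient descent from small random initialization on an overparameterized low-rank factorization. The NumPy sketch below sets up that scenario for symmetric low-rank matrix recovery from random linear measurements; the problem sizes, step size, and iteration count are illustrative choices, not the talk's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, k = 20, 2, 200        # matrix size, true rank, number of measurements

# Ground-truth rank-r PSD matrix and random linear measurements y_i = <A_i, X*>.
U_star = rng.normal(size=(n, r)) / np.sqrt(n)
X_star = U_star @ U_star.T
A = rng.normal(size=(k, n, n)) / np.sqrt(k)
y = np.einsum('kij,ij->k', A, X_star)

# Overparameterized factorization X = U U^T (U has n columns, far more than r),
# trained by plain gradient descent from a *small* random initialization.
U = 1e-3 * rng.normal(size=(n, n))
lr = 0.1
for _ in range(1000):
    resid = np.einsum('kij,ij->k', A, U @ U.T) - y     # measurement residuals
    grad_X = np.einsum('k,kij->ij', resid, A)          # gradient w.r.t. X = U U^T
    U -= lr * (grad_X + grad_X.T) @ U                  # chain rule through X = U U^T

# Relative recovery error; should be small, reflecting the implicit low-rank bias.
print(np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star))
```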

Biography

Mahdi Soltanolkotabi is an associate professor in the Ming Hsieh Department of Electrical and Computer Engineering and Computer Science at the University of Southern California where he holds an Andrew and Erna Viterbi Early Career Chair. Prior to joining USC, he completed his PhD in electrical engineering at Stanford in 2014. He was a postdoctoral researcher in the EECS department at UC Berkeley during the 2014-2015 academic year.

Mahdi is the recipient of the Information Theory Society Best Paper Award, a Packard Fellowship in Science and Engineering, a Sloan Research Fellowship in Mathematics, an NSF CAREER Award, an Air Force Office of Scientific Research Young Investigator Program award (AFOSR YIP), the Viterbi School of Engineering junior faculty research award, and a Google Faculty Research Award.

UMN Machine Learning Seminar

The UMN Machine Learning Seminar Series brings together faculty, students, and local industrial partners who are interested in the theoretical, computational, and applied aspects of machine learning, to pose problems, exchange ideas, and foster collaborations. Talks are held every Thursday from 12 p.m. to 1 p.m. during the Summer 2021 semester.

This week’s speaker, Weijie Su (Wharton Statistics Department, University of Pennsylvania), will give a talk titled “Local Elasticity: A Phenomenological Approach Toward Understanding Deep Learning.”

Abstract

Motivated by the iterative nature of training neural networks, we ask: If the weights of a neural network are updated using the induced gradient on an image of a tiger, how does this update impact the prediction of the neural network at another image (say, an image of another tiger, a cat, or a plane)? To address this question, I will introduce a phenomenon termed local elasticity. Roughly speaking, our experiments show that modern deep neural networks are locally elastic in the sense that the change in prediction is likely to be most significant at another tiger and least significant at a plane, at late stages of the training process. I will illustrate some implications of local elasticity by relating it to the neural tangent kernel and improving on the generalization bound for uniform stability. Moreover, I will introduce a phenomenological model for simulating neural networks, which suggests that local elasticity may result from feature sharing between semantically related images and the hierarchical representations of high-level features. Finally, I will offer a local-elasticity-focused agenda for future research toward a theoretical foundation for deep learning.

Biography

Weijie Su is an Assistant Professor in the Wharton Statistics Department and in the Department of Computer and Information Science at the University of Pennsylvania. He is a co-director of Penn Research in Machine Learning. Prior to joining Penn, he received his Ph.D. from Stanford University in 2016 and his bachelor’s degree from Peking University in 2011. His research interests span machine learning, optimization, privacy-preserving data analysis, and high-dimensional statistics. He is a recipient of the Stanford Theodore Anderson Dissertation Award in 2016, an NSF CAREER Award in 2019, and a Sloan Research Fellowship in 2020.
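
The measurement behind local elasticity described in the abstract above (take one gradient step on a single example, then compare how the predictions change at a related probe versus an unrelated one) can be sketched in a few lines. The tiny network and the synthetic "tiger"/"plane" inputs below are purely illustrative, not the talk's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 10, 32                              # input dimension and hidden width (illustrative)
W1 = rng.normal(size=(h, d)) / np.sqrt(d)
w2 = rng.normal(size=h) / np.sqrt(h)

def predict(x, W1, w2):
    """Tiny two-layer network f(x) = w2 . tanh(W1 x)."""
    return w2 @ np.tanh(W1 @ x)

def sgd_step_on(x, y, W1, w2, lr=0.1):
    """One SGD step of the squared loss on a single example (x, y)."""
    a = np.tanh(W1 @ x)
    err = w2 @ a - y
    return W1 - lr * err * np.outer(w2 * (1 - a**2), x), w2 - lr * err * a

# Example we update on, a semantically "similar" probe (a small perturbation
# of it), and an unrelated probe.
x_tiger = rng.normal(size=d)
probes = {"tiger2": x_tiger + 0.1 * rng.normal(size=d), "plane": rng.normal(size=d)}

before = {name: predict(x, W1, w2) for name, x in probes.items()}
W1, w2 = sgd_step_on(x_tiger, y=1.0, W1=W1, w2=w2)
for name, x in probes.items():
    # Local elasticity: the change should typically be larger for "tiger2".
    print(name, predict(x, W1, w2) - before[name])
```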

MSSE Online Information Session

Have all your questions about the Master of Science in Software Engineering (MSSE) program answered by attending this online information session.

RSVP now to reserve your spot.

Attendees will be sent a link prior to the event.