Past events

Graduate Programs Information Session

Prospective students can RSVP for an information session to learn about the following graduate programs:

Computer Science M.S.
Computer Science MCS
Computer Science Ph.D.
Data Science M.S.
Data Science Post-Baccalaureate Certificate

During the information session, we will go over the following:

Requirements (general)
Applying
Prerequisite requirements
What makes a strong applicant
Funding
Resources
Common questions
Questions from attendees

UMN Machine Learning Seminar

The UMN Machine Learning Seminar Series brings together faculty, students, and local industrial partners who are interested in the theoretical, computational, and applied aspects of machine learning, to pose problems, exchange ideas, and foster collaborations. The talks are held every Thursday from 12 p.m. to 1 p.m. during the Summer 2021 semester.

This week's speaker, Xiwei Tang (University of Virginia), will be giving a talk titled "Multivariate Temporal Point Process Regression with Applications in Calcium Imaging Analysis."

Abstract

Point process modeling is gaining increasing attention, as point process type data are emerging in a large variety of scientific applications. In this article, motivated by a neuronal spike trains study, we propose a novel point process regression model, where both the response and the predictor can be a high-dimensional point process. We model the predictor effects through the conditional intensities using a set of basis transferring functions in a convolutional fashion. We organize the corresponding transferring coefficients in the form of a three-way tensor, then impose the low-rank, sparsity, and subgroup structures on this coefficient tensor. These structures help reduce the dimensionality, integrate information across different individual processes, and facilitate the interpretation. We develop a highly scalable optimization algorithm for parameter estimation. We derive the large sample error bound for the recovered coefficient tensor, and establish the subgroup identification consistency, while allowing the dimension of the multivariate point process to diverge. We demonstrate the efficacy of our method through both simulations and a cross-area neuronal spike trains analysis in a sensory cortex study.

Biography

Coming soon

UMN Machine Learning Seminar

This week's speaker, Zhaoran Wang (Northwestern University), will be giving a talk titled "Demystifying (Deep) Reinforcement Learning with Optimism and Pessimism."

Abstract

Coupled with powerful function approximators such as deep neural networks, reinforcement learning (RL) achieves tremendous empirical successes. However, its theoretical understanding lags behind. In particular, it remains unclear how to provably attain the optimal policy with a finite regret or sample complexity. In this talk, we will present two sides of the same coin, which demonstrate an intriguing duality between optimism and pessimism.

– In the online setting, we aim to learn the optimal policy by actively interacting with the environment. To strike a balance between exploration and exploitation, we propose an optimistic least-squares value iteration algorithm, which achieves a √T regret in the presence of linear, kernel, and neural function approximators.

– In the offline setting, we aim to learn the optimal policy based on a dataset collected a priori. Due to a lack of active interactions with the environment, we suffer from the insufficient coverage of the dataset. To maximally exploit the dataset, we propose a pessimistic least-squares value iteration algorithm, which achieves a minimax-optimal sample complexity.

Biography

Zhaoran Wang is an assistant professor at Northwestern University, working at the interface of machine learning, statistics, and optimization. He is the recipient of the AISTATS (Artificial Intelligence and Statistics Conference) notable paper award, Microsoft Ph.D. Fellowship, Simons-Berkeley/J.P. Morgan AI Research Fellowship, Amazon Machine Learning Research Award, and NSF CAREER Award.

Priority deadline for 2021 Grace Hopper Celebration tickets

Every year, the Department of Computer Science & Engineering has a major presence at the Grace Hopper Celebration (GHC), the world’s largest gathering of women technologists.

The 2021 GHC event will be held from September 27 to October 1. We invite current students in our data science programs to request a department-funded ticket to attend this year's event.

Interested students should fill out the interest form by Friday, July 9 at noon. Keep in mind that filling out the form does not guarantee a ticket. Also, please note that students who are provided departmental tickets to the virtual Grace Hopper Celebration 2021 will still have the opportunity to request funding for a future in-person Grace Hopper Celebration.

Departmental staff will contact students who filled out the form about their ticket status sometime after July 9, 2021.

Please feel free to reach out to Allison Small if you have any questions.

UMN Machine Learning Seminar

This week's speaker, Rohan Anil (Google Brain), will be giving a talk titled "Scalable Second-Order Optimization for Deep Learning."

Abstract

Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, which involve second derivatives and/or second-order statistics of the data, are far less prevalent despite strong theoretical properties, due to their prohibitive computation, memory, and communication costs. In an attempt to bridge this gap between theoretical and practical optimization, we present a scalable implementation of a second-order preconditioned method (concretely, a variant of full-matrix Adagrad) that, along with several critical algorithmic and numerical improvements, provides significant convergence and wall-clock time improvements compared to conventional first-order methods on state-of-the-art deep models. Our novel design effectively utilizes the prevalent heterogeneous hardware architecture for training deep models, consisting of a multicore CPU coupled with multiple accelerator units. We demonstrate superior performance compared to the state of the art on very large learning tasks such as machine translation with Transformers, language modeling with BERT, click-through rate prediction on Criteo, and image classification on ImageNet with ResNet-50.

Biography

Rohan Anil is a Senior Staff Software Engineer, Google Research, Brain Team. Lately, he has been working on scalable and practical optimization techniques for efficient training of neural networks in various regimes.

University closed

The University of Minnesota will be closed in observance of Independence Day.

View the full schedule of University holidays.

UMN Machine Learning Seminar

This week's speaker, Brian Kulis (Boston University), will be giving a talk titled "New Directions in Metric Learning."

Abstract

Metric learning is a supervised machine learning problem concerned with learning a task-specific distance function from supervised data. It has found numerous applications in problems such as similarity search, clustering, and ranking. Much of the foundational work in this area focused on the class of so-called Mahalanobis metrics, which may be viewed as Euclidean distances after linear transformations of the data. This talk will describe two recent directions in metric learning: deep metric learning and divergence learning. The first replaces the linear transformations with the output of a neural network, while the second considers a broader class than Mahalanobis metrics. I will discuss some of my recent work along both of these fronts, as well as ongoing attempts to combine these approaches together using a novel framework called deep divergences.

Biography

Brian Kulis is an associate professor at Boston University, with appointments in the Department of Electrical and Computer Engineering, the Department of Computer Science, the Faculty of Computing and Data Sciences, and the Division of Systems Engineering. He is also an Amazon Scholar, working with the Alexa team. Previously he was the Peter J. Levine Career Development assistant professor at Boston University. Before joining Boston University, he was an assistant professor in Computer Science and in Statistics at Ohio State University, and prior to that was a postdoctoral fellow at UC Berkeley EECS. His research focuses on machine learning, statistics, computer vision, and large-scale optimization. He obtained his PhD in computer science from the University of Texas in 2008, and his BA degree from Cornell University in computer science and mathematics in 2003. For his research, he has won three best paper awards at top-tier conferences: two at the International Conference on Machine Learning (in 2005 and 2007) and one at the IEEE Conference on Computer Vision and Pattern Recognition (in 2008). He is also the recipient of an NSF CAREER Award in 2015, an MCD graduate fellowship from the University of Texas (2003-2007), and an Award of Excellence from the College of Natural Sciences at the University of Texas.

UMN Machine Learning Seminar

This week's speaker, Michael Overton (Courant Institute of Mathematical Sciences, NYU), will be giving a talk titled "Nonsmooth, Nonconvex Optimization: Algorithms and Examples."

Abstract

In many applications one wishes to minimize an objective function that is not convex and is not differentiable at its minimizers. We discuss two algorithms for minimization of general nonsmooth, nonconvex functions. Gradient Sampling is a simple method that, although computationally intensive, has a nice convergence theory. The method is robust and the convergence theory has been extended to constrained problems. BFGS is a well-known method, developed for smooth problems, but which is remarkably effective for nonsmooth problems too. Although our theoretical results in the nonsmooth case are quite limited, we have made extensive empirical observations and have had broad success with BFGS in nonsmooth applications. Limited Memory BFGS is a popular extension for large-scale problems, but we show that, in contrast to BFGS, it sometimes converges to non-optimal nonsmooth points. Throughout the talk we illustrate the ideas through examples, some very easy and some very challenging.

Biography

Michael L. Overton is Silver Professor of Computer Science and Mathematics at the Courant Institute of Mathematical Sciences, New York University. He received his B.Sc. in Computer Science from the University of British Columbia in 1974 and his Ph.D. in Computer Science from Stanford University in 1979. He is a Fellow of SIAM (Society for Industrial and Applied Mathematics) and of the IMA (Institute of Mathematics and its Applications, UK). He served on the Council and Board of Trustees of SIAM from 1991 to 2005, including a term as Chair of the Board from 2004 to 2005. He served as Editor-in-Chief of SIAM Journal on Optimization from 1995 to 1999 and of the IMA Journal of Numerical Analysis from 2007 to 2008, and was the Editor-in-Chief of the MPS (Mathematical Programming Society)-SIAM joint book series from 2003 to 2007. He is currently an editor of SIAM Journal on Matrix Analysis and Applications, IMA Journal of Numerical Analysis, Foundations of Computational Mathematics, and Numerische Mathematik. His research interests are at the interface of optimization and linear algebra, especially nonsmooth optimization problems involving eigenvalues, pseudospectra, stability and robust control. He is the author of Numerical Computing with IEEE Floating Point Arithmetic (SIAM, 2001).

UMN Machine Learning Seminar

This week's speaker, Professor Mahdi Soltanolkotabi (USC), will be giving a talk titled "Overparameterized learning beyond the lazy regime."

Abstract

Modern learning models are typically trained in an over-parameterized regime where the parameters of the model far exceed the size of the training data. Due to over-parameterization these models in principle have the capacity to (over)fit any set of labels including pure noise. Despite this high fitting capacity, somewhat paradoxically, these models trained via first-order methods continue to predict well on yet unseen test data. In this talk I aim to demystify this phenomenon in two different problems: (1) The first problem focuses on overparameterization in Generative Adversarial Networks (GANs). A large body of work in supervised learning has shown the importance of model overparameterization in the convergence of gradient descent (GD) to globally optimal solutions. In contrast, the unsupervised setting and GANs in particular involve non-convex concave mini-max optimization problems that are often trained using Gradient Descent/Ascent (GDA). The role and benefits of model overparameterization in the convergence of GDA to a global saddle point in non-convex concave problems are far less understood. In this part of the talk I will present a comprehensive analysis of the importance of model overparameterization in GANs both theoretically and empirically. (2) The second problem focuses on overparameterized learning in the context of low-rank reconstruction from a few measurements. For this problem I will show that despite the presence of many global optima, gradient descent from small random initialization converges to a generalizable solution and finds the underlying low-rank matrix. Notably this analysis is not in the “lazy” training regime and is based on an intriguing phenomenon uncovering the critical role of small random initialization: a few iterations of gradient descent behave akin to popular spectral methods. We also show that this implicit spectral bias from small random initialization, which is provably more prominent for overparameterized models, puts the gradient descent iterations on a particular trajectory towards solutions that are not only globally optimal but also generalize well.

Biography

Mahdi Soltanolkotabi is an associate professor in the Ming Hsieh Department of Electrical and Computer Engineering and Computer Science at the University of Southern California where he holds an Andrew and Erna Viterbi Early Career Chair. Prior to joining USC, he completed his PhD in electrical engineering at Stanford in 2014. He was a postdoctoral researcher in the EECS department at UC Berkeley during the 2014-2015 academic year.

Mahdi is the recipient of the Information Theory Society Best Paper Award, a Packard Fellowship in Science and Engineering, a Sloan Research Fellowship in Mathematics, an NSF CAREER Award, an Air Force Office of Scientific Research Young Investigator Award (AFOSR-YIP), the Viterbi School of Engineering junior faculty research award, and a Google faculty research award.

UMN Machine Learning Seminar

This week's speaker, Weijie Su (Wharton Statistics Department, University of Pennsylvania), will be giving a talk titled "Local Elasticity: A Phenomenological Approach Toward Understanding Deep Learning."

Biography

Weijie Su is an Assistant Professor in the Wharton Statistics Department and in the Department of Computer and Information Science at the University of Pennsylvania. He is a co-director of Penn Research in Machine Learning. Prior to joining Penn, he received his Ph.D. from Stanford University in 2016 and his bachelor’s degree from Peking University in 2011. His research interests span machine learning, optimization, privacy-preserving data analysis, and high-dimensional statistics. He is a recipient of the Stanford Theodore Anderson Dissertation Award in 2016, an NSF CAREER Award in 2019, and an Alfred Sloan Research Fellowship in 2020.

Abstract

Motivated by the iterative nature of training neural networks, we ask: If the weights of a neural network are updated using the induced gradient on an image of a tiger, how does this update impact the prediction of the neural network at another image (say, an image of another tiger, a cat, or a plane)? To address this question, I will introduce a phenomenon termed local elasticity. Roughly speaking, our experiments show that modern deep neural networks are locally elastic in the sense that the change in prediction is likely to be most significant at another tiger and least significant at a plane, at late stages of the training process. I will illustrate some implications of local elasticity by relating it to the neural tangent kernel and improving on the generalization bound for uniform stability. Moreover, I will introduce a phenomenological model for simulating neural networks, which suggests that local elasticity may result from feature sharing between semantically related images and the hierarchical representations of high-level features. Finally, I will offer a local-elasticity-focused agenda for future research toward a theoretical foundation for deep learning.

Main Office
4-192 Keller Hall
200 Union Street SE
Minneapolis, MN 55455
(612) 625-4002
csdesk@umn.edu

Student Services
324 Lind Hall
207 Church Street SE
Minneapolis, MN 55455
(612) 625-4002
csdesk@umn.edu