Past events

IMA Data Science Seminar

The Institute for Mathematics and Its Applications (IMA) Data Science Seminars are a forum for data scientists of IMA academic and industrial partners to discuss and learn about recent developments in the broad area of data science. The seminars take place on Tuesdays from 1:25 p.m. - 2:25 p.m.

This week's speaker is Danny Abrams (Northwestern University).

Industrial Problems Seminar: Being Smart and Dumb - Building the Sports Analytics Industry

In collaboration with the Minnesota Center for Industrial Mathematics, the Industrial Problems Seminars are a forum for industrial researchers to offer a first-hand glimpse into industrial research. The seminars take place Fridays from 1:25 p.m. - 2:25 p.m.

This week's speaker, Dean Oliver (NBA's Washington Wizards), will be giving a talk titled "Being Smart and Dumb: Building the Sports Analytics Industry."

Registration is required to access the Zoom webinar.

Abstract

Going from a scientific background into something that people haven't done comes with moments where you don't know what you're talking about... if you talk, that is. Admitting the times you don't know how your work can help and introducing your work when it may be able to help - that timing can be hard. I went from the field I was trained in - environmental engineering and consulting - to a job with no title at first. I had to write a book about how stats can help in basketball. Someone else invented the term "Sports Analytics". This talk is a little bit of that story.

Biography

Lawrence Dean Oliver is an American statistician and assistant coach for the NBA's Washington Wizards. Oliver is a prominent contributor to the advanced statistical evaluation of basketball. He is the author of Basketball on Paper and the former producer of the now-defunct Journal of Basketball Studies.

UMN Machine Learning Seminar

The UMN Machine Learning Seminar Series brings together faculty, students, and local industrial partners who are interested in the theoretical, computational, and applied aspects of machine learning, to pose problems, exchange ideas, and foster collaborations. The talks are every Thursday from 12 p.m. - 1 p.m. during the Fall 2021 semester.

This week's speaker is Tuo Zhao (Georgia Tech).

Abstract

Transfer learning has fundamentally changed the landscape of natural language processing (NLP). Many state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. When we only have limited supervision for the downstream tasks, however, due to the extremely high complexity of pre-trained models, aggressive fine-tuning often causes the fine-tuned model to overfit the training data of downstream tasks and fail to generalize to unseen data.

To address such a concern, we propose a new approach for fine-tuning pre-trained models to attain better generalization performance. Our proposed approach adopts three important ingredients: (1) Smoothness-inducing adversarial regularization, which effectively controls the complexity of the massive model; (2) Bregman proximal point optimization, which is an instance of trust-region algorithms and can prevent aggressive updating; (3) Differentiable programming, which can mitigate the undesired bias induced by conventional adversarial training algorithms. Our experiments show that the proposed approach significantly outperforms existing methods in multiple NLP tasks. In addition, our theoretical analysis provides some new insights into adversarial training for improving generalization.
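
To make the two optimization ingredients concrete, here is a minimal, hypothetical PyTorch sketch of a regularized fine-tuning loss for a generic classifier (not the speaker's implementation): the smoothness-inducing adversarial term penalizes the divergence between predictions on an input and on an adversarially perturbed copy of it, while the Bregman proximal term keeps the current model's predictions close to those of the previous iterate. The model, data, and weights lam and mu below are toy placeholders.

import copy
import torch
import torch.nn.functional as F

def smoothness_adv_penalty(model, x, eps=1e-3, step=1e-3):
    """Smoothness-inducing adversarial regularizer (sketch): penalize the
    divergence between predictions at x and at a small adversarial
    perturbation of x found with one gradient-ascent step."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=-1)
    delta = (torch.randn_like(x) * eps).requires_grad_(True)
    q = F.log_softmax(model(x + delta), dim=-1)
    div = F.kl_div(q, p, reduction="batchmean")
    grad, = torch.autograd.grad(div, delta)
    delta = (delta + step * grad.sign()).detach()   # one ascent step, then freeze
    q = F.log_softmax(model(x + delta), dim=-1)
    return F.kl_div(q, p, reduction="batchmean")

def bregman_proximal_penalty(model, prev_model, x):
    """Bregman proximal point term (sketch): keep the current model's
    predictions close to those of the previous iterate."""
    with torch.no_grad():
        p_prev = F.softmax(prev_model(x), dim=-1)
    q = F.log_softmax(model(x), dim=-1)
    return F.kl_div(q, p_prev, reduction="batchmean")

# Toy usage with a stand-in linear classifier.
model = torch.nn.Linear(16, 3)
prev_model = copy.deepcopy(model)       # snapshot of the previous iterate
x, y = torch.randn(8, 16), torch.randint(0, 3, (8,))
lam, mu = 1.0, 1.0                      # regularization weights (assumed)
loss = (F.cross_entropy(model(x), y)
        + lam * smoothness_adv_penalty(model, x)
        + mu * bregman_proximal_penalty(model, prev_model, x))
loss.backward()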

Biography

Tuo Zhao is an assistant professor at Georgia Tech. He received his Ph.D. in Computer Science from Johns Hopkins University. His research mainly focuses on developing methodologies, algorithms, and theories for machine learning, especially deep learning. He is also actively working on neural language models and open-source machine learning software for scientific data analysis. He has received several awards, including winning the INDI ADHD-200 global competition, the ASA Best Student Paper Award on Statistical Computing, the INFORMS Best Paper Award on Data Mining, and a Google Faculty Research Award.

First day of classes

Welcome back! The fall 2021 semester begins on Tuesday, September 7.

View the full academic schedule on One Stop.

University closed

The University of Minnesota will be closed in observance of Labor Day.

View the full schedule of University holidays.

UMN Machine Learning Seminar

The UMN Machine Learning Seminar Series brings together faculty, students, and local industrial partners who are interested in the theoretical, computational, and applied aspects of machine learning, to pose problems, exchange ideas, and foster collaborations. The talks are every Thursday from 12 p.m. - 1 p.m. during the Summer 2021 semester.

This week's speaker, Simon Batzner (Harvard University), will be giving a talk on E(3)-equivariant interatomic potentials.

Abstract

Representations of atomistic systems for machine learning must transform predictably under the geometric transformations of 3D space, in particular rotation, translation, mirrors, and permutation of atoms of the same species. These constraints are typically satisfied by means of atomistic representations that depend on scalar distances and angles, leaving the representation invariant under the above transformations. Invariance, however, limits the expressivity and can lead to an incompleteness of representations. In order to overcome this shortcoming, we recently introduced Neural Equivariant Interatomic Potentials (NequIP) [1], a Graph Neural Network approach for learning interatomic potentials that uses an E(3)-equivariant representation of atomic environments. While most current Graph Neural Network interatomic potentials use invariant convolutions over scalar features, NequIP instead employs equivariant convolutions over geometric tensors (scalars, vectors, …), providing a more information-rich message passing scheme.

In my talk, I will first motivate the choice of an equivariant representation for atomistic systems and demonstrate how it allows for the design of interatomic potentials at previously unattainable accuracy. I will discuss applications on a diverse set of molecules and materials, including small organic molecules, water in different phases, a catalytic surface reaction, proteins, glass formation of a lithium phosphate, and Li diffusion in a superionic conductor. I will then show that NequIP can predict structural and kinetic properties from molecular dynamics simulations in excellent agreement with ab initio simulations. The talk will then discuss the observation of a remarkable sample efficiency in equivariant interatomic potentials, which outperform existing neural network potentials with up to 1000x fewer training data and rival or even surpass the sample efficiency of kernel methods. Finally, I will discuss potential reasons for the high sample efficiency of equivariant interatomic potentials.
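
As a small, self-contained illustration of the invariance versus equivariance distinction described above (a toy NumPy sketch, not the NequIP code): pairwise distances between atoms are invariant under a rotation of the positions, while displacement vectors are equivariant, i.e. they rotate along with the input.

import numpy as np

rng = np.random.default_rng(0)
pos = rng.normal(size=(5, 3))            # toy atomic positions

# Build a random proper rotation matrix (det = +1) via QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1

def pairwise_distances(x):
    """Invariant features: distances are unchanged by rotation."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

def displacement_vectors(x):
    """Equivariant features: vectors rotate along with the input."""
    return x[:, None, :] - x[None, :, :]

# Invariance: d(Qx) == d(x)
assert np.allclose(pairwise_distances(pos @ Q.T), pairwise_distances(pos))

# Equivariance: v(Qx) == v(x) rotated by Q
assert np.allclose(displacement_vectors(pos @ Q.T),
                   displacement_vectors(pos) @ Q.T)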

Biography

Batzner is a mathematician and machine learning researcher at Harvard. Previously, he worked on machine learning at MIT, wrote software on a NASA mission, and spent some time at McKinsey. He enjoys working with ambitious people who want to change the world.

UMN Machine Learning Seminar

The UMN Machine Learning Seminar Series brings together faculty, students, and local industrial partners who are interested in the theoretical, computational, and applied aspects of machine learning, to pose problems, exchange ideas, and foster collaborations. The talks are every Thursday from 12 p.m. - 1 p.m. during the Summer 2021 semester.

This week's speaker, Yu Xiang (University of Utah), will be giving a talk.

Biography

Yu Xiang has been an Assistant Professor in Electrical and Computer Engineering at the University of Utah since July 2018. Prior to this, he was a postdoctoral fellow at the Harvard John A. Paulson School of Engineering and Applied Sciences. He obtained his Ph.D. in Electrical and Computer Engineering from the University of California, San Diego in 2015, and his B.E. with the highest distinction from the School of Telecommunications Engineering at Xidian University, Xi'an, China, in 2008. His current research interests include statistical signal processing, information theory, machine learning, and their applications to neuroscience and computational biology.

2nd annual (virtual) workshop on Knowledge Guided Machine Learning

We are excited to announce the 2nd annual workshop on Knowledge Guided Machine Learning (KGML2021).

This virtual workshop will be held August 9-11, 2021, with presentations via Zoom and YouTube (links will be provided just prior to the workshop start date). KGML2021 is part of a project funded by an award from the National Science Foundation's Harnessing the Data Revolution (HDR) Big Idea, and is free and open to anyone to attend.

The workshop will include invited talks by leading experts and contributed poster sessions. The workshop will bring together data scientists (researchers in data mining, machine learning, and statistics) and researchers from hydrology, atmospheric science, aquatic sciences, and translational biology to discuss challenges, opportunities, and early progress in designing a new generation of machine learning methods that are guided by scientific knowledge.

Register here. Space permitting, registration will remain open until August 8.

The previous workshop (held August 18-20, 2020) attracted over 1,000 attendees from over 30 countries.

UMN Machine Learning Seminar

The UMN Machine Learning Seminar Series brings together faculty, students, and local industrial partners who are interested in the theoretical, computational, and applied aspects of machine learning, to pose problems, exchange ideas, and foster collaborations.

This week's speaker, Tianxi Li (University of Virginia), will be giving a talk titled "Diffusion Source Identification on Networks with Statistical Confidence." Please note that this week's seminar will be held from 12:30 p.m. - 1:30 p.m.

Abstract

Diffusion source identification on networks is a problem of fundamental importance in a broad class of applications, including rumor controlling and virus identification. Though this problem has received significant recent attention, most studies have focused only on very restrictive settings and lack theoretical guarantees for more realistic networks. We introduce a statistical framework for the study of diffusion source identification and develop a confidence set inference approach inspired by hypothesis testing. Our method efficiently produces a small subset of nodes, which provably covers the source node with any pre-specified confidence level without restrictive assumptions on network structures. Moreover, we propose multiple Monte Carlo strategies for the inference procedure, based on network topology and probabilistic properties of the diffusion, that significantly improve the scalability. To our knowledge, this is the first diffusion source identification method with a practically useful theoretical guarantee on general networks. We demonstrate our approach via extensive synthetic experiments on well-known random network models and on a mobility network between cities related to the spread of COVID-19. This is joint work with Quinlan Dawkins and Haifeng Xu at UVA.
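
For intuition, the following is a minimal sketch of the general Monte Carlo confidence-set idea (a toy illustration, not the authors' implementation): for each candidate node, simulate diffusions started from that node, summarize each infected set with a simple statistic, and keep the candidates for which the observed statistic does not land in the tails of its simulated distribution. The SI-style spread model and the distance-based summary statistic below are illustrative assumptions.

import random
import networkx as nx

def simulate_si(G, source, steps, p=0.3, rng=random):
    """Toy SI spread: each infected node infects each susceptible
    neighbor independently with probability p at every step."""
    infected = {source}
    for _ in range(steps):
        newly = {v for u in infected for v in G.neighbors(u)
                 if v not in infected and rng.random() < p}
        infected |= newly
    return infected

def confidence_set(G, observed, steps, alpha=0.1, n_sim=200, rng=random):
    """Monte Carlo confidence set for the diffusion source (sketch): keep a
    candidate s if the observed summary statistic falls inside the bulk of
    its simulated distribution under the hypothesis 'source = s'."""
    kept = []
    for s in observed:                       # the source must itself be infected
        dist = nx.single_source_shortest_path_length(G, s)

        def stat(nodes):
            # Total shortest-path distance from the candidate source.
            return sum(dist.get(v, len(G)) for v in nodes)

        t_obs = stat(observed)
        t_sim = [stat(simulate_si(G, s, steps, rng=rng)) for _ in range(n_sim)]
        lo = sum(t <= t_obs for t in t_sim)
        hi = sum(t >= t_obs for t in t_sim)
        p_val = min(1.0, 2.0 * (min(lo, hi) + 1) / (n_sim + 1))  # two-sided MC p-value
        if p_val >= alpha:
            kept.append(s)
    return kept

# Toy usage on a random graph with a hidden source.
rng = random.Random(0)
G = nx.erdos_renyi_graph(60, 0.08, seed=0)
observed = simulate_si(G, source=0, steps=3, rng=rng)
print(confidence_set(G, observed, steps=3, rng=rng))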

Biography

Tianxi Li is currently an assistant professor in the Department of Statistics at the University of Virginia. He obtained his Ph.D. from the University of Michigan in 2018. His research focuses mainly on statistical machine learning and statistical network analysis.

UMN Machine Learning Seminar

The UMN Machine Learning Seminar Series brings together faculty, students, and local industrial partners who are interested in the theoretical, computational, and applied aspects of machine learning, to pose problems, exchange ideas, and foster collaborations. The talks are every Thursday from 12 p.m. - 1 p.m. during the Summer 2021 semester.

This week's speaker, Chiyuan Zhang (Google Brain), will be giving a talk titled "Characterizing Structural Regularities of Labeled Data in Overparameterized Models."

Abstract

Humans are accustomed to environments that contain both regularities and exceptions. For example, at most gas stations, one pays prior to pumping, but the occasional rural station does not accept payment in advance. Likewise, deep neural networks can generalize across instances that share common patterns or structures, yet have the capacity to memorize rare or irregular forms. We analyze how individual instances are treated by a model via a consistency score. The score characterizes the expected accuracy for a held-out instance given training sets of varying size sampled from the data distribution. We obtain empirical estimates of this score for individual instances in multiple data sets, and we show that the score identifies out-of-distribution and mislabeled examples at one end of the continuum and strongly regular examples at the other end. We identify computationally inexpensive proxies to the consistency score using statistics collected during training. We show examples of potential applications to the analysis of deep-learning systems.
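
A rough sketch of how such a consistency score could be estimated empirically (hypothetical helper names, and scikit-learn logistic regression as a stand-in for the deep networks discussed in the talk): for each instance, average its held-out prediction accuracy over many models trained on random subsets of the remaining data, drawn at several subset sizes.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def consistency_scores(X, y, subset_sizes=(50, 100, 200), n_draws=20, seed=0):
    """Estimate a per-instance consistency score: the expected accuracy on
    instance i for models trained on random subsets (of varying size) of
    the other instances."""
    rng = np.random.default_rng(seed)
    n = len(y)
    hits = np.zeros(n)
    trials = np.zeros(n)
    for size in subset_sizes:
        for _ in range(n_draws):
            train = rng.choice(n, size=size, replace=False)
            heldout = np.setdiff1d(np.arange(n), train)
            clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
            pred = clf.predict(X[heldout])
            hits[heldout] += (pred == y[heldout])
            trials[heldout] += 1
    return hits / np.maximum(trials, 1)

# Toy usage: low-scoring instances behave like outliers or label noise,
# high-scoring instances follow the dominant regularities.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
scores = consistency_scores(X, y)
print("most irregular instances:", np.argsort(scores)[:5])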

Biography

Chiyuan Zhang is a research scientist at Google Research, Brain Team. He is interested in analyzing and understanding the foundations behind the effectiveness of deep learning, as well as its connection to the cognition and learning mechanisms of the human brain. He holds a Ph.D. from MIT (2017, advised by Tomaso Poggio), and Bachelor's (2009) and Master's (2012) degrees in computer science from Zhejiang University, China. His work was recognized with the INTERSPEECH Best Student Paper Award in 2014 and the ICLR Best Paper Award in 2017.