Past Events

Data Science @ Meta

Zeinab Takbiri (Facebook)

Abstract has been removed at the request of the speaker.

Integrative Discriminant Analysis Methods for Multi-view Data

Sandra Safo (University of Minnesota, Twin Cities)

Many diseases are complex, heterogeneous conditions that affect multiple organs in the body. They depend on the interplay among several factors, including molecular and environmental ones, and therefore require a holistic approach to understanding their complexity and heterogeneity. In this talk, I will present some of our current statistical and machine learning methods for integrating data from multiple sources while simultaneously classifying units or individuals into one of multiple classes or disease groups. The proposed methods are tested on both simulated data and real-world datasets, including RNA sequencing, metabolomics, and proteomics data pertaining to COVID-19 severity. We identified signatures that better discriminated among COVID-19 patient groups and that are related to neurological conditions, cancer, and metabolic diseases. These findings corroborate current research and heighten the need to study the post-acute sequelae of COVID-19, in order to devise effective treatments and improve patient care.

Sandra Safo is an Assistant Professor of Biostatistics at the University of Minnesota. She is interested in developing statistical learning, data integration, and feature selection methods for high-dimensional data. Currently, she develops methods for integrative analysis of “omics” (including genomics, transcriptomics, and metabolomics) and clinical data to help elucidate the complex interactions of these multifaceted data types.


Towards a Better Evaluation of Football Players

Eric Eager (Pro Football Focus, PFF)

The game of football is undergoing a significant shift toward the quantitative. Much of the progress in the analytics space can be attributed to play-by-play data and charting data. However, recent years have seen the rise of tracking data, which has opened the door to innovation that was not possible before. In this talk I will describe how to gain an edge in player evaluation by building on traditional charting data with state-of-the-art player tracking data, and foreshadow how such methods will revolutionize the sport of football in the future.

Eric Eager is the head of research, development and innovation at PFF, a worldwide leader in sports data and analytics. Prior to joining PFF, Eric earned a PhD in mathematical biology from the University of Nebraska, publishing 25 papers in applied mathematics, mathematical biology, ecology and the scholarship of teaching and learning. Eric is a native of Maplewood, MN.

Graph Clustering Dynamics: From Spectral to Mean Shift

Katy Craig (University of California, Santa Barbara)

Clustering algorithms based on mean shift or on spectral methods on graphs are ubiquitous in data analysis. In practice, however, these two types of algorithms are treated as conceptually disjoint: mean shift clusters based on the density of a dataset, while spectral methods cluster based on its geometry. In joint work with Nicolás García Trillos and Dejan Slepčev, we define a new notion of a Fokker-Planck equation on graphs and use it to introduce an algorithm that interpolates between the mean shift and spectral approaches, enabling it to cluster based on both the density and the geometry of a dataset. We illustrate the benefits of this approach in numerical examples and contrast it with Coifman and Lafon's well-known method of diffusion maps, which can also be thought of as a Fokker-Planck equation on a graph, though one that degenerates in the zero-diffusion limit.
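For readers unfamiliar with the density-based side of this comparison, here is a minimal sketch of standard Gaussian-kernel mean shift. It is not the speakers' Fokker-Planck interpolation. It repeatedly moves a copy of each point to the kernel-weighted mean of the data, so points climb to modes of a kernel density estimate, and points that reach the same mode share a cluster. The function names, the bandwidth, and the merge tolerance are illustrative assumptions.

```python
import numpy as np

def mean_shift(X, bandwidth=1.0, n_iter=100, tol=1e-6):
    """Move a copy of every point uphill on a Gaussian kernel density estimate of X."""
    X = np.asarray(X, dtype=float)
    modes = X.copy()
    for _ in range(n_iter):
        # squared distances from each current position to every original data point
        d2 = ((modes[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))            # Gaussian kernel weights
        new_modes = (w @ X) / w.sum(axis=1, keepdims=True)  # kernel-weighted mean of the data
        converged = np.abs(new_modes - modes).max() < tol
        modes = new_modes
        if converged:
            break
    return modes

def label_modes(modes, merge_tol=1e-2):
    """Give points whose converged modes (nearly) coincide the same cluster label."""
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_tol:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return np.array(labels), np.array(centers)
```

On two well-separated Gaussian blobs, `label_modes(mean_shift(X))` returns two clusters. The only geometric input is the Euclidean distance inside the kernel, which is why mean shift is usually described as density-driven rather than geometry-driven.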

Katy Craig is an assistant professor at the University of California, Santa Barbara, specializing in partial differential equations and optimal transport. She received her PhD from Rutgers University in 2014. She then spent one year at UCLA as an NSF Mathematical Sciences Postdoctoral Fellow and one year at UCSB as a UC President's Postdoctoral Fellow.

Best Practices A Data Scientist Should Know

Hande Tuzel (Sabre Corporation)


In this talk, Hande will give an overview of some best practices every data scientist should know. These include virtual environments, writing reusable functions, code documentation, and other habits you can start incorporating into your data science projects or your coding in general. She will also share some quick tips and advice on preparing for a data scientist job interview. Hopefully, these will help you prepare for a successful career in industry.
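As a small illustration of two of the habits mentioned, wrapping logic in a function and documenting it, here is a hypothetical helper with a docstring, type hints, and a doctest. It is not material from the talk. It would typically live in a project that has its own virtual environment, for example one created with `python -m venv .venv`.

```python
from statistics import fmean, pstdev

def zscore(values: list[float]) -> list[float]:
    """Standardize values to zero mean and unit population standard deviation.

    Raises ValueError if all values are identical, since the result is undefined.

    >>> zscore([1.0, 3.0])
    [-1.0, 1.0]
    """
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("zscore is undefined for constant input")
    return [(v - mu) / sigma for v in values]

if __name__ == "__main__":
    # run the examples embedded in the docstrings as lightweight tests
    import doctest
    doctest.testmod()
```

The docstring example doubles as a test, so the documentation cannot silently drift away from what the code does.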

Hande received her PhD in Applied Mathematics from the University of Minnesota in 2009, under the supervision of Fadil Santosa. Her dissertation was on improving mask design in integrated circuit printing technologies using level set methods. After a decade in academia training future scientists and engineers, she decided to transition to industry. She is now a self-taught data scientist working at Sabre Labs Research. When she is not busy coding or reading a paper, you can find her hiking, crocheting hats, or practicing inversions as a yogi.

Decomposing Low-Rank Symmetric Tensors

Joe Kileel (The University of Texas at Austin)

In this talk, I will discuss low-rank decompositions of symmetric tensors (a.k.a. higher-order symmetric matrices). I will start by sketching how results in algebraic geometry imply uniqueness guarantees for tensor decompositions and also lead to fast, numerically stable algorithms for computing them. Then I will quantify the associated non-convex optimization landscapes. Finally, I will present applications to Gaussian mixture models in data science and to rigid motion segmentation in computer vision. Based on joint work with João M. Pereira, Timo Klock, and Tammy Kolda.
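To make "low-rank symmetric tensor" concrete, here is a minimal sketch that builds an order-3 symmetric tensor as a weighted sum of cubes, T = Σ_r w_r a_r ⊗ a_r ⊗ a_r. It then recovers one component with the classical tensor power method. That method is a deliberately simple, swapped-in technique: it only applies when the components a_r are orthonormal and the weights are positive, and it is not the method discussed in the talk. Function names and parameters are illustrative assumptions.

```python
import numpy as np

def symmetric_cp(weights, vectors):
    """T[i, j, k] = sum_r weights[r] * vectors[i, r] * vectors[j, r] * vectors[k, r]."""
    return np.einsum("r,ir,jr,kr->ijk", weights, vectors, vectors, vectors)

def tensor_power_iteration(T, n_iter=100, seed=0):
    """Classical power method v <- T(I, v, v) / ||T(I, v, v)|| (orthogonal components only)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = np.einsum("ijk,j,k->i", T, v, v)  # contract two modes of T with v
        v /= np.linalg.norm(v)
    lam = np.einsum("ijk,i,j,k->", T, v, v, v)  # T(v, v, v) recovers the weight
    return lam, v
```

Because T(I, v, v) = Σ_r w_r (a_r · v)² a_r, each step squares the relative size of the components. The iterate therefore locks onto a single a_r very quickly. With non-orthogonal components this argument breaks down, which is one reason the general case calls for the more careful analysis the abstract describes.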

Data-Model Fusion to Predict the Impacts of Climate Change on Mosquito-borne Diseases

Carrie Manore (Los Alamos National Laboratory)

Mosquito-borne diseases are among the many human-natural systems that will be impacted by climate change. All of the life stages and development rates of mosquitoes are affected by temperature and other environmental factors, and human infrastructure often provides habitat (irrigation, containers, water management, etc.). This poses a very interesting mathematical modeling problem: how do we account for the relevant factors, capture the nonlinearities, and understand the uncertainty in our models and in the data used to calibrate and validate them? I will present several models, ranging from continental to fine scale and from statistical and machine learning to mechanistic, that we are using to predict mosquito-borne diseases and how they will be impacted by climate change. Over 30 people have worked together on this project, including students, postdocs, and staff. Our team is interdisciplinary and tasked with addressing critical national security problems around human health and climate change.

Stability and Generalization in Graph Convolutional Neural Networks

Ron Levie (Ludwig-Maximilians-Universität München)

In recent years, the need to accommodate non-Euclidean structures in data science has led to a boom in deep learning methods on graphs, with many practical applications of commercial impact. In this talk, we will review the mathematical foundations of the generalization capabilities of graph convolutional neural networks (GNNs). We will focus mainly on spectral GNNs, where convolution is defined as element-wise multiplication in the frequency domain of the graph.

In machine learning settings where the dataset consists of signals defined on many different graphs, the trained GNN should generalize to graphs outside the training set. A GNN is called transferable if, whenever two graphs represent the same underlying phenomenon, the GNN has a similar effect on both graphs. Transferability ensures that GNNs generalize if the graphs in the test set represent the same phenomena as the graphs in the training set. We will discuss different approaches to mathematically modeling the notion of transferability, and derive corresponding transferability error bounds, proving that GNNs have good generalization capabilities.
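The two ideas above can be sketched together. This is a minimal illustration under stated assumptions, not the speaker's construction. A spectral filter diagonalizes the graph Laplacian L = U diag(λ) Uᵀ, multiplies the graph Fourier coefficients Uᵀx element-wise by h(λ), and transforms back. Because h is a function of λ rather than a fixed vector, the same filter can be applied to graphs of different sizes. In this toy example, cycle graphs with 20 and 40 nodes both discretize a circle, with a Laplacian rescaling chosen (as an assumption) so that eigenvalues approximate those of −d²/dθ². Filtering the same sampled signal then gives nearly identical outputs, which is the intuition behind transferability. The names `spectral_conv` and `filter_on_cycle`, and the heat filter exp(−λ/2), are hypothetical choices.

```python
import numpy as np

def spectral_conv(L, x, h):
    """Filter x by element-wise multiplication with h(lambda) in the graph Fourier domain."""
    lam, U = np.linalg.eigh(L)         # L = U diag(lam) U^T for a symmetric Laplacian
    return U @ (h(lam) * (U.T @ x))    # forward transform, multiply, inverse transform

def cycle_laplacian(n):
    """Combinatorial Laplacian D - A of the cycle graph on n nodes."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1.0
    A[(idx + 1) % n, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def filter_on_cycle(n, h):
    """Apply filter h to cos(theta) sampled on an n-node cycle approximating the circle."""
    theta = 2.0 * np.pi * np.arange(n) / n
    # rescale so eigenvalues approximate k^2, the spectrum of -d^2/dtheta^2 on the circle
    L = cycle_laplacian(n) * n**2 / (2.0 * np.pi) ** 2
    return spectral_conv(L, np.cos(theta), h)

heat = lambda lam: np.exp(-0.5 * lam)   # hypothetical fixed filter; a spectral GNN would learn h
y20 = filter_on_cycle(20, heat)
y40 = filter_on_cycle(40, heat)
# every other node of the 40-cycle sits at the same angle as a node of the 20-cycle
print(np.max(np.abs(y20 - y40[::2])))
```

In a trained spectral GNN, h would be parameterized (for example, as a low-degree polynomial in λ) and learned. The transferability bounds discussed in the talk quantify how small the printed discrepancy must be for suitable classes of graphs and filters.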

Ron Levie received his Ph.D. in applied mathematics from Tel Aviv University, Israel, in 2018. From 2018 to 2020, he was a postdoctoral researcher with the Research Group Applied Functional Analysis, Institute of Mathematics, TU Berlin, Germany. Since 2021, he has been a researcher in the Bavarian AI Chair for Mathematical Foundations of Artificial Intelligence, Department of Mathematics, LMU Munich, Germany. Since 2021, he has also been a consultant on the project Radio-Map Assisted Pathloss Prediction at the Communications and Information Theory Chair, TU Berlin. He won excellence awards for his MSc and PhD studies, and a Post-Doc Minerva Fellowship. He is a guest editor at Sampling Theory, Signal Processing, and Data Analysis (SaSiDa), and was a conference chair of the Online International Conference on Computational Harmonic Analysis (Online-ICCHA 2021).

His current research interests are in the theory of deep learning, geometric deep learning, interpretability of deep learning, deep learning in wireless communication, harmonic analysis, signal processing, wavelet theory, uncertainty principles, continuous frames, and randomized methods.


Pointers on AI/ML Career Success

Paritosh Desai (Google Inc.)

While academic research and industry roles for applied math professionals have much in common, there are also important differences. These differences materially shape career outcomes in industry, and we elaborate on them through two broad themes for people with academic research backgrounds. First, we will look at common patterns in applied AI/ML problems across multiple industries and the specific challenges they pose. Second, we will discuss emerging requirements for success in an industry setting. We will share principles and anecdotes related to data, software engineering practices, and empirical research, drawn from industry experience.

Intelligent Randomized Algorithms for the Low CP-Rank Tensor Approximation Problem

Alex Gittens (Rensselaer Polytechnic Institute)

In numerical linear algebra, it is natural to sacrifice some accuracy in return for faster computation of solutions whose errors are only slightly larger than optimal, and the time-accuracy tradeoff of randomized sketching in this setting has been well characterized. Algorithms such as Blendenpik and LSRN have shown that carefully designed randomized algorithms can outperform industry-standard linear algebra codes such as those provided in LAPACK.
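The time-accuracy tradeoff can be seen in the simplest randomized least-squares method, sketch-and-solve. It is a minimal sketch, not Blendenpik or LSRN. It replaces min‖Ax − b‖ with the much smaller problem min‖S(Ax − b)‖ for a random sketching matrix S with far fewer rows than A, and returns a solution whose residual is typically only slightly larger than optimal. The dense Gaussian sketch and the function name are illustrative assumptions. Blendenpik, for example, applies a fast randomized unitary transform instead, and both Blendenpik and LSRN use the sketch to build a preconditioner for an iterative solver rather than solving the sketched problem directly.

```python
import numpy as np

def sketch_and_solve(A, b, sketch_rows, rng):
    """Approximate argmin_x ||Ax - b|| by solving min_x ||S(Ax - b)|| for a random S."""
    m = A.shape[0]
    # Gaussian sketch of shape (sketch_rows, m), scaled so that E[S^T S] = I
    S = rng.standard_normal((sketch_rows, m)) / np.sqrt(sketch_rows)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 20))
b = A @ rng.standard_normal(20) + 0.1 * rng.standard_normal(2000)
x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)   # exact least-squares solution
x_sk = sketch_and_solve(A, b, 200, rng)         # solves a 200 x 20 problem instead of 2000 x 20
ratio = np.linalg.norm(A @ x_sk - b) / np.linalg.norm(A @ x_opt - b)
print(ratio)  # at least 1; close to 1 when sketch_rows is several times the column count
```

Increasing `sketch_rows` moves the residual ratio toward 1 at the cost of a larger sketched problem. That is the time-accuracy knob the abstract refers to.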

For numerical tensor algorithms, where problem size grows exponentially with the order of the tensor, randomization is even more desirable. In this setting, however, the time-accuracy tradeoff of randomized sketching is harder to understand and exploit, because:

  1. in the first place, tensor problems are non-convex, 
  2. the properties of the data change from iteration to iteration, and
  3. straightforward applications of standard results on randomized sketching allow for the error to increase from iteration to iteration.

On the other hand, the iterative nature of such algorithms opens up the opportunity to learn how to sketch more accurately in an online manner.

In this talk we consider the problem of speeding up the computation of low CP-rank (canonical polyadic) approximations of tensors through regularized sketching. We establish for the first time a sublinear convergence rate to approximate critical points of the objective under standard conditions, and further provide algorithms that adaptively select the sketching and regularization rates.
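For context on where sketching enters, here is a minimal sketch of plain (unsketched, unregularized) alternating least squares for a rank-R CP approximation of a 3-way tensor, T[i, j, k] ≈ Σ_r A[i, r] B[j, r] C[k, r]. It is the standard baseline, not the regularized sketching algorithm from the talk. Each factor update is an ordinary linear least-squares problem whose design matrix is a Khatri-Rao product with J·K (or I·K, I·J) rows. Sketched variants replace these solves with smaller, randomly sketched least-squares problems, which is also why the data seen by the sketch changes from iteration to iteration. Function names and defaults are illustrative assumptions.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: row j*K + k, column r holds B[j, r] * C[k, r]."""
    return np.einsum("jr,kr->jkr", B, C).reshape(-1, B.shape[1])

def cp_als(T, rank, n_iter=300, seed=0):
    """Plain CP-ALS for a 3-way tensor (no sketching, no regularization)."""
    I, J, K = T.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((d, rank)) for d in (I, J, K))
    T1 = T.reshape(I, J * K)                     # mode-1 unfolding: T1 = A @ khatri_rao(B, C).T
    T2 = T.transpose(1, 0, 2).reshape(J, I * K)  # mode-2 unfolding: T2 = B @ khatri_rao(A, C).T
    T3 = T.transpose(2, 0, 1).reshape(K, I * J)  # mode-3 unfolding: T3 = C @ khatri_rao(A, B).T
    for _ in range(n_iter):
        # each line solves a tall least-squares problem; sketching would shrink its row count
        A = np.linalg.lstsq(khatri_rao(B, C), T1.T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), T2.T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), T3.T, rcond=None)[0].T
    return A, B, C
```

Each update exactly minimizes the fit over one factor with the others fixed, so the unsketched objective never increases. Naively sketched updates lose this guarantee, matching the third difficulty listed above.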

Alex Gittens is an assistant professor of computer science at Rensselaer Polytechnic Institute. He obtained his PhD in applied mathematics from Caltech in 2013, and BS degrees in mathematics and electrical engineering from the University of Houston. After his PhD, he joined the eBay machine learning research group, then the AMPLab (now the RISELab) at UC Berkeley, before joining RPI. His research interests lie at the intersection of randomized linear algebra and large-scale machine learning, in particular nonlinear and multilinear low-rank approximations; sketching for nonlinear and multilinear problems; and scalable and data-dependent kernel learning.