Past Events

Research and Opportunities in the Mathematical Sciences at Oak Ridge National Laboratory

Juan Restrepo (Oregon State University)

I will present a general overview of Oak Ridge National Laboratory research in mathematics and computing, along with a brief description of my own initiatives and research. I will also describe opportunities for students, postdocs, and professional mathematicians.

Dr. Juan M. Restrepo is a Distinguished Member of the R&D Staff at Oak Ridge National Laboratory. Restrepo is a fellow of SIAM and APS. He holds professorships at U. Tennessee and Oregon State University. Prior to ORNL, he was a professor of mathematics at Oregon State University and at the University of Arizona. He has been a frequent IMA visitor.

His research focuses on data-driven methods for dynamics, statistical mechanics, transport in the ocean, and uncertainty quantification in climate science.

Scalable and Sample-Efficient Active Learning for Graph-Based Classification

Kevin Miller (University of California, Los Angeles)

Active learning in semi-supervised classification involves introducing additional labels for unlabelled data to improve the accuracy of the underlying classifier. A challenge is to identify which points to label to best improve performance while limiting the number of new labels; this is often reflected in a tradeoff between exploration and exploitation, similar to the reinforcement learning paradigm. I will talk about my recent work designing scalable and sample-efficient active learning methods for graph-based semi-supervised classifiers that naturally balance this exploration versus exploitation tradeoff. While most work in this field today focuses on active learning for fine-tuning neural networks, I will focus on the low-label rate case where deep learning methods are generally insufficient for producing meaningful classifiers.

Kevin Miller is a rising 5th year Ph.D. candidate in Applied Mathematics at the University of California, Los Angeles (UCLA), studying graph-based machine learning methods with Dr. Andrea Bertozzi. He is currently supported by the DOD’s National Defense Science and Engineering Graduate (NDSEG) Fellowship and was previously supported by the National Science Foundation's NRT MENTOR Fellowship. His undergraduate degree was in Applied and Computational Mathematics from Brigham Young University, Provo. His research focuses on active learning and uncertainty quantification in graph-based semi-supervised classification.

Long-term Time Series Forecasting and Data Generated by Complex Systems

Kaisa Taipale (CH Robinson)

Data science, machine learning, and artificial intelligence are all practices implemented by humans in the context of a complex and ever-changing world. This talk will focus on the challenges of long-term, seasonal, multicyclic time series forecasting in logistics. I will discuss algorithms and implementations including STL, TBATS, and Prophet, with additional attention to the data-generating processes in trucking and the US economy and the importance in algorithm selection of understanding these data-generating processes. Subject matter expertise must always inform mathematical exploration in industry and indeed leads to asking much more interesting mathematical questions.

Standardizing the Spectra of Count Data Matrices by Diagonal Scaling

Boris Landa (Yale University)

A longstanding question when applying PCA is how to choose the number of principal components. Random matrix theory provides useful insights into this question by assuming a “signal+noise” model, where the goal is to estimate the rank of the underlying signal matrix. If the noise is homoskedastic, i.e. the noise variances are identical across all entries, the spectrum of the noise admits the celebrated Marchenko-Pastur (MP) law, providing a simple method for rank estimation. However, in many practical situations, such as in single-cell RNA sequencing (scRNA-seq), the noise is far from being homoskedastic. In this talk, focusing on a Poisson data model, I will present a simple procedure termed biwhitening, which enforces the MP law to hold by appropriately scaling the rows and columns of the data matrix. Aside from the Poisson distribution, this procedure is extended to families of distributions with a quadratic variance function. I will demonstrate this approach on both simulated and experimental data, showcasing accurate rank estimation in simulations and excellent fits to the MP law for real scRNA-seq datasets.

Boris Landa is a Gibbs Assistant Professor in the program for applied mathematics at Yale University. Previously, he completed his Ph.D. in applied mathematics at Tel Aviv University under the guidance of Prof. Yoel Shkolnisky. Boris's research is focused on theory and methods for processing large datasets corrupted by noise and deformations, with applications in the biological sciences.

Handling model uncertainties via informative Goodness-of-Fit

Sara Algeri (University of Minnesota, Twin Cities)

When searching for signals of new astrophysical phenomena, astrophysicists have to account for several sources of non-random uncertainties which can dramatically compromise the sensitivity of the experiment under study. Among these, model uncertainty arising from background mismodeling is particularly dangerous and can easily lead to highly misleading results. Specifically, overestimating the background distribution in the signal region increases the chances of falsely rejecting the hypothesis that the new source is present. Conversely, underestimating the background outside the signal region leads to an artificially enhanced sensitivity and a higher likelihood of claiming a false discovery. The aim of this work is to provide a self-contained framework to perform modeling, estimation, and inference under background mismodeling. The proposed method incorporates the (partial) scientific knowledge available on the background distribution and provides a data-updated version of it in a purely nonparametric fashion, and thus without requiring the specification of prior distributions. If calibration data (or control regions) are available, the solution discussed does not require the specification of a model for the signal; however, when such a model is available, it can be used to further improve the accuracy of the analysis and to detect additional and unexpected signal sources.

I have been an Assistant Professor in the School of Statistics at the University of Minnesota since August 2018. My appointment at UMN started soon after completing my doctoral studies in statistics at Imperial College London (UK). My research interests mainly lie in astrostatistics, computational statistics, and statistical inference. The main purpose of my work is to provide generalizable statistical solutions which directly address fundamental scientific questions, and can at the same time be easily applied to any other scientific problem following a similar statistical paradigm. In line with this, motivated by the problem of the detection of particle dark matter, my current research focuses on statistical inference for signal detection under lack of regularity. I am also interested in uncertainty quantification in the context of astrophysical discoveries.

SIAM Internship Panel

Montie Avery (University of Minnesota, Twin Cities)

Come learn about the process of finding, interviewing, and getting jobs in industry! Panelists Brendan Cook, Jacob Hegna, Drisana Mosaphir, Cole Wyeth, and Amber Yuan will be here to answer all your questions about finding and participating in internships both before and during the pandemic.

PDE-inspired Methods for Graph-based Semi-supervised Learning

Jeff Calder (University of Minnesota, Twin Cities)

This talk will be an introduction to some recent research on PDE-inspired methods for graph-based learning, specifically for problems with very few labeled training examples. We'll discuss various models, including Laplace, p-Laplacian, re-weighted Laplacians, and Poisson learning, to highlight how connections between graph-PDEs and continuous PDEs can be used for analysis and development of new algorithms. The talk will be at an introductory level, suitable for graduate students.

Being Smart and Dumb: Building the Sports Analytics Industry

Dean Oliver (NBA's Washington Wizards)

Going from a scientific background into something that people haven't done comes with moments where you don't know what you're talking about... if you talk, that is. Admitting the times you don't know how your work can help and introducing your work when it may be able to help - that timing can be hard. I went from the field I was trained in - environmental engineering and consulting - to a job with no title at first. I had to write a book about how stats can help in basketball. Someone else invented the term "Sports Analytics". This talk is a little bit of that story.

Math-to-Industry Boot Camp VI

Advisory: Application deadline is March 7, 2021

2021 Summer Boot Camp poster

Organizers:

Thomas Hoft, University of St. Thomas
Daniel Spirn, University of Minnesota, Twin Cities

The Math-to-Industry Boot Camp is an intense six-week session designed to provide graduate students with training and experience that is valuable for employment outside of academia. The program is targeted at Ph.D. students in pure and applied mathematics. The boot camp consists of courses in the basics of programming, data analysis, and mathematical modeling. Students work in teams on projects and are provided with training in resume and interview preparation as well as teamwork.

There are two group projects during the session: a small-scale project designed to introduce the concept of solving open-ended problems and working in teams, and a "capstone project" that is posed by industrial scientists. Recent industrial sponsors included D-Wave Systems, ExxonMobil, Los Alamos National Laboratory, the Milwaukee Brewers, and Starbucks.

Weekly seminars by speakers from many industry sectors provide the students with opportunities to learn about a variety of possible future careers.

Eligibility

Applicants must be current graduate students in a Ph.D. program at a U.S. institution during the period of the boot camp.

Logistics

The program will take place online. Students will receive an $800 stipend.

Applications

To apply, please supply the following materials through the link at the top of the page:

Statement of reason for participation, career goals, and relevant experience
Unofficial transcript and evidence of good standing and full-time status
Letter of support from advisor, director of graduate studies, or department chair

Selection will be based on background and statement of interest, as well as geographic and institutional diversity. Women and minorities are especially encouraged to apply. Selected participants will be contacted in April.

Participants

Name	Department	Affiliation
Douglas Armstrong	Department of Data Science	Securian Financial
Yuchen Cao	Department of Mathematics	University of Central Florida
Samara Chamoun	Department of Mathematics	Michigan State University
Ana Chavez Caliz	Department of Mathematics	Pennsylvania State University
Alexander Estes	Institute for Mathematics and its Applications	University of Minnesota, Twin Cities
Raymond Friend Jr	Department of Mathematics	Pennsylvania State University
Ghodsieh Ghanbari	Department of Mathematics and Statistics	Mississippi State University
Marc Haerkoenen	School of Mathematics	Georgia Institute of Technology
Tony Haines	Department of Computational and Applied Mathematics	Old Dominion University
Natalie Heer		CH Robinson
Thomas Hoft	Department of Mathematics	University of St. Thomas
Alicia Johnson	Department of Mathematics, Statistics, and Computer Science	Macalester College
Malick Kebe	Department of Mathematics	Howard University (Washington, DC, US)
Juergen Kritschgau	Department of Mathematics	Iowa State University
Marshall Lagani	Department of Data Science	Securian Financial
Kevin Leder	Department of Industrial and Systems Engineering	University of Minnesota, Twin Cities
Ivan Marin		Cargill, Inc.
Francisco Martinez Figueroa	Department of Mathematics	The Ohio State University
Avishek Mukherjee	Department of Mathematical Sciences	University of Delaware (Newark, DE, US)
Muharrem Otus	Department of Mathematics	University of Pittsburgh
Smita Praharaj	Department of Mathematics	University of Missouri
Tanmay Raj		Cargill, Inc.
Abba Ramadan	Department of Applied Mathematics	University of Kansas
Samanwita Samal	Department of Mathematics	Indiana University
Natalie Sheils		UnitedHealth Group
Blerta Shtylla		Pfizer
David Shuman	Department of Mathematics, Statistics and Computer Science	Macalester College
Lauren Snider	Department of Mathematics	Texas A & M University
Daniel Spirn	University of Minnesota	University of Minnesota, Twin Cities
Elizabeth Sprangel	Department of Mathematics	Iowa State University
Kaisa Taipale	Contractual Pricing Group	CH Robinson
Sijie Tang	Department of Mathematics	University of Wyoming
Cameron Thieme	Department of Mathematics	University of Minnesota, Twin Cities
Shuxian Xu	Department of Mathematics	University of Pittsburgh
Lei Yang	Department of Mathematics	Northeastern University
Grace Zhang	School of Mathematics	University of Minnesota, Twin Cities
Miao Zhang	Department of Mathematics	Louisiana State University
Jennifer Zhu	Department of Mathematics	Texas A & M University
Ahmed Zytoon	Department of Mathematics	University of Pittsburgh

Projects and teams

Team 1 — Cargill: Hydrologic Energy Generation Optimization

Mentor Ivan Marin, Cargill Corporation
Mentor Tanmay Raj, Cargill Corporation
Ana Chavez Caliz, Pennsylvania State University
Francisco Martinez Figueroa, Ohio State University
Juergen Kritschgau, Iowa State University
Avishek Mukherjee, University of Delaware
Smita Praharaj, University of Missouri
Cameron Thieme, University of Minnesota
Jennifer Zhu, Texas A & M University

The increased penetration of variable renewable energy (VRE) and the phase-out of nuclear and other conventional electricity generation sources will require additional flexibility in the power grid and a smaller gap between generation and demand, and raise the question of how these changes influence energy pricing in the short and long term. Clean water is essential for hydropower generation, the main source of electrical power in Brazil. Because water resources are limited and precipitation is variable, there is a need to investigate the optimal management of these resources in order to meet power grid demand and to predict power generation capacity given historical rain patterns, reservoir water levels, and energy demands.

Team 2 — Securian Financial: Predicting Group Life Client Mortality During a Pandemic

Mentor Douglas Armstrong, Securian Financial
Yuchen Cao, University of Central Florida
Samara Chamoun, Michigan State University
Marc Haerkoenen, Georgia Institute of Technology
Abba Ramadan, University of Kansas
Lei Yang, Northeastern University
Shuxian Xu, University of Pittsburgh

During a pandemic, the ability to predict client risk becomes paramount to managing risk effectively. The impact that a pandemic has may differ depending on the demographics and regional considerations for each client. This adds complexity to the analysis and forecasting of the future risk a client may pose. In this project, students will enrich a simulated client dataset with publicly available data before developing a machine-learning-based approach to predict adverse risk of multiple clients.

Team 3 — CH Robinson: Impact of Weather and Agricultural Events on Truckload Cost Per Mile

Mentor Kaisa Taipale, CH Robinson
Raymond Friend Jr, Pennsylvania State University
Ghodsieh Ghanbari, Mississippi State University
Tony Haines, Old Dominion University
Malick Kebe, Howard University
Elizabeth Sprangel, Iowa State University
Grace Zhang, University of Minnesota

Fresh fruits and vegetables are an important group of commodities in the US, commonly transported by truck from fields in predominantly southern growing regions across the US (for instance, from California to the Northeast). While irrigation dampens the effect of rainfall on crop yields, temperature and rainfall are still important factors in the timing of fresh fruit and vegetable harvest and thus transport. This work will examine the magnitude of the impact of vegetable harvest timing on transportation costs, using external inputs like temperature and rainfall as well as variables intrinsic to the truckload market. Challenges include combining the geographic characteristics of the time series involved: univariate time series methods provide some benefit, but stronger results come from exploiting geography and freight characteristics. Bayesian models and causal impact analysis are natural tools for this application.

Team 4 — CH Robinson: CH Robinson Volume Simulation

Mentor Natalie Heer, CH Robinson
Mentor Bethany Stai, CH Robinson
Mentor Michael Chmutov, CH Robinson
Mentor Kaisa Taipale, CH Robinson
Muharrem Otus, University of Pittsburgh
Samanwita Samal, Indiana University
Lauren Snider, Texas A & M University
Sijie Tang, University of Wyoming
Miao Zhang, Louisiana State University
Ahmed Zytoon, University of Pittsburgh

In economics there is classically an inverse relationship between the price of an item and the quantity of the item that customers will choose to purchase. If prices increase, customers will purchase fewer items, and if prices decrease, customers will choose to purchase more items. If companies can predict the volume change associated with a change in price, they can optimize their pricing strategy for overall profitability, i.e., max(Unit Price * Volume). The goal of this project is to help CHR be smarter in optimizing our business strategy.
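As a toy illustration of the max(Unit Price * Volume) objective, the sketch below assumes a hypothetical linear demand curve in which volume falls as price rises. The function names and the coefficients (intercept 1000, slope 4) are made up for illustration and are not CH Robinson's model or data.

```python
def volume(price, intercept=1000.0, slope=4.0):
    """Hypothetical linear demand: volume falls as price rises, never below 0."""
    return max(intercept - slope * price, 0.0)


def revenue(price, intercept=1000.0, slope=4.0):
    """Objective from the text: unit price times volume."""
    return price * volume(price, intercept, slope)


def best_price(candidates, intercept=1000.0, slope=4.0):
    """Grid search: return the candidate price with the largest revenue."""
    return max(candidates, key=lambda p: revenue(p, intercept, slope))


# Candidate prices 0.0, 0.5, ..., 250.0 (assumed range, for illustration only).
prices = [step * 0.5 for step in range(0, 501)]
p_star = best_price(prices)
print(p_star, revenue(p_star))
```

For a linear demand curve the optimum can also be found in closed form as intercept / (2 * slope); the grid search is used here only because realistic demand curves are rarely that simple.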

Winter Math-to-Industry Boot Camp

Advisory: Application deadline is Friday, December 4, 2020

2021 Winter Virtual Boot Camp poster

Organizers:

Jasmine Foo, University of Minnesota, Twin Cities
Thomas Hoft, University of St. Thomas
Daniel Spirn, University of Minnesota, Twin Cities

The Winter Math-to-Industry Boot Camp is an intensive, two-week program that provides graduate students with training and experience that is valuable for employment outside of academia. The program is targeted at Ph.D. students in mathematics and statistics. The winter camp consists of pre-camp coursework in the basics of programming, data analysis, and optimization.

During the program, students work in small teams under the guidance of an industry mentor using a variety of streaming technologies. The mentor and camp staff will help guide the students in the modeling process, analysis, and computational work associated with a real-world industrial problem. Additional time will be spent on developing professional and networking skills, meeting industry scientists, and participating in a career fair.

Each team will be expected to make a final presentation and submit a written report at the end of the workshop.

Recent industrial sponsors included Cargill, D-Wave Systems, the Mayo Clinic, Securian Financial, and World Wide Technology.

Eligibility

Applicants must be current graduate students in a mathematical sciences Ph.D. program at a U.S. institution during the period of the boot camp.

Logistics

The program will take place online. Students will receive a $500 stipend.

Applications

To apply, please supply the following materials through the link at the top of the page:

Statement of reason for participation, career goals, and relevant experience
Unofficial transcript and evidence of good standing and full-time status
Letter of support from advisor, director of graduate studies, or department chair

Participants

Name	Department	Affiliation
Daniel Alhassan	Department of Mathematics and Statistics	Missouri University of Science and Technology
Mohamed Imad Bakhira	Department of Mathematics	The University of Iowa
Yiqing Cai		Gro Intelligence
Frankie Chan	Department of Mathematics	Purdue University
Jorge Cisneros Paz	Department of Applied Mathematics	University of Washington
Paula Dassbach		Medtronic
Jerry Dogbey-Gakpetor	Statistics	North Dakota State University
Henry Fender	Department of Data Science	ITM TwentyFirst LLC
Shihang Feng	Applied Mathematics and Plasma Physics	Los Alamos National Laboratory
Jasmine Foo	School of Mathematics	University of Minnesota, Twin Cities
Jonathan Hill		ITM TwentyFirst LLC
Thomas Hoft	Department of Mathematics	University of St. Thomas
Salomea Jankovic	Department of Mathematics	University of Minnesota, Twin Cities
Henry Kvinge		Pacific Northwest National Laboratory
Axel La Salle	School of Mathematical and Statistical Sciences	Arizona State University
Youzuo Lin	Earth and Environmental Sciences Division	Los Alamos National Laboratory
Sander Mack-Crane	Department of Mathematics	University of California, Berkeley
Maia Powell	Department of Applied Mathematics	University of California, Merced
Lee Przybylski	Mathematics	Iowa State University
Priyanka Rao	Department of Mathematics & Statistics	Washington State University
Majerle Reeves	Department of Applied Mathematics	University of California, Merced
Daniel Spirn	University of Minnesota	University of Minnesota, Twin Cities
Anna Srapionyan		Merrill Lynch
Wencel Valega Mackenzie	Department of Mathematics	University of Tennessee
Christine Vaughan	Department of Mathematics and Mechanical Engineering	Iowa State University
Elise Walker	Department of Mathematics	Texas A & M University
Max Wimberley	Department of Mathematics	University of California, Berkeley
Harrison Wong	Department of Mathematics	Purdue University
Cancan Zhang	Department of Mathematics	Northeastern University

Projects and teams

Project 1: Record Linkage: Synthesizing Expert Systems and Machine Learning

Mentor Jonathan Hill, ITM TwentyFirst LLC
Mentor Henry Fender, ITM TwentyFirst LLC
Jorge Cisneros Paz, University of Washington
Jerry Dogbey-Gakpetor, North Dakota State University
Majerle Reeves, University of California, Merced
Elise Walker, Texas A & M University
Max Wimberley, University of California, Berkeley
Harrison Wong, Purdue University

Record linkage is a common big data process where shared records in two large datasets are linked based on common fields. Longevity Holdings designed an expert system to automate record linkage between client data and a corpus of death records. This system produces scores that sort record pairs into matches and non-matches. Currently, high and low scores separate cleanly, but mid-tier scores must be manually reviewed. This led us to ask: Can machine learning improve an expert system in record linkage and reduce the size of this review set?

We are working with a variant of the Expectation Maximization (EM) algorithm following the Fellegi-Sunter approach to record linkage. We implemented this algorithm but have not found an optimal configuration for our data. The algorithm is general so we can manipulate many aspects of the input. Our priority is to determine whether there is a configuration that can improve the expert system.
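For readers unfamiliar with the approach, here is a minimal sketch of EM for a two-class Fellegi-Sunter model. It assumes each record pair is reduced to a binary agree/disagree vector over the compared fields and that fields are conditionally independent given match status. The project's actual variant and configuration are not specified here; all names, starting values, and the synthetic pairs below are illustrative assumptions.

```python
import math


def clip(x, eps=1e-6):
    """Keep probabilities strictly inside (0, 1) so logs and ratios stay finite."""
    return min(max(x, eps), 1.0 - eps)


def fellegi_sunter_em(gammas, n_iter=100, p=0.1, m_init=0.9, u_init=0.1):
    """Estimate p = P(match), m[k] = P(agree on k | match), u[k] = P(agree on k | non-match)."""
    n_pairs = len(gammas)
    n_fields = len(gammas[0])
    m = [m_init] * n_fields
    u = [u_init] * n_fields
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a match.
        post = []
        for gamma in gammas:
            like_m, like_u = p, 1.0 - p
            for k, agrees in enumerate(gamma):
                like_m *= m[k] if agrees else 1.0 - m[k]
                like_u *= u[k] if agrees else 1.0 - u[k]
            post.append(like_m / (like_m + like_u))
        # M-step: re-estimate parameters from the posterior-weighted counts.
        total = sum(post)
        p = clip(total / n_pairs)
        for k in range(n_fields):
            agree_m = sum(g * gamma[k] for g, gamma in zip(post, gammas))
            agree_u = sum((1.0 - g) * gamma[k] for g, gamma in zip(post, gammas))
            m[k] = clip(agree_m / max(total, 1e-12))
            u[k] = clip(agree_u / max(n_pairs - total, 1e-12))
    return p, m, u


def match_weight(gamma, m, u):
    """Fellegi-Sunter score: sum of log2 likelihood ratios over the fields."""
    return sum(
        math.log2(m[k] / u[k]) if agrees else math.log2((1.0 - m[k]) / (1.0 - u[k]))
        for k, agrees in enumerate(gamma)
    )


# Synthetic comparison vectors over three fields (made-up counts).
pairs = (
    [(1, 1, 1)] * 30
    + [(1, 1, 0), (1, 0, 1), (0, 1, 1)] * 5
    + [(1, 0, 0), (0, 1, 0), (0, 0, 1)] * 20
    + [(0, 0, 0)] * 200
)
p, m, u = fellegi_sunter_em(pairs)
print(p, m, u, match_weight((1, 1, 0), m, u))
```

Pairs with high weights are treated as matches and pairs with low weights as non-matches; the mid-range scores correspond to the manual review set described above.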

EM is not the only viable approach to this problem. There is a wide range of existing methods that can be applied to record linkage. Our priority is to figure out the pros and cons of each, while trying to exceed EM and expert system performance.

On this project, you will work with real-world data and learn to organize as a team. You will deliver a whitepaper summarizing your process and results. We are most interested in your clear thinking and structured approach to this problem. We will divide into two groups, each focusing on one of the priorities above. Both groups will receive two validated sets of record pairs, one deriving from obituaries and the other from state and federal records. Our toolset will include Python, pandas, and scikit-learn.

Project 2: Data-Driven Computational Seismic Inversion

Mentor Youzuo Lin, Los Alamos National Laboratory
Mentor Shihang Feng, Los Alamos National Laboratory
Frankie Chan, Purdue University
Salomea Jankovic, University of Minnesota, Twin Cities
Sander Mack-Crane, University of California, Berkeley
Priyanka Rao, Washington State University
Christine Vaughan, Iowa State University
Cancan Zhang, Northeastern University

Computational seismic inversion turns geophysical data into actionable information. The technique has been widely used in geophysical exploration to characterize the subsurface structure. A clear and accurate map of the subsurface is crucial for determining the location and size of reservoirs and mineral features.

Seismic inversion usually presents itself as an inverse problem. However, solving those inverse problems has been notoriously challenging due to their ill-posed and computationally expensive nature. On the other hand, with advances in machine learning and computing, and the availability of more and better data, there has been notable progress in solving such problems. In our recent work [1, 2], we developed end-to-end data-driven subsurface imaging techniques and produced encouraging results when test data and training data share similar statistical characteristics. The high accuracy of the predictive model is built on the assumption that the training dataset captures the distribution of the target dataset. Therefore, it is critical to obtain a sufficiently large, high-quality training set.

In this project, students will work with LANL scientists to study the impact of the training data on the resulting predictive model. In particular, students will explore and develop different techniques to generate high-quality synthetic data that could be used to enhance the training data quality. Through the project, students will have the opportunity to learn deep learning and its applications in computational imaging and the fundamentals of ill-posed inverse problems.

References:

[1] Yue Wu and Youzuo Lin, “InversionNet: An Efficient and Accurate Data-driven Full Waveform Inversion,” IEEE Transactions on Computational Imaging, 6(1):419-433, 2019.

[2] Zhongping Zhang and Youzuo Lin, “Data-driven Seismic Waveform Inversion: A Study on the Robustness and Generalization,” IEEE Transactions on Geoscience and Remote Sensing, 58(10):6900-6913, 2020.

Project 3: The Impact of Climate Change on Crop Yield

Mentor Yiqing Cai, Gro Intelligence
Daniel Alhassan, Missouri University of Science and Technology
Mohamed Imad Bakhira, The University of Iowa
Axel La Salle, Arizona State University
Maia Powell, University of California, Merced
Lee Przybylski, Iowa State University
Wencel Valega Mackenzie, University of Tennessee

Gro is a data platform with comprehensive data sources related to food and agriculture. With data from Gro, stakeholders can make quicker and better decisions. In this project, the students will use data from Gro to quantify the impact of climate change on crop yield, and create visualizations to demonstrate their findings. For example, they can use long-term climate data from Gro to predict corn yield in Minnesota 100 years from now. Based on the results, they might be able to conclude that Minnesota will no longer be suitable for growing corn in 100 years, or that the areas suitable for corn will shift from the south to the north within Minnesota. Furthermore, they can scale the analysis to the whole globe and create cool visualizations to show the results.

Data will be provided through the Gro API (Python client). For data discovery and visualizations, the students can interact with the Gro web app directly. Once they decide what data to pull from Gro, they can export a code snippet and use the API client to download the data. Data pulled from Gro are in the form of time series, which are called data series. A data series is made up of data points, each with a start and end timestamp. Different data series can come from different sources and have different frequencies. For example, projected monthly precipitation and air temperature from the GFDL B1 model are available for the whole world through the year 2100.
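Because data series can have different frequencies, a typical first step is to bring them onto a common time grid. The sketch below illustrates that idea with a hypothetical `DataPoint` class and made-up numbers. It does not use or reproduce the Gro API client, whose actual data structures may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataPoint:
    """Hypothetical stand-in for one point of a data series."""
    start: date
    end: date
    value: float


def annual_mean(series):
    """Average the points of a series by the calendar year of their start date."""
    buckets = defaultdict(list)
    for point in series:
        buckets[point.start.year].append(point.value)
    return {year: sum(vals) / len(vals) for year, vals in sorted(buckets.items())}


# Made-up example: a monthly temperature series and an annual yield series.
monthly_temp = [
    DataPoint(date(2099, month, 1), date(2099, month, 28), 10.0 + month)
    for month in range(1, 13)
]
annual_yield = [DataPoint(date(2099, 1, 1), date(2099, 12, 31), 170.0)]

# Align both series on the years they have in common.
temp_by_year = annual_mean(monthly_temp)
yield_by_year = annual_mean(annual_yield)
aligned = {
    year: (temp_by_year[year], yield_by_year[year])
    for year in sorted(temp_by_year.keys() & yield_by_year.keys())
}
print(aligned)
```

Averaging by start year is only one possible choice; precipitation, for example, might be summed rather than averaged.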

The deliverables of this project are twofold: a Jupyter notebook (hosted on infrastructure provided by Gro) and a visual presentation of the results. The two can even be combined. The Jupyter notebook should be executable end-to-end, from fetching the data from the Gro API to exporting predictions as files or as visualizations.