Past Events

Math-to-Industry Boot Camp IV

Advisory: Extended application deadline is March 22, 2019


Organizers: 

The Math-to-Industry Boot Camp is an intense six-week session designed to provide graduate students with training and experience that is valuable for employment outside of academia. The program is targeted at Ph.D. students in pure and applied mathematics. The boot camp consists of courses in the basics of programming, data analysis, and mathematical modeling. Students work in teams on projects and are provided with training in resume and interview preparation as well as teamwork.

There are two group projects during the session: a small-scale project designed to introduce the concept of solving open-ended problems and working in teams, and a "capstone project" that is posed by industrial scientists. Last year's industrial sponsors included 3M, D-Wave Systems, Milwaukee Brewers, National Security Technologies, Schlumberger-Doll Research, and Whitebox Advisors. 

Weekly seminars by speakers from many industry sectors provide the students with opportunities to learn about a variety of possible future careers.

Eligibility

Applicants must be current graduate students in a Ph.D. program at a U.S. institution during the period of the boot camp.

Logistics

The program will take place at the IMA on the campus of the University of Minnesota. Students will be housed in a residence hall on campus and will receive a per diem and a travel budget, as well as an $800 stipend.

Applications

To apply, please supply the following materials through the link at the top of the page:

  • Statement of reason for participation, career goals, and relevant experience
  • Unofficial transcript and evidence of good standing and full-time status
  • Letter of support from advisor, director of graduate studies, or department chair

Selection criteria will be based on background and statement of interest, as well as geographic and institutional diversity. Women and minorities are especially encouraged to apply. Selected participants will be contacted in April.

Participants

Name Department Affiliation
Jesse Berwald   D-Wave Systems
Nicole Bridgland   World Wide Technology
Benjamin Brubaker School of Mathematics University of Minnesota, Twin Cities
Yiqing Cai   Gro Intelligence
Sarah Chehade Department of Mathematics University of Houston
Brendan Cook   University of Minnesota, Twin Cities
William Cooper Department of Mechanical Engineering University of Minnesota, Twin Cities
Steven Dabelow Department of Applied and Computational Mathematics and Statistics University of Notre Dame
Davood Damircheli Department of Mathematics and Statistics Mississippi State University
Dilek Erkmen Department of Mathematical Sciences Michigan Technological University
Jonathan Hahn   World Wide Technology
Jordyn Harriger Department of Mathematics Indiana University
Brad Hildebrand   Cargill, Inc.
Jonathan Hill   ITM TwentyFirst LLC
Thomas Hoft Department of Mathematics University of St. Thomas
SeongHee Jeong   Louisiana State University
Michael Johnson Strategic Marketing and Portfolio Division Cargill, Inc.
Kiwon Lee Department of Mathematics The Ohio State University
Xing Ling Department of Mathematical Sciences Michigan Technological University
Sijing Liu Department of Mathematics Louisiana State University
Kevin Marshall Department of Mathematics University of Kansas
Kristina Martin Department of Supervision, Regulation, and Credit Federal Reserve Bank of Minneapolis
Vikenty Mikheev Department of Mathematics Kansas State University
Sarah Milstein   University of Minnesota, Twin Cities
Sarah Miracle Department of Computer and Information Sciences University of St. Thomas
Bibekananda Mishra Department of Mathematics University of Kansas
Whitney Moore Career Center for Science and Engineering University of Minnesota, Twin Cities
Anthony Nguyen Department of Mathematics University of California, Davis
Damilola Olabode Department of Mathematics and Statistics Washington State University
Negar Orangi-Fard Department of Mathematics Kansas State University
Samantha Pinella Department of Mathematics University of Michigan
Michelle Pinharry School of Mathematics University of Minnesota, Twin Cities
Puttipong Pongtanapaisan Department of Mathematics The University of Iowa
Matthew (Jake) Roberts Department of Mathematical Sciences Michigan Technological University
Jose Pedro Rodriguez Ayllon Department of Mathematics University of Houston
Nandita Sahajpal Department of Mathematics University of Kentucky
Fadil Santosa School of Mathematics University of Minnesota, Twin Cities
Samantha Schumacher Department of Data Science & Analysis Target Corporation
Olabanji Shonibare   Starkey Hearing Technologies
David Shuman Department of Mathematics, Statistics and Computer Science Macalester College
Matthew Sikkink Johnson Department of Mathematics University of Minnesota, Twin Cities
Daniel Spirn University of Minnesota University of Minnesota, Twin Cities
Rebeccah Stay   Cargill, Inc.
Ben Strasser Department of Mathematics University of Minnesota, Twin Cities
Rahim Taghikhani School of Mathematics and Statistics Arizona State University
Zeinab Takbiri Department of Engineering R&D and Data Science Cargill, Inc.
Tianyu Tao Department of Mathematics University of Minnesota, Twin Cities
Jing Wang   Thrivent Financial
Nathan Willis Department of Mathematics The University of Utah
Guanglin Xu Institute for Mathematics and its Applications University of Minnesota, Twin Cities
Yanhua Yuan   ExxonMobil
Christina Zhao   University of Minnesota, Twin Cities
Li Zhu Department of Mathematical Sciences University of Nevada

 

Projects and teams

Project 1: Rail car supply forecasting

  • Mentor Zeinab Takbiri, Cargill, Inc.
  • Sijing Liu, Louisiana State University
  • Damilola Olabode, Washington State University
  • Puttipong Pongtanapaisan, The University of Iowa
  • Nathan Willis, The University of Utah

Cargill is a major grain trader in the US. We use over 100,000 rail cars per year to ship grain to our domestic and export customers, and many of these shipments move in railroad-supplied cars. The railroads require us to commit to running their cars for a year. We are looking for help developing a supply and demand model that determines how many cars Cargill should take on in a given year, as well as a forecast of the overall market’s need for railroad-owned equipment.
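One standard way to frame the "how many cars to commit to" question is the newsvendor model from inventory theory. This is our choice of technique and not necessarily the project's method. We assume that demand is summarized by a set of forecast scenarios. We also assume that each committed but unused car and each car of unmet demand carry known per-car costs; the names `cost_idle` and `cost_short` are hypothetical. Under these assumptions, the cost-minimizing commitment is the critical-fractile quantile of demand. A minimal Python sketch:

```python
import math

def expected_cost(q, demand_samples, cost_idle, cost_short):
    """Average cost of committing to q cars over the demand scenarios (hypothetical cost model)."""
    total = 0.0
    for d in demand_samples:
        # idle cars if demand falls short of q, shortfall cost if demand exceeds q
        total += cost_idle * max(q - d, 0) + cost_short * max(d - q, 0)
    return total / len(demand_samples)

def committed_cars(demand_samples, cost_idle, cost_short):
    """Newsvendor rule: the smallest q whose empirical CDF reaches the critical fractile
    cost_short / (cost_short + cost_idle)."""
    ratio = cost_short / (cost_short + cost_idle)
    s = sorted(demand_samples)
    k = max(0, math.ceil(ratio * len(s)) - 1)
    return s[k]
```

Take demand scenarios of 1 to 100 cars, with a shortfall costing three times as much as an idle car. `committed_cars(list(range(1, 101)), 1.0, 3.0)` then returns 75. The forecast of overall market need would enter this rule through the demand scenarios.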

Project 2: Accuracy of a simple freeze-out model as a description of the QPU distribution for C4 RAN1 problems

  • Mentor Jesse Berwald, D-Wave Systems
  • Sarah Chehade, University of Houston
  • Davood Damircheli, Mississippi State University
  • Kevin Marshall, University of Kansas
  • Li Zhu, University of Nevada

A quantum processing unit (QPU) is a programmable chip that leverages superposition and entanglement, two fundamental quantum mechanical properties, to solve problems. The D-Wave quantum annealing computer currently operates with a 2048-qubit QPU. Calibrating such a chip in the presence of thermal, quantum mechanical, and design-specific noise is critical to producing a working quantum computer.

D-Wave Systems has developed many internal calibration tests to detect and characterize anomalies in the QPU. Error correction at many levels is used to mitigate these anomalies wherever possible, though thermal and quantum fluctuations will always be present. The variety of tests often requires different models and statistical methods. This project looks at a test of a specific configuration of randomly coupled qubits (C4 RAN1). Students will implement a model and fit it to observations from the QPU. A significant part of the pipeline will be a visualization component that enables easier and deeper analysis of anomalies when they are present.
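The project text does not define the freeze-out model. A common reading is that the annealer's dynamics effectively stop ("freeze out") at some point during the anneal. The returned samples are then approximately Boltzmann-distributed at an effective inverse temperature β. Under that assumption, the maximum-likelihood β is the one whose model mean energy matches the mean sample energy. Because the mean energy decreases monotonically in β, this β can be found by bisection.

The sketch below does not use a real C4 instance, which is too large to enumerate, or real QPU data. Instead it uses a hypothetical 10-spin ring with couplings drawn from {−1, +1}, loosely in the spirit of RAN1, plus synthetic samples. All function names are our own.

```python
import itertools
import math
import random

def ring_couplings(n, rng):
    """Hypothetical toy instance: J[i] couples spin i to spin (i + 1) % n, each drawn from {-1, +1}."""
    return [rng.choice([-1.0, 1.0]) for _ in range(n)]

def energy(spins, J):
    """Ising energy E(s) = sum_i J_i s_i s_{i+1} on a ring, with no local fields."""
    n = len(spins)
    return sum(J[i] * spins[i] * spins[(i + 1) % n] for i in range(n))

def mean_energy(beta, energies):
    """Mean energy under the Boltzmann distribution p(s) proportional to exp(-beta * E(s))."""
    e0 = min(energies)  # shift by the ground energy for numerical stability
    w = [math.exp(-beta * (e - e0)) for e in energies]
    return sum(wi * e for wi, e in zip(w, energies)) / sum(w)

def fit_beta(sample_energies, all_energies, lo=0.0, hi=20.0, iters=60):
    """Maximum-likelihood effective inverse temperature: solve <E>_beta = mean sample energy.
    <E>_beta decreases monotonically in beta, so bisection is enough."""
    target = sum(sample_energies) / len(sample_energies)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_energy(mid, all_energies) > target:
            lo = mid  # model is still "hotter" than the samples: increase beta
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With real data, comparing observed state frequencies against the fitted Boltzmann probabilities is one way to surface the anomalies a visualization layer would display.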

Project 3: Improving Mine Dispatching

  • Mentor Nicole Bridgland, World Wide Technology
  • Mentor Jonathan Hahn, World Wide Technology
  • Steven Dabelow, University of Notre Dame
  • Jordyn Harriger, Indiana University
  • SeongHee Jeong, Louisiana State University
  • Kiwon Lee, The Ohio State University

Mines have many moving parts, and the timing of deliveries between them is crucial. Time that mining equipment spends idle represents lost production opportunity. Time that trucks spend idle, while less obviously problematic, represents wasted fuel at the very least, and possibly lost production opportunity elsewhere in the mine. Given a system of several shovels and crushers, with trucks moving material between them, how can you best decide where to send empty or loaded trucks as they become available? When equipment experiences delays, when should you reroute trucks versus simply wait it out, and how should you reroute them? The goal of this project is to develop tools that help human dispatchers make these decisions, possibly in the form of machine-generated recommendations.
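A simple baseline for the "where to send an empty truck" decision is a greedy rule: send the truck to the shovel where it would finish loading earliest. This accounts for both travel time and the shovel's existing queue. The rule is our illustrative assumption, not the project's method. The `Shovel` class and the `dispatch_empty_truck` function are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Shovel:
    name: str
    free_at: float    # time (minutes) when the shovel finishes the trucks already assigned to it
    load_time: float  # minutes needed to load one truck

def dispatch_empty_truck(now, travel_min, shovels):
    """Greedy rule: send the truck to the shovel where it would finish loading earliest.
    travel_min maps shovel name -> travel time (minutes) from the truck's position.
    Returns (shovel name, load finish time, truck queueing time, shovel idle time)."""
    best = None
    for s in shovels:
        arrival = now + travel_min[s.name]
        start = max(arrival, s.free_at)  # truck queues if the shovel is still busy
        finish = start + s.load_time
        if best is None or finish < best[0]:
            best = (finish, s, start - arrival, max(0.0, arrival - s.free_at))
    finish, s, truck_wait, shovel_idle = best
    s.free_at = finish  # commit the assignment: the shovel is busy until this truck is loaded
    return s.name, finish, truck_wait, shovel_idle
```

Re-running the rule whenever a delay changes a shovel's `free_at` gives a crude "reroute or wait" signal. A practical dispatcher would also weigh crusher queues, haul-road congestion, and production targets.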

Project 4: Analogous year detection

  • Mentor Yiqing Cai, Gro Intelligence
  • Xing Ling, Michigan Technological University
  • Ben Strasser, University of Minnesota, Twin Cities
  • Rahim Taghikhani, Arizona State University
  • Tianyu Tao, University of Minnesota, Twin Cities

Gro is a data platform with comprehensive data sources related to food and agriculture. With data from Gro, stakeholders can make quicker and better decisions, which in most cases are time-sensitive. In this project, the students will use data from Gro to identify analogous events. For example, one can find a past year with similar precipitation and soil moisture patterns and use it to draw inferences about second- and third-order effects such as flooding or a decrease in planted crop area. This type of analysis can help quantify the impact of an event and mitigate the negative impact when it is severe and unavoidable.

Data will be provided through the Gro API. Data pulled from Gro are time series, which are called data series. Different data series can come from different sources and have different frequencies. For example, there is daily precipitation data from TRMM and 8-day NDVI (a vegetation index) from GIMMS MODIS.

Goals: The deliverable of this project will be an executable model. Given a data series (or a set of data series) and a selected time period, the model should find the periods in history that are most similar to the selected period. The central problem is therefore defining similarity between a pair of data series, or between concatenated data series.
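Below is a minimal distance-based sketch of "defining similarity", built on our own assumptions:

- Every year's record holds the same calendar window for each data series.
- Each series is scaled by its standard deviation pooled over all years. A wet year therefore still differs from a dry one, which per-window normalization would hide.
- Each series is also scaled by the square root of its length, so a daily series does not outweigh an 8-day one.
- Years are ranked by Euclidean distance to the target.

The function name and data layout are hypothetical. Dynamic time warping would be a natural next step when seasons shift by a few weeks.

```python
import math
import statistics

def analogous_years(data, target_year, k=3):
    """Rank years by distance to target_year.
    data: {year: [series_0, series_1, ...]}, where series_j is a list of floats covering the
    same calendar window in every year (different series may have different lengths)."""
    years = sorted(data)
    target = data[target_year]
    scales = []
    for j, series in enumerate(target):
        pooled = [v for y in years for v in data[y][j]]
        sd = statistics.pstdev(pooled) or 1.0
        # pooled spread times sqrt(length), so each series carries equal weight in the distance
        scales.append(sd * math.sqrt(len(series)))

    def distance(y):
        return math.sqrt(sum(((a - b) / s) ** 2
                             for j, s in enumerate(scales)
                             for a, b in zip(data[y][j], target[j])))

    return sorted((y for y in years if y != target_year), key=distance)[:k]
```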

Project 5: Deblending simultaneous-source seismic signals

  • Mentor Yanhua Yuan, ExxonMobil
  • Dilek Erkmen, Michigan Technological University
  • Anthony Nguyen, University of California, Davis
  • Samantha Pinella, University of Michigan
  • Jose Pedro Rodriguez Ayllon, University of Houston
  • Nandita Sahajpal, University of Kentucky

Acquisition of seismic data in a marine environment is a costly process. Traditionally, in marine seismic surveys, a boat tows a line of receivers while moving slowly. To obtain signals at the receivers, a wave source, typically an air gun, generates a pulse with frequencies in the tens of hertz, which penetrates the earth and reflects off its different layers. Recently, an innovation in this space was introduced that has been shown to yield substantial savings and to allow wider distances between the sources and the receivers. In the new method, more than one seismic source (air gun) is fired with short or zero delays between them, so that the signals generated by the individual sources overlap at some or all receivers.

In this simultaneous-source acquisition, the signals collected at the receivers are blended together, and a “deblending” process is usually needed to separate the signals from the individual sources before any further analysis. To make decoding easier, the sources are usually fired at random times, with differently coded signatures, or both. This rests on an incoherence assumption: when the data are aligned on one source's firing times, energy from the other sources does not line up from shot to shot. On that basis, the deblending problem can be approached in several ways: as a signal processing problem, an inversion problem, or a data analytics problem. In this project, we will try these approaches and look for a robust deblending algorithm that reconstructs the individual source signals from the encoded data.
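Of the approaches listed, the signal-processing route is the easiest to sketch. The toy below is built entirely on our own assumptions: two sources, identical flat events, 25 Hz Ricker wavelets, and integer-sample random firing delays. It aligns the blended record on source A's firing times, where source B's energy lands at random positions from trace to trace. It then estimates A with a running median across neighboring traces.

Real gathers have dipping events, so practical median filters are applied along local dips or in other sorting domains. This illustrates the incoherence idea only; it is not a production deblender. All function names are hypothetical.

```python
import numpy as np

def ricker(nt, t0, dt=0.004, f=25.0):
    """Ricker wavelet with peak frequency f (Hz) centred at sample t0."""
    t = (np.arange(nt) - t0) * dt
    a = (np.pi * f * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def blend(a_gather, b_gather, delays):
    """Simulate simultaneous-source data in source A's time frame: each trace of source B
    arrives shifted by its random firing delay (in samples). np.roll wraps around, which is
    harmless here only because the toy events sit far from the trace ends."""
    out = a_gather.copy()
    for k, d in enumerate(delays):
        out[k] += np.roll(b_gather[k], d)
    return out

def median_deblend(blended, half_width=5):
    """Estimate source A with a running median across neighboring traces: A is coherent
    from trace to trace, while the randomly delayed B energy appears as sparse outliers."""
    n_tr = blended.shape[0]
    est = np.empty_like(blended)
    for k in range(n_tr):
        lo, hi = max(0, k - half_width), min(n_tr, k + half_width + 1)
        est[k] = np.median(blended[lo:hi], axis=0)
    return est
```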

Project 6: Accuracy and precision of Time-to-Event Models with Flexible Dimensionality

  • Mentor Jonathan Hill, ITM TwentyFirst LLC
  • Brendan Cook, University of Minnesota, Twin Cities
  • Vikenty Mikheev, Kansas State University
  • Bibekananda Mishra, University of Kansas
  • Negar Orangi-Fard, Kansas State University
  • Matthew (Jake) Roberts, Michigan Technological University

Medical underwriting is expensive and time-consuming: trained underwriters must review medical histories manually, and there are often long delays waiting for documentation. For these reasons, researchers in life insurance and related industries are actively searching for methods to estimate mortality risk faster and at lower cost.

One proposed solution is to use a smaller set of medical features than what is typically collected in underwriting. These features could be collected through a questionnaire and used to generate a rapid estimate of mortality risk. This solution could have additional value in cases of full underwriting where some medical data is missing. A key objective will be quantifying the increase in uncertainty, or decrease in precision, as a consequence of using a smaller feature set.

During this week-long project, you will take a crash course in survival analysis, explore models for time-to-event data (including traditional and machine learning approaches), determine appropriate metrics, engineer features, and compete to create the best possible model of mortality risk. If time allows, there may be an opportunity to develop novel modeling techniques.
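Two standard building blocks for such a crash course are sketched below in plain Python, with our own function names. The first is the Kaplan-Meier estimator of the survival curve under right censoring. The second is Harrell's concordance index (C-index), a common metric for how well a risk model ranks individuals. One possible way to quantify the precision lost by a smaller feature set is to compare the C-index of full and reduced models on the same held-out data.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve. events[i] is 1 if death was observed at times[i], 0 if censored.
    Returns [(t, S(t))] at each distinct observed death time."""
    curve, s = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        s *= 1.0 - deaths / at_risk
        curve.append((t, s))
    return curve

def harrell_c(times, events, risk):
    """Harrell's concordance index: among comparable pairs (i died first, j outlived i's death time),
    the fraction where the model gave i the higher risk score; ties in risk count 1/2."""
    num, den = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                den += 1
                if risk[i] > risk[j]:
                    num += 1.0
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den
```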

We will be using a unique world-class dataset on senior life outcomes provided by ITM TwentyFirst, a Minneapolis-based life settlements servicing company.

Math-to-Industry Boot Camp III

Advisory: Application deadline is February 28, 2018

Organizers: 

The Math-to-Industry Boot Camp is an intense six-week session designed to provide graduate students with training and experience that is valuable for employment outside of academia. The program is targeted at Ph.D. students in pure and applied mathematics. The boot camp consists of courses in the basics of programming, data analysis, and mathematical modeling. Students work in teams on projects and are provided with training in resume and interview preparation as well as teamwork.

There are two group projects during the session: a small-scale project designed to introduce the concept of solving open-ended problems and working in teams, and a "capstone project" that is posed by industrial scientists.

Weekly seminars by industrial scientists provide the students with opportunities to learn about a variety of possible future careers.

Eligibility

Applicants must be current graduate students in a Ph.D. program at a U.S. institution during the period of the boot camp.

Logistics

The program will take place at the IMA on the campus of the University of Minnesota. Students will be housed in a residence hall on campus and will receive a per diem and a travel budget, as well as an $800 stipend.

Applications

To apply, please supply the following materials through the link at the top of the page:

  • Statement of reason for participation, career goals, and relevant experience
  • Unofficial transcript and evidence of good standing and full-time status
  • Letter of support from advisor, director of graduate studies, or department chair

Selection criteria will be based on background and statement of interest, as well as geographic and institutional diversity. Women and minorities are especially encouraged to apply. Selected participants will be contacted in April.

Participants

Name Department Affiliation
Muhammad Afridi   3M
Nicholas Asendorf   3M
Christopher Bemis   Whitebox Advisors
Nitsan Ben-Gal Software, Electronics and Mechanical Systems Laboratory 3M
Jesse Berwald   D-Wave Systems
Ariel Bowman Department of Mathematics University of Texas at Arlington
Chris Browne Center for Applied Mathematics Cornell University
Benjamin Brubaker School of Mathematics University of Minnesota, Twin Cities
Kate Brubaker Department of Mathematics Purdue University
Irfan Bulu Department of Math and Modeling Schlumberger-Doll Research
Shawn Burkett Mathematics University of Colorado
Olivia Cannon Department of Mathematics University of Minnesota, Twin Cities
Jared Catenacci Diagnostic Research and Material Studies National Security Technologies, LLC
Chirasree Chatterjee Department of Mathematics and Statistics Saint Louis University
Hua Chen Department of Mathematical Sciences University of Delaware
Aaron Cohen Department of Mathematics Indiana University
Paula Dassbach   Medtronic
Mingchang Ding Department of Mathematical Sciences University of Delaware
Jasmine Foo School of Mathematics University of Minnesota, Twin Cities
Zhen Gao Department of Mathematics Vanderbilt University
Maria Gommel Department of Mathematics The University of Iowa
Hayley Guy School of Mathematics North Carolina State University
Qie He Department of Industrial and Systems Engineering University of Minnesota, Twin Cities
Thomas Hoft Department of Mathematics University of St. Thomas
Ruihao Huang Department of Mathematical Sciences Michigan Technological University
Jeffrey Humpherys   UnitedHealth Group
Laura Iosip Department of Mathematics University of Maryland
Melanie Jensen Department of Mathematics Tulane University
Alicia Johnson   Macalester College
Ekaterina Kryuchkova Center for Applied Mathematics Cornell University
Kevin Leder Department of Industrial and Systems Engineering University of Minnesota, Twin Cities
Philku Lee Department of Mathematics and Statistics Mississippi State University
SangJoon Lee Department of Mathematics University of Connecticut
Hengguang Li Department of Mathematics Wayne State University
Aaron Luttman Diagnostic Research and Material Studies National Security Technologies, LLC
Christopher Miller School of Mathematics University of California, Berkeley
Cristian Minoccheri Department of Mathematics State University of New York, Stony Brook (SUNY)
Sarah Miracle Department of Computer and Information Sciences University of St. Thomas
Shannon Negaard-Paper   University of Minnesota, Twin Cities
Elpiniki Nikolopoulou Department of Applied Mathematics and Statistics Arizona State University
Michelle Pinharry School of Mathematics University of Minnesota, Twin Cities
Iurii Posukhovskyi Department of Mathematics University of Kansas
Mrinal Raghupathi USAA Asset Management Company USAA Asset Management Company
Michael Ramsey Department of Applied Mathematics University of Colorado
Eric Roberts Department of Applied Mathematics University of California, Merced
Tanushree Roy School of Mathematics University of Central Florida
Keith Rush Department of Strategy and Analytics Milwaukee Brewers
Fadil Santosa School of Mathematics University of Minnesota, Twin Cities
Chang Shu Department of Applied Mathematics University of California, Davis
Dallas Smith School of Mathematics Brigham Young University
Alberto Speranzon Aerospace Honeywell
Daniel Spirn University of Minnesota University of Minnesota, Twin Cities
Binh Tang Department of Statistical Science Cornell University
Elizabeth Wicks School of Mathematics University of Washington
Shiqiang Xia   University of Minnesota, Twin Cities
Di Ye   Zhennovate
Yufei Yu Department of Mathematics University of Kansas
Sheng Zhang Department of Mathematics Purdue University

 

Projects and teams

Team 1: Mathematical Models for Adaptive Multi-modal Sensing

  • Mentor Aaron Luttman, National Security Technologies, LLC
  • Mentor Jared Catenacci, National Security Technologies, LLC
  • Ariel Bowman, University of Texas at Arlington
  • Shawn Burkett, University of Colorado
  • Hayley Guy, North Carolina State University
  • Laura Iosip, University of Maryland
  • Yufei Yu, University of Kansas
  • Sheng Zhang, Purdue University

Scientific experiments are a natural source of data, usually collected by diagnostic systems fielded within the experiments themselves. Recently, however, there has been a trend toward collecting data around big science experiments to find out whether the behaviors associated with them can be detected and characterized. The question is whether it is possible to determine which experiments are being conducted by analyzing human patterns, so-called “patterns of life,” around and in the experimental facilities. To measure patterns of life, we analyze many different types of data, from power grid load profiles to internet activity to sound and pressure signals from cars.

There are two primary challenges that must be addressed:

Mathematical Models for Adaptive Sensing – When should a sensor system turn on its sensors and transmit its data, given that these two activities take a lot of power?

Physics-based Multi-modal Feature Selection and Detection – How can one incorporate physics models for sensing into machine learning approaches to data analysis?

Real multi-sensor data will be provided for testing and validation.
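For the first challenge (when should a sensor system turn on and transmit, given the power cost?), one simple baseline is send-on-delta transmission; it does not come from the project. A node transmits only when its reading has moved more than a threshold `delta` since the last transmitted value. As a result, the receiver's held value is never off by more than `delta`. A minimal Python sketch (function names are hypothetical):

```python
def send_on_delta(readings, delta):
    """Return (time index, value) pairs for the readings a node would transmit:
    the first reading, then any reading that moved more than delta since the last transmission."""
    sent, last = [], None
    for t, x in enumerate(readings):
        if last is None or abs(x - last) > delta:
            sent.append((t, x))
            last = x
    return sent

def reconstruct(sent, length):
    """Receiver side: hold the most recently transmitted value (zero-order hold)."""
    out, j, last = [], 0, None
    for t in range(length):
        while j < len(sent) and sent[j][0] <= t:
            last = sent[j][1]
            j += 1
        out.append(last)
    return out
```

`delta` is the knob that trades transmit energy for fidelity. Deciding when to power the sensor itself (duty cycling) would also need a model of how fast the signal can change, which this sketch does not include.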

Team 2: Quantum Computation and QUBO Slicing

  • Mentor Jesse Berwald, D-Wave Systems
  • Olivia Cannon, University of Minnesota, Twin Cities
  • Tanushree Roy, University of Central Florida
  • Chang Shu, University of California, Davis
  • Dallas Smith, Brigham Young University
  • Elizabeth Wicks, University of Washington

Background

Quantum annealing computers have begun to enter the business and academic worlds. Over the past five years they have been used for a wide variety of (prototypical) applications, with evidence of differentiated performance in some cases.

A first step in using these computers is to reformulate the problem in an energy minimization framework. The problem is typically cast as a Hamiltonian or, alternatively, as a quadratic unconstrained binary optimization (QUBO) problem, which can be represented as a matrix. These formulations are translated to the physical qubits on the quantum processing unit (QPU) through a process termed “embedding”. Embedding a given problem onto the QPU is an active area of research in itself and is handled by a number of different heuristics, one of which is described below.
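The QUBO and Hamiltonian forms are closely related. A QUBO minimizes E(x) = xᵀQx over binary x. The substitution x_i = (1 + s_i)/2 turns it into an Ising Hamiltonian over spins s_i ∈ {−1, +1}, plus a constant offset. The Python sketch below performs that conversion and checks it by brute force, which is only feasible for a handful of variables. The helper names are our own, not the D-Wave Ocean API.

```python
import itertools

def qubo_energy(x, Q):
    """E(x) = x^T Q x for binary x (diagonal entries act as linear terms since x_i^2 = x_i)."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def qubo_to_ising(Q):
    """Substitute x_i = (1 + s_i) / 2 to get fields h, upper-triangular couplings J, and an offset."""
    n = len(Q)
    h = [0.0] * n
    J = [[0.0] * n for _ in range(n)]
    offset = 0.0
    for i in range(n):
        h[i] += Q[i][i] / 2
        offset += Q[i][i] / 2
        for j in range(i + 1, n):
            q = Q[i][j] + Q[j][i]  # fold both triangles into one pairwise coefficient
            J[i][j] += q / 4
            h[i] += q / 4
            h[j] += q / 4
            offset += q / 4
    return h, J, offset

def ising_energy(s, h, J):
    n = len(s)
    return (sum(h[i] * s[i] for i in range(n))
            + sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n)))

def brute_force_min(Q):
    """Exhaustive minimum over {0,1}^n; only feasible for tiny n (the QPU's job in practice)."""
    return min(itertools.product([0, 1], repeat=len(Q)), key=lambda x: qubo_energy(x, Q))
```

For example, `brute_force_min([[-1.0, 2.0], [0.0, -1.0]])` returns `(0, 1)`: exactly one of the two variables is set, which gives the minimum energy of -1.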

Problem statement

In this project we will investigate one proposed solution to the embedding problem:

The aim is to make the most efficient use of the qubit hardware by developing a parameterized transformation from the space spanned by the physical qubits (“qubit space”) to the space spanned by the problem variables (the “problem search space”). Specifically, we will define a linear transformation from qubit space to problem search space that allows for a more efficient use of the available hardware.

Since the problem space is, in general, much larger than the qubit space, a fixed parameterization can only map the qubit space into a proper subspace of the problem space. We term these subspaces “slices”. The reduced problem on each slice can then be solved with optimal use of the available hardware. Using different parameterizations, we can define a series of linear transformations onto orthogonal subspaces of the problem space.
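The simplest possible slice is a coordinate subspace: a few problem variables are free and the rest are held fixed. We assume this slice for illustration; it is likely narrower than the parameterized linear maps the project has in mind. Fixing the other variables yields a smaller QUBO whose diagonal absorbs the cross terms. Sweeping over disjoint slices and solving each one exactly never increases the energy; here brute force stands in for a QPU call. The helper names are hypothetical.

```python
import itertools

def qubo_energy(x, Q):
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def slice_qubo(Q, free, x):
    """Reduced QUBO over the coordinates in `free`, with all other variables held at their values in x.
    Returns (Q_slice, constant) such that E_full = y^T Q_slice y + constant."""
    n = len(Q)
    free = list(free)
    fixed = [i for i in range(n) if i not in free]
    Qs = [[Q[a][b] for b in free] for a in free]
    for k, a in enumerate(free):
        # cross terms with fixed variables are linear in y_a; y_a^2 = y_a puts them on the diagonal
        Qs[k][k] += sum((Q[a][f] + Q[f][a]) * x[f] for f in fixed)
    const = sum(Q[f][g] * x[f] * x[g] for f in fixed for g in fixed)
    return Qs, const

def solve_slice(Qs):
    """Exhaustive search stands in for a QPU call on the (small) slice."""
    return min(itertools.product([0, 1], repeat=len(Qs)), key=lambda y: qubo_energy(y, Qs))

def sliced_descent(Q, slices, x0, sweeps=3):
    """Repeatedly re-optimize each slice with the rest of x fixed; energy never increases."""
    x = list(x0)
    for _ in range(sweeps):
        for free in slices:
            Qs, _ = slice_qubo(Q, free, x)
            for a, v in zip(free, solve_slice(Qs)):
                x[a] = v
    return x
```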

There are many parameterizations to choose from, each of which raises a number of research questions. We will prioritize our investigation roughly as follows:

  1. Given a QUBO matrix defining the problem search space, is there an algorithm that produces the most efficient set of transformations (parameterizations) from qubit space to problem space?
  2. Is there a greedy algorithm that is best in practice, i.e., one that chooses a slice that maximizes use of the chip and then chooses successively smaller slices to query the entire search space?
  3. What is the role of sparsity in the choice of transformations?
  4. The QPU itself has a unique architecture. How does this architecture affect the choice of transformations?

Team 3: Time Series Analysis of Gas Mixture Data

  • Mentor Nicholas Asendorf, 3M
  • Kate Brubaker, Purdue University
  • Ruihao Huang, Michigan Technological University
  • Philku Lee, Mississippi State University
  • Elpiniki Nikolopoulou, Arizona State University
  • Michelle Pinharry, University of Minnesota, Twin Cities

Motivation

Sensor networks are ubiquitous in today’s Internet of Things and can collect high-frequency data cost-efficiently. The result is mountains of time-series data that, hopefully, contain signals of interest buried in noise. As the number of deployed sensors grows, so does the dimensionality of the observed data, further increasing the complexity of the problem. 3M is interested in such large-scale time-series analyses because many of our datasets can be framed this way: manufacturing, sales, and chemical experiments, to name a few.

Dataset

This publicly available dataset contains time-series readings from chemical sensors over a duration of 12 hours. The inputs to these sensors are known concentrations of various gases. The dataset contains timestamped measurements from 16 gas sensors together with the input gas concentrations, so it is a labeled time-series dataset. There are two gas mixture measurement files: one for ethylene and CO, and one for ethylene and methane. At 3M, we may have similar types of experimental data (perhaps from different sensors) that we would like to use to determine the interactions between materials or to understand their fundamental properties. Being able to mine these rich datasets intelligently and efficiently for insights about material characteristics is critical.

The Challenge

Some interesting problems to consider:

  • Develop an algorithm to estimate the concentration of each gas given sensor measurements. You might approach this problem using classical machine learning, splitting data into training, validation, and testing, while treating time series measurements as independent points.
  • Develop algorithms to estimate the concentration of each gas using time-series methods such as windowing, tsfresh, or RNNs. In this approach, we don’t want to treat each measurement as independent. How do these algorithms compare to classical machine learning techniques?
  • Can you use the fact that we have 4 replicates of each sensor at each time point to improve your algorithms? Can you use any clever data fusion techniques or outlier detection strategies?
  • What can you tell about the importance or accuracy of the 4 types of sensors used?
  • What happens when we purposely introduce missing data? Can we use the replicates of each sensor to overcome this? How robust are your algorithms to missing data?
  • Since both datasets have measurements for ethylene, can we use both to develop a more robust estimation scheme for that gas?
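The Python sketch below is a starting point for the windowing and replicate questions. It fuses the four replicates of each sensor type with a NaN-aware median, which tolerates one faulty or missing replicate. It then summarizes trailing windows by their mean and least-squares slope, producing features for any regressor. The sketch assumes, hypothetically, that the 16 sensor columns are grouped as four consecutive replicates per type; the actual column order in the dataset should be checked first. Function names are our own.

```python
import numpy as np

def fuse_replicates(readings, n_types=4, n_reps=4):
    """readings: (T, n_types * n_reps) array whose columns are assumed (hypothetically) to be
    grouped as n_reps consecutive replicates per sensor type. Returns a (T, n_types) array.
    The NaN-aware median ignores missing replicates and resists a single faulty one."""
    T = readings.shape[0]
    return np.nanmedian(readings.reshape(T, n_types, n_reps), axis=2)

def window_features(x, width):
    """Mean and least-squares slope of each channel over trailing windows of `width` samples.
    Row i summarizes samples i .. i + width - 1, so it pairs with the target at i + width - 1."""
    t = np.arange(width) - (width - 1) / 2.0  # centred time index decouples slope from mean
    rows = []
    for end in range(width, x.shape[0] + 1):
        w = x[end - width:end]
        mean = w.mean(axis=0)
        slope = t @ (w - mean) / (t @ t)
        rows.append(np.concatenate([mean, slope]))
    return np.array(rows)
```

Because the windows overlap, train/validation/test splits are safer when made on contiguous time blocks rather than random rows. Otherwise information leaks between the splits.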

Team 4: Structured Variational Auto Encoders

  • Mentor Irfan Bulu, Schlumberger-Doll Research
  • Hua Chen, University of Delaware
  • Aaron Cohen, Indiana University
  • Mingchang Ding, University of Delaware
  • Melanie Jensen, Tulane University
  • Christopher Miller, University of California, Berkeley
  • Michael Ramsey, University of Colorado

Generative models such as variational autoencoders (VAEs) and generative adversarial networks (GANs) have been very successful in unsupervised learning settings. In a VAE setting, we would like to learn a set of latent variables that explain our data. Although this approach has been very successful for generative modeling, interpreting the latent variables is still a challenge. Ideally, we would like to perform unsupervised learning that identifies a number of classes (not specified in advance). Once a set of classes has been identified, we only need to label each class once instead of labeling the entire data set. Imagine you have a sample of handwritten digits without labels. If we can structure a VAE so that it identifies 10 classes, we can then label these classes with the corresponding digits. This would be very helpful, as most of our data is unlabeled or poorly labeled.

Concepts that may be helpful to know: neural network, generative models, graphical models, stochastic variational inference.
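One way to "structure" a VAE for class discovery is to give it two latent parts: a continuous Gaussian code z and a K-way categorical code c (K = 10 for digits). This design is our assumption, not the project's prescribed one. The training loss is then a reconstruction term plus a KL term for each part. The categorical code can be sampled differentiably with the Gumbel-softmax relaxation. The NumPy sketch below implements only those probabilistic pieces. The encoder, the decoder, and the training loop would come from a deep learning framework.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis."""
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var, axis=-1)

def categorical_kl_uniform(probs, eps=1e-12):
    """KL( q(c | x) || Uniform(K) ) for a K-way class variable; eps guards log(0)."""
    K = probs.shape[-1]
    return np.sum(probs * (np.log(probs + eps) + np.log(K)), axis=-1)

def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample of the class variable; approaches a one-hot vector as tau -> 0."""
    u = rng.uniform(np.finfo(float).tiny, 1.0, size=np.shape(logits))
    g = -np.log(-np.log(u))                # standard Gumbel noise
    z = (logits + g) / tau
    z = z - z.max(axis=-1, keepdims=True)  # stabilize the softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```

Per example, the loss to minimize would be the reconstruction error plus `gaussian_kl(mu, log_var)` plus `categorical_kl_uniform(q_c)`. After training, each discovered class could be labeled once by inspecting a few examples assigned to it.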

Team 5: Tailored Discovery in Stock Portfolios

  • Mentor Christopher Bemis, Whitebox Advisors
  • Chirasree Chatterjee, Saint Louis University
  • Zhen Gao, Vanderbilt University
  • Cristian Minoccheri, State University of New York, Stony Brook (SUNY)
  • Shannon Negaard-Paper, University of Minnesota, Twin Cities
  • Shiqiang Xia, University of Minnesota, Twin Cities

Modern portfolio theory has provided tools to identify systematic and idiosyncratic risks via models like Markowitz's mean-variance optimization. In addition, a taxonomy of equities has emerged through feature identification, one of the earliest and most impactful examples being Fama and French's three-factor model.
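For reference, the two models named above in their standard textbook form. Here w are portfolio weights, μ and Σ are the mean and covariance of asset returns, and μ* is a target return. R_f is the risk-free rate, R_M the market return, and SMB and HML the size and value factor returns:

```latex
\min_{w}\; w^{\top}\Sigma\, w
\quad\text{subject to}\quad w^{\top}\mu = \mu^{*},\qquad \mathbf{1}^{\top} w = 1

R_{i,t} - R_{f,t} = \alpha_i + \beta_i\,(R_{M,t} - R_{f,t}) + s_i\,\mathrm{SMB}_t + h_i\,\mathrm{HML}_t + \varepsilon_{i,t}
```

The loadings (β_i, s_i, h_i) are the exposures to the market, size, and value portfolios that the project description refers to.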

In this project, we will leverage technical and fundamental data, such as return series and earnings information, along with well-understood equity features, such as exposure to the so-called size, value, and market portfolios. The goal is to develop tools that suggest supplements (e.g., technology stocks when looking at Apple) and complements (e.g., energy stocks when looking at Delta Airlines) for individual equities and portfolios. These tools may be used for tailored discovery and research by analysts looking either to construct a portfolio around a theme or to diversify. The work will ideally evolve from point estimates using simple norms in a predetermined feature space to machine learning techniques.

Data will be supplied from Quandl, and the preferred language for development will be Python.
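Below is a minimal sketch of the "point estimates using simple norms in a predetermined feature space" starting point, under our own assumptions. Each stock's feature vector is its three-factor loadings, estimated by ordinary least squares on excess returns. Candidate supplements are its nearest neighbors in Euclidean distance. Treating the most distant stocks as complements is a crude proxy we chose, not the project's definition. Function names are hypothetical, and loading data from Quandl is omitted.

```python
import numpy as np

def factor_loadings(excess_returns, factors):
    """OLS of each stock's excess returns on factor returns.
    excess_returns: (T, N) array; factors: (T, K) array, e.g. columns MKT-RF, SMB, HML.
    Returns an (N, K) array of loadings (the intercepts, i.e. alphas, are dropped)."""
    X = np.column_stack([np.ones(len(factors)), factors])
    coef, *_ = np.linalg.lstsq(X, excess_returns, rcond=None)
    return coef[1:].T

def nearest(loadings, i, k=3, farthest=False):
    """Indices of the k stocks closest to stock i in loading space (or farthest, if requested)."""
    d = np.linalg.norm(loadings - loadings[i], axis=1)
    order = [j for j in np.argsort(d) if j != i]
    if farthest:
        order = order[::-1]
    return order[:k]
```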

Team 6: Sequence-to-sequence modeling for the business of baseball

  • Mentor Keith Rush, Milwaukee Brewers
  • Maria Gommel, The University of Iowa
  • Ekaterina Kryuchkova, Cornell University
  • SangJoon Lee, University of Connecticut
  • Iurii Posukhovskyi, University of Kansas
  • Eric Roberts, University of California, Merced

Each fan has a unique relationship with his or her favorite sports teams, and each has a different ideal every time they step into the stadium. When a team makes a big free-agent signing in February, the fan who follows the competition closely will be ecstatic; the fan who primarily enjoys the communal aspects will see the effect only in the buzz it generates in his or her social circles. To serve their fans as well as possible, teams must have a global view of their business and be able to structure data from all sources and across all levels of granularity, creating one universe into which all inputs feed and from which all outputs flow.

This project is a first step in that direction. The problem we are focusing on is roughly the following: conditioned on a vector representing a fan's history with the Club and the attributes of a particular game, how well can we ingest information over time and map it forward one time step? To this end, we will test the standard recurrent and convolutional network architectures, experiment with variants, and discuss the reasons for applying each, as well as their limitations. Data will be provided by the Brewers, and development will take place in Python, using cloud infrastructure for computing power.
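Below is a minimal NumPy sketch of the recurrent baseline, under our own assumptions:

- Each fan's history is a (T, F) array of per-game feature vectors; game attributes can simply be appended as extra features.
- Training pairs are sliding windows, with the following step as the target.
- An untrained Elman RNN maps each window to a next-step prediction.

The function names are hypothetical, and in practice the weights would be learned with a deep learning framework.

```python
import numpy as np

def make_windows(seq, width):
    """Split one fan's (T, F) sequence into inputs of `width` steps and next-step targets."""
    X = np.stack([seq[t:t + width] for t in range(len(seq) - width)])
    y = seq[width:]
    return X, y  # X: (T - width, width, F), y: (T - width, F)

def init_params(n_features, n_hidden, rng, scale=0.1):
    """Small random weights for an untrained Elman RNN (hypothetical initialization)."""
    return (scale * rng.normal(size=(n_hidden, n_features)),
            scale * rng.normal(size=(n_hidden, n_hidden)),
            np.zeros(n_hidden),
            scale * rng.normal(size=(n_features, n_hidden)),
            np.zeros(n_features))

def rnn_forward(X, params):
    """h_t = tanh(W_x x_t + W_h h_{t-1} + b); next-step prediction = W_y h_last + c."""
    W_x, W_h, b, W_y, c = params
    h = np.zeros((X.shape[0], W_h.shape[0]))
    for t in range(X.shape[1]):
        h = np.tanh(X[:, t] @ W_x.T + h @ W_h.T + b)
    return h @ W_y.T + c
```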

Math-to-Industry Boot Camp II

Advisory: Application deadline is February 17, 2017

Organizers: 

The Math-to-Industry Boot Camp is an intense six-week session designed to provide graduate students with training and experience that is valuable for employment outside of academia. The program is targeted at Ph.D. students in pure and applied mathematics. The boot camp consists of courses in the basics of programming, data analysis, and mathematical modeling. Students will work in teams on projects and will be provided with soft skills training.

There will be two group projects during the session: a small-scale project designed to introduce the concept of solving open-ended problems and working in teams, and a "capstone project" that will be posed by industry scientists. The students will be able to interact with industry participants at various points in the program. 

Eligibility

Applicants must be current graduate students in a Ph.D. program at a U.S. institution during the period of the boot camp.

Logistics

The program will take place at the IMA on the campus of the University of Minnesota. Students will be housed in a residence hall on campus and will receive a per diem and a travel budget, as well as an $800 stipend.

Applications

To apply, please supply the following materials through the link at the top of the page:

  • Statement of reason for participation, career goals, and relevant experience
  • Unofficial transcript and evidence of good standing and full-time status
  • Letter of support from advisor, director of graduate studies, or department chair

Selection criteria will be based on background and statement of interest, as well as geographic and institutional diversity. Women and minorities are especially encouraged to apply. Selected participants will be contacted by March 17.

Participants

Name Department Affiliation
Sameed Ahmed Department of Mathematics University of South Carolina
Christopher Bemis   Whitebox Advisors
Amanda Bernstein Department of Mathematics North Carolina State University
Jesse Berwald Enterprise Data Analytics & Business Intelligence Target Corporation
Neha Bora Department of Mathematics Iowa State University
Jeremy Brandman Computational Physics ExxonMobil
Phillip Bressie Mathematics Kansas State University
Nicole Bridgland School of Mathematics University of Minnesota, Twin Cities
Yiying Cheng Department of Mathematics University of Kansas
Michael Dairyko Department of Mathematics Iowa State University
Miandra Ellis School of Mathematical and Statistical Sciences Arizona State University
Wen Feng Department of Applied Mathematics University of Kansas
Jasmine Foo School of Mathematics University of Minnesota, Twin Cities
Melissa Gaddy Department of Mathematics North Carolina State University
Thomas Grandine   The Boeing Company
Ngartelbaye Guerngar Department of Mathematics and Statistics Auburn University
Jamie Haddock Department of Applied Mathematics University of California, Davis
Madeline Handschy   University of Minnesota, Twin Cities
Qie He Department of Industrial and Systems Engineering University of Minnesota, Twin Cities
Thomas Hoft Department of Mathematics University of St. Thomas
Tahir Bachar Issa Department of Mathematics and Statistics Auburn University (Auburn, AL, US)
Alicia Johnson   Macalester College
Cassidy Krause Department of Mathematics University of Kansas
Kevin Leder Department of Industrial and Systems Engineering University of Minnesota, Twin Cities
Gilad Lerman School of Mathematics University of Minnesota, Twin Cities
Hongshan Li Department of Mathematics Purdue University
Wenbo Li Applied Mathematics & Statistics, and Scientific Computation University of Maryland
Youzuo Lin   Los Alamos National Laboratory
John Lynch Department of Mathematics University of Wisconsin, Madison
Eric Malitz Department of Mathematics, Statistics and Computer Science University of Illinois, Chicago
Tianyi Mao Department of Mathematics City University of New York
Emily McMillon Department of Mathematics University of Nebraska
Christine Mennicke Department of Applied Mathematics North Carolina State University
Kacy Messerschmidt Department of Mathematics Iowa State University
Sarah Miracle Department of Computer and Information Sciences University of St. Thomas
Ngai Fung Ng   Purdue University
Hieu Nguyen Institute for Computational Engineering and Sciences The University of Texas at Austin
Kelly O'Connell Department of Mathematics Vanderbilt University
Luca Pallucchini   Temple University
Karoline Pershell Strategy and Evaluation Division Service Robotics & Technologies
Fesobi Saliu Department of Mathematical Sciences University of Memphis
Fadil Santosa Institute for Mathematics and its Applications University of Minnesota, Twin Cities
Richard Sharp   Starbucks
Samantha Shumacher   Target Corporation
Sudip Sinha Department of Mathematics Louisiana State University
Ryan Siskind   Target Corporation
Daniel Spirn University of Minnesota University of Minnesota, Twin Cities
Anna Srapionyan Center for Applied Mathematics Cornell University
Trevor Steil School of Mathematics University of Minnesota, Twin Cities
Andrew Stein Department of Modeling and Simulation Novartis Institute for Biomedical Research
Aditya Vaidyanathan Center for Applied Mathematics Cornell University
Zachary Voller   Target Corporation
Zhaoxia Wang   Louisiana State University
Dara Zirlin Mathematics Department University of Illinois at Urbana-Champaign

 

Projects and teams

Team 1: Dictionary-Based Remote Sensing Imagery Classification/Clustering Techniques: Feature Selection and Optimization Methods

  • Mentor Youzuo Lin, Los Alamos National Laboratory

Remotely sensed imagery classification/clustering seeks to group pixels that represent land cover features. It has broad applications across engineering and science domains. However, because of the large volume of imagery data and the limited features available, correctly understanding the contents of the imagery is challenging. This project team will develop efficient and accurate machine-learning methods for remotely sensed imagery classification/clustering. To achieve this goal, we will explore various image classification/clustering methods; in particular, we are interested in dictionary-learning-based image analysis methods. As one of the most successful machine-learning methods, dictionary learning has shown promising performance in a variety of applications. In this project, the team will focus on the following tasks:

  •  look into a couple of state-of-the-art dictionary learning methods including K-SVD [1] and SPORCO [2]
  •  apply dictionary-learning technique to remotely sensed imagery classification/clustering
  •  compare performances of employing different dictionary-learning methods
  •  analyze computational costs, and further improve the computational efficiency

Through this project, the team will learn the fundamentals of machine learning with applications to image analysis, understand specific computational tools for solving large-scale applications, and become capable of solving real problems with the aforementioned techniques.
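K-SVD alternates between a sparse-coding stage, typically solved with a pursuit algorithm such as Orthogonal Matching Pursuit (OMP), and a dictionary-update stage. The sketch below shows only OMP plus one common residual-based classification rule, assuming per-class dictionaries with unit-norm columns have already been learned (e.g., by K-SVD or SPORCO). The function names and the classification rule are illustrative choices, not the team's method.

```python
import numpy as np


def omp(D, x, k):
    """Orthogonal Matching Pursuit: approximate x with at most k columns (atoms) of D.

    D: (n, m) dictionary with unit-norm columns; x: (n,) signal, e.g. a pixel's spectral vector.
    Returns the sparse code a (m,) with D @ a approximately equal to x.
    """
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # re-fit all selected coefficients by least squares (the "orthogonal" step)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-12:
            break
    a = np.zeros(D.shape[1])
    a[support] = coef
    return a


def classify(dictionaries, x, k):
    """Assign x to the class whose dictionary reconstructs it with the smallest residual."""
    errors = [np.linalg.norm(x - D @ omp(D, x, k)) for D in dictionaries]
    return int(np.argmin(errors))


# toy check: two classes whose dictionaries span disjoint coordinates
D0, D1 = np.eye(4)[:, :2], np.eye(4)[:, 2:]
print(classify([D0, D1], np.array([0.0, 0.0, 1.0, 2.0]), k=2))  # 1
```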

References:

[1] K-SVD: M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311-4322, 2006. (Source code available at http://www.cs.technion.ac.il/~elad/software/)

[2] SPORCO: B. Wohlberg, "Efficient Algorithms for Convolutional Sparse Representations," IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 301-315, 2016. (Source code available at http://brendt.wohlberg.net/software/SPORCO/)

Team 2: Optimizing Well Placement in an Oil Reservoir

  • Mentor Jeremy Brandman, ExxonMobil

Oil and gas – also known as hydrocarbons – are typically found thousands of meters below the earth’s surface in the void space of sedimentary rocks. The extraction of these hydrocarbons relies on the operation of injection and production wells.

Injection wells are used to displace hydrocarbons by injecting other fluids (e.g., water and CO2) and to maintain overall reservoir pressure. Production wells extract reservoir fluids from the rocks and transport them to the surface.

Drilling a well is expensive – the cost can be in the hundreds of millions of dollars – and time-consuming. Therefore, it is imperative that wells are placed and operated in a manner that optimizes reservoir profitability. The goal of this project is to develop a well placement strategy that addresses this business need.

The project’s focus will be non-invasive (i.e., black-box or derivative-free) optimization strategies for well placement. Non-invasive approaches are appealing because they do not require access to the computer code used to simulate the flow of hydrocarbons and other fluids. This is an important consideration as industrial flow simulators are complex and constantly in flux, making gradient information potentially difficult to acquire.

In order to test ideas and verify algorithms, the project will begin by considering well placement optimization in the context of a homogeneous two-dimensional reservoir. Following this, students will consider problems in heterogeneous reservoirs inspired by real-world examples.

Students will be provided with a flow simulator written in C that can be coupled to optimization algorithms written in C or Python. An introduction to modeling fluid flow in porous media will also be given.
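Compass (pattern) search is one of the simplest derivative-free methods and illustrates the non-invasive approach: it only ever asks for objective values. The sketch below places a single well on an integer grid of cells; `compass_search` is a hypothetical helper, and `toy_value` is a stand-in for what would in practice be a run of the provided flow simulator returning, say, a profitability measure for a well at cell (i, j).

```python
def compass_search(objective, start, bounds, max_evals=500):
    """Maximize objective(i, j) over integer grid cells by compass search.

    objective: callable returning a float for a candidate well cell (i, j).
    start: initial cell; bounds: (nx, ny) with valid cells 0 <= i < nx, 0 <= j < ny.
    """
    nx, ny = bounds
    best = tuple(start)
    best_val = objective(*best)
    evals = 1
    step = max(nx, ny) // 4 or 1
    while step >= 1 and evals < max_evals:
        improved = False
        # poll the four compass directions at the current step size
        for di, dj in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cand = (best[0] + di, best[1] + dj)
            if not (0 <= cand[0] < nx and 0 <= cand[1] < ny):
                continue
            val = objective(*cand)
            evals += 1
            if val > best_val:          # accept the first improving move
                best, best_val = cand, val
                improved = True
                break
        if not improved:
            step //= 2                  # no improvement: refine the pattern
    return best, best_val


def toy_value(i, j):
    # hypothetical stand-in for a simulated objective of a well at cell (i, j)
    return -((i - 3) ** 2) - (j - 5) ** 2


best, value = compass_search(toy_value, start=(0, 0), bounds=(20, 20))
print(best, value)  # (3, 5) 0
```

Because each objective evaluation would be a full simulation, the evaluation budget (`max_evals`) is the quantity that matters in practice; heterogeneous reservoirs typically make the objective multimodal, so restarts from several starting cells are a natural extension.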

Team 3: Machine Tool and Robot Calibration through Kinematic Analysis: A Least Squares Approach

  • Mentor Thomas Grandine, The Boeing Company

Modern machine tools and robots are constructed by assembling sequences of joints and linkages. An end effector (typically a cutter, tool, probe, or other device) is attached to the end of the last linkage. These devices are operated through a controller in which the locations of the various components are programmed. In the usual case, programming these joint and linkage locations yields a programmed nominal position for the end effector. Because of mechanical variation and other sources of error, the nominal programmed location of the end effector and its actual location are not exactly the same. Most controllers are equipped with compensation functions to account for this: the actual location of the linkages is set to the nominal position plus a correction term, with the intent that the actual end effector ends up much closer to the intended nominal position.

One way of constructing the compensation functions is to program the machine to move the end effector to a collection of different locations. The actual location of the end effector is then measured by some independent means, often a laser scanner or other device, and the difference between the actual and nominal end effector locations is recorded. Given these discrepancies, a nonlinear least squares problem can be formulated from which accurate error functions can be constructed. In this workshop, we will review the standard methods for solving these problems and then explore some potential new ways of modeling the error functions, with a view toward making this already effective procedure even better.
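To make the least squares formulation concrete, here is a minimal sketch on a hypothetical planar two-link arm whose unknowns are the two link lengths and two joint-angle offsets. Measurements are synthetic, the Jacobian is approximated by forward differences, and the solver is plain Gauss-Newton; real machines have many more parameters, and a library routine such as SciPy's `least_squares` would be a reasonable alternative.

```python
import numpy as np


def forward(params, q):
    """Planar two-link arm: end-effector (x, y) for joint angles q (N x 2).

    params = (L1, L2, d1, d2): link lengths and joint-angle offsets (the calibration unknowns).
    """
    L1, L2, d1, d2 = params
    t1 = q[:, 0] + d1
    t12 = t1 + q[:, 1] + d2
    return np.column_stack([L1 * np.cos(t1) + L2 * np.cos(t12),
                            L1 * np.sin(t1) + L2 * np.sin(t12)])


def calibrate(params0, q, measured, iters=50, h=1e-7):
    """Gauss-Newton on r(p) = forward(p, q) - measured with a finite-difference Jacobian."""
    p = np.asarray(params0, dtype=float).copy()
    for _ in range(iters):
        r = (forward(p, q) - measured).ravel()
        J = np.empty((r.size, p.size))
        for k in range(p.size):
            dp = np.zeros_like(p)
            dp[k] = h
            J[:, k] = ((forward(p + dp, q) - measured).ravel() - r) / h
        # least squares step: minimize ||J step + r||
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        p += step
        if np.linalg.norm(step) < 1e-12:
            break
    return p


rng = np.random.default_rng(0)
q = rng.uniform(-np.pi, np.pi, size=(20, 2))        # commanded joint angles
true = np.array([1.02, 0.97, 0.01, -0.02])          # synthetic "actual" geometry
measured = forward(true, q)                         # stand-in for laser-scanner data
est = calibrate([1.0, 1.0, 0.0, 0.0], q, measured)  # start from the nominal geometry
```

With noise-free synthetic data the estimate recovers the true parameters; with real measurements the residual at the solution would not be zero, and the choice of error-function model is exactly the question the workshop explores.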

Team 4: Personalized Marketing

  • Mentor Richard Sharp, Starbucks

The goal of personalized marketing is to send the right message to the right person at the right time. Rules-based, targeted marketing suffers from a measurement problem: it works on average, being useful for some but irrelevant for others, and you can’t tell one group from the other. Online retailers are generally better able to track individual customer behavior than their brick-and-mortar counterparts, but they still suffer from an inability to put that behavior in context. A common result is that a shed (or book, or shoes, or tent, or whatever) chases you around the internet. Yes, you searched for it, but then you went down to the store and bought it in person. The next time that ad pops up, it has gotten the behavior right but completely missed the context: right message, right person, wrong time.

Personalized marketing attempts to reduce the inefficiency of targeted marketing by making algorithmic rather than rules-based decisions that treat the recipient as an individual rather than as a representative of a general class. Challenges include discovering useful behavioral and contextual clues in a mountain of transactional and other data, determining an optimal decision strategy for using that information toward some objective, and selecting the objective itself. Unsurprisingly, increasing revenue is a common objective, but so is increasing engagement (or, similarly, decreasing churn), and objectives can range as widely as supporting health-related decisions such as smoking cessation or helping individuals make better financial decisions.

We will develop a mathematical model that is part of a working system for making offer decisions. Some of the significant topics we will work to address are:

  • measuring incremental impact
  • behavioral and contextual feature engineering
  • decision strategies and objectives
  • continual operation in a real-world setting (including feedback for system operators)
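Measuring incremental impact usually starts from a randomized holdout: some eligible customers are deliberately not sent the offer, and the difference in outcomes between the two groups estimates the offer's effect. The sketch below computes that difference in conversion rates with a normal-approximation confidence interval; the function name and the counts are hypothetical.

```python
import math


def incremental_lift(treated_conv, treated_n, control_conv, control_n, z=1.96):
    """Estimate an offer's incremental effect from a randomized holdout.

    Returns (lift, (low, high)): the difference in conversion rates between customers
    who received the offer and a randomly held-out control group, with a
    normal-approximation confidence interval (z = 1.96 for roughly 95%).
    """
    p_t = treated_conv / treated_n
    p_c = control_conv / control_n
    lift = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / treated_n + p_c * (1 - p_c) / control_n)
    return lift, (lift - z * se, lift + z * se)


lift, (low, high) = incremental_lift(120, 1000, 100, 1000)
print(round(lift, 4), round(low, 4), round(high, 4))
```

Here a 2-point lift measured on 1,000 customers per arm has an interval that includes zero, which is the measurement problem described above in miniature: small per-segment samples make it hard to tell who the offer actually helps.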

Team 5: Supporting oncology drug development by deriving a lumped parameter for characterizing target inhibition in standard math

  • Mentor Andrew Stein, Novartis Institute for Biomedical Research

During the development of biotherapeutic drugs, modelers are often asked to predict the dosing regimen needed to achieve sufficient target inhibition for efficacy in a solid tumor [1, 2]. Previous work showed that under many relevant clinical scenarios, target inhibition in blood can be characterized by a single lumped parameter: Kd*Tacc/Cavg, where Kd is the binding affinity of the drug, Tacc is the fold-accumulation of the target during therapy, and Cavg is the average drug concentration under the dosing regimen of interest [3]. This project will focus on extending these results to characterizing target inhibition in a tumor, to assist in development of targeted therapies and immunotherapies in oncology.
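The lumped parameter is simple arithmetic once its three inputs are known. The sketch below computes it, assuming Kd and Cavg are expressed in the same concentration units so that the result is dimensionless (Tacc is a fold change). The function name and example values are hypothetical; the interpretation of the parameter and the conditions under which it characterizes target inhibition are those of [3].

```python
def lumped_inhibition_parameter(kd, t_acc, c_avg):
    """Compute Kd * Tacc / Cavg.

    kd:    binding affinity (dissociation constant) of the drug, same units as c_avg
    t_acc: fold-accumulation of the target during therapy (dimensionless)
    c_avg: average drug concentration under the dosing regimen of interest
    """
    if kd <= 0 or t_acc <= 0 or c_avg <= 0:
        raise ValueError("all inputs must be positive")
    return kd * t_acc / c_avg


# hypothetical values: Kd = 0.1 nM, 10-fold target accumulation, Cavg = 100 nM
print(lumped_inhibition_parameter(0.1, 10, 100))
```

Since Cavg appears in the denominator, doubling the average concentration halves the parameter, while a tighter binder (smaller Kd) or less target accumulation lowers it proportionally.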

References
  1. Deng, Rong, et al. "Preclinical pharmacokinetics, pharmacodynamics, tissue distribution, and tumor penetration of anti-PD-L1 monoclonal antibody, an immune checkpoint inhibitor." MAbs. Vol. 8. No. 3. Taylor & Francis (2016) Suppl Fig 5.
  2. Lindauer, A., et al. "Translational Pharmacokinetic/Pharmacodynamic Modeling of Tumor Growth Inhibition Supports Dose‐Range Selection of the Anti–PD‐1 Antibody Pembrolizumab." CPT: Pharmacometrics & Systems Pharmacology (2017).
  3. Stein AM, Ramakrishna R. "AFIR: A dimensionless potency metric for characterizing the activity of monoclonal antibodies." Clin. Pharmacol. Ther: Pharmacometrics and Systems Pharmacol, doi 10.1002/psp4.12169, 2017.

Team 6: How do robots find their way home? Optimizing RFID beacon placement for robot localization and navigation in indoor spaces

  • Mentor Karoline Pershell, Service Robotics & Technologies

While map apps on mobile devices are excellent for getting around town, they are not precise enough to use within buildings. We are currently working on deploying service robots (vacuuming, security, mail delivery) throughout a facility, and the robotic systems will navigate the space based on a pre-made facility map and built-in obstacle avoidance technology. However, a robot still needs to localize itself within the map (i.e., determine where it is on the map) at regular intervals. Using RFID beaconing technology to triangulate position is a promising option for localization. Given a map and RFID readings along a path, can we extrapolate the signal strength to any point in the map? That is, can we develop a model that will allow a robot to localize itself on a map? How do we optimize the placement (and other settings) of beacons to reduce cost while still ensuring localization? How can we model reduced signals (e.g., beacons in neighboring rooms whose signal comes through a wall) and distinguish them from beacons that are simply far away, given that signal strength is often variable?
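One common starting point, assumed here rather than taken from the project, is the log-distance path-loss model RSSI = P0 - 10 n log10(d), where P0 is the signal strength at 1 m and n is a path-loss exponent, both of which would be calibrated from readings along a path. The sketch inverts the model to get beacon ranges and grid-searches the map for the best-fitting position; a per-wall attenuation term could be added to the model to represent signals coming through walls.

```python
import math


def rssi_to_distance(rssi, p0, n):
    """Invert the log-distance path-loss model rssi = p0 - 10 * n * log10(d)."""
    return 10 ** ((p0 - rssi) / (10 * n))


def localize(beacons, readings, p0, n, width, height, step=0.5):
    """Grid-search the map for the point whose beacon distances best match the readings.

    beacons: list of (x, y) positions; readings: RSSI values in dBm, same order.
    Returns the grid point minimizing the sum of squared range errors.
    """
    ranges = [rssi_to_distance(r, p0, n) for r in readings]
    best, best_err = None, math.inf
    for i in range(int(width / step) + 1):
        for j in range(int(height / step) + 1):
            x, y = i * step, j * step
            err = sum((math.hypot(x - bx, y - by) - d) ** 2
                      for (bx, by), d in zip(beacons, ranges))
            if err < best_err:
                best, best_err = (x, y), err
    return best


# synthetic check: three beacons in a 10 m x 10 m room, robot at (3, 4)
beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
P0, N = -40.0, 2.0   # hypothetical calibration constants
readings = [P0 - 10 * N * math.log10(math.hypot(3 - bx, 4 - by)) for bx, by in beacons]
print(localize(beacons, readings, P0, N, width=10, height=10))  # (3.0, 4.0)
```

The same grid of residuals could also be used to study beacon placement: a layout is good if, for plausible noise levels, the residual surface has a single sharp minimum everywhere the robot travels.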

Math-to-Industry Boot Camp

Advisory: Application deadline is February 15, 2016

Organizers: 

Description

The Math-to-Industry Boot Camp is an intense six-week session designed to provide graduate students with training and experience that is valuable for employment outside of academia. The program is targeted at Ph.D. students in pure and applied mathematics. The Boot Camp consists of courses in the basics of programming, data analysis, and mathematical modeling. Students will work in teams on projects and will be provided with soft skills training.

There will be two group projects during the session: a "baby project" designed to introduce the concept of solving open-ended problems and working in teams, and a "capstone project" that will be posed by industry scientists. The students will be able to interact with industry participants at various points in the program.

Eligibility

Applicants must be current graduate students in a Ph.D. program at a U.S. institution during the period of the boot camp.

Logistics

The program will take place at the IMA on the campus of the University of Minnesota. Students will be housed in a residence hall on campus and will receive a per diem and a travel budget, as well as an $800 stipend.

Applications

To apply, please supply the following materials through the link at the top of the page:

  • Statement of reason for participation, career goals, and relevant experience
  • Unofficial transcript and evidence of good standing and full-time status
  • Letter of support from advisor, director of graduate studies, or department chair

Selection criteria will be based on background and statement of interest, as well as geographic and institutional diversity. Women and minorities are especially encouraged to apply. Selected participants will be contacted by March 15.

Participants

Name Department Affiliation
Luis Aguirre Department of Mathematics Texas Christian University
Kirsten Anderson   Kirsten L Anderson, LLC
Niles Armstrong Department of Mathematics Kansas State University
Christopher Bemis   Whitebox Advisors
Jesse Berwald Enterprise Data Analytics & Business Intelligence Target Corporation
Mark Blumstein Department of Mathematics Colorado State University
Marina Brockway   VivaQuant
Anthea Cheung Department of Mathematics and Statistics Boston University
Lise Chlebak Department of Mathematics Tufts University
Kelsey DiPietro Department of Applied Computational Mathematics and Statistics University of Notre Dame
An Do Department of Mathematics Claremont Graduate University
Natalie Durgin Department of Data Science Spiceworks
Jasmine Foo School of Mathematics University of Minnesota, Twin Cities
Richard Frnka Department of Mathematics Louisiana State University
Arezou Ghesmati Department of Mathematics Texas A & M University
John Goes School of Mathematics University of Minnesota, Twin Cities
Rohit Gupta Institute for Mathematics and Its Applications University of Minnesota, Twin Cities
Alex Happ Department of Mathematics University of Kentucky
Mela Hardin Department of Mathematics Arizona State University
Lindsey Hiltner Department of Mathematics University of Minnesota, Twin Cities
Brian Hunter Department of Mathematics Texas A & M University
Ahmet Kabakulak Department of Mathematics University of Wisconsin, Madison
Julienne Kabre   Illinois Institute of Technology
Katherine Kinnaird   Brown University
Avary Kolasinski Department of Mathematics University of Kansas
Shaked Koplewitz Department of Mathematics Yale University
Henry Kvinge Department of Mathematics University of California, Davis
George Lankford Department of Mathematics North Carolina State University
Kevin Leder Department of Industrial and Systems Engineering University of Minnesota, Twin Cities
Gilad Lerman School of Mathematics University of Minnesota, Twin Cities
Alfonso Limon   Oneirix Labs
Mike Makowesky   MSM Investment Partners
Kristina Martin Department of Mathematics North Carolina State University
Sarah Miracle Department of Computer and Information Sciences University of St. Thomas
Yoichiro Mori School of Mathematics University of Minnesota, Twin Cities
Khanh Nguyen Department of Mathematics University of Houston
Marcella Noorman Department of Mathematics North Carolina State University
Dimitrios Ntogkas Department of Mathematics University of Maryland
Brian Preskitt Department of Mathematics University of California, San Diego
Mrinal Raghupathi USAA Asset Management Company USAA Asset Management Company
Analise Rodenberg Department of Mathematics University of Minnesota, Twin Cities
Keith Rush Department of Mathematics University of Wisconsin, Madison
Nathan Salazar Department of Mathematics The University of Iowa
Fadil Santosa Institute for Mathematics and its Applications University of Minnesota, Twin Cities
Jacob Shapiro Department of Mathematics Purdue University
Timothy Spencer Department of Mathematics Georgia Institute of Technology
Daniel Spirn University of Minnesota University of Minnesota, Twin Cities
Sumanth Swaminathan   Revon Systems
Stan Swierczek Program in Applied Mathematics University of Arizona
Carlos Tolmasky Institute for Mathematics and its Applications University of Minnesota, Twin Cities
Katie Tucker Department of Mathematics University of Nebraska
Joshua Wilson Department of Mathematics University of Minnesota, Twin Cities
Yuhong Yang Department of Statistics University of Minnesota, Twin Cities
Camille Zerfas Department of Mathematical Sciences Clemson University
Ding Zhao Department of Mathematics University of Kentucky

 

Projects and teams

Team 1: Improving Accuracy of ECG Monitoring Using a Wearable Device

  • Mentor Marina Brockway, VivaQuant
  • Lindsey Hiltner, University of Minnesota, Twin Cities
  • Julienne Kabre, Illinois Institute of Technology
  • Dimitrios Ntogkas, University of Maryland
  • Nathan Salazar, The University of Iowa
  • Katie Tucker, University of Nebraska

Remote ECG monitoring through the use of wearable wireless devices is continuing to play an important role in health care by enabling better diagnostics and individualized medicine delivery at a lower cost. As the use of these devices increases, the cost of analyzing the large volume of data produced by these devices becomes more significant. Considerable progress has been made in rendering remote ECGs more resilient to the noise that is commonly encountered with these recordings. However, more work remains in order to achieve the goal of fully automated analysis of ambulatory ECGs. In order to accomplish this goal, high accuracy of beat detection is required despite the presence of noise and signal corruption as well as changes in ECG character due to the presence of cardiac arrhythmia. This project will concentrate on developing a pattern recognition algorithm capable of identifying errors in detection of normal and ectopic ventricular beats. 
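As a hypothetical baseline for the kind of errors the project targets, the sketch below detects beats as local maxima above a threshold (with a refractory period) and then flags beats whose preceding RR interval deviates from the median RR interval by more than a tolerance, a pattern typical of premature (e.g., ectopic) beats and their compensatory pauses. The function names, threshold, refractory period, and tolerance are illustrative assumptions; real ambulatory ECG would first need filtering, and intervals here are in samples.

```python
import statistics


def detect_beats(signal, threshold, refractory):
    """Return indices of local maxima above threshold, at most one per refractory window."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i] >= threshold and signal[i] >= signal[i - 1] and signal[i] > signal[i + 1]:
            if peaks and i - peaks[-1] < refractory:
                # two candidates too close together: keep the taller one
                if signal[i] > signal[peaks[-1]]:
                    peaks[-1] = i
                continue
            peaks.append(i)
    return peaks


def flag_irregular_beats(peaks, tolerance=0.2):
    """Flag beats whose preceding RR interval differs from the median RR by more than tolerance."""
    rr = [b - a for a, b in zip(peaks, peaks[1:])]
    if not rr:
        return []
    median_rr = statistics.median(rr)
    return [peaks[k + 1] for k, d in enumerate(rr) if abs(d - median_rr) > tolerance * median_rr]


signal = [0.0] * 300
for idx in (20, 70, 120, 150, 220, 270):   # the beat at 150 is premature, then a long pause
    signal[idx] = 1.0
beats = detect_beats(signal, threshold=0.5, refractory=20)
print(flag_irregular_beats(beats))  # [150, 220]
```

A flagged beat is either a true arrhythmic event or a detection error (a missed or spurious beat); separating those two cases is where a learned pattern-recognition model would replace this fixed rule.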

Team 2: Human Guided Machine Vision (HGMV)

  • Mentor Alfonso Limon, Oneirix Labs
  • Kelsey DiPietro, University of Notre Dame
  • Richard Frnka, Louisiana State University
  • Brian Hunter, Texas A & M University
  • Shaked Koplewitz, Yale University
  • Khanh Nguyen, University of Houston
  • Timothy Spencer, Georgia Institute of Technology

In Human Guided Machine Vision (HGMV), humans and machines collaborate to achieve vision tasks that neither can achieve individually. Humans are smart, perceptive, and creative, while computers are fast and accurate. Bringing the best qualities of both participants to bear on a vision task requires rethinking human-computer interaction (HCI) paradigms. In HGMV frameworks, we endeavor to give real-time feedback on what would happen if the human participant took a particular action, before the action is taken. Take a simple example: detecting a circle with human help. A non-HGMV version of such software would take in a mouse click from the user and produce a circle close to where the mouse was clicked. An HGMV version of the same software would produce fast and accurate feedback showing which circle would be detected IF the user were to click where the mouse pointer currently is, without the user actually clicking. This feedback is updated in real time as the user moves the mouse pointer. The lag between the user's mouse movement and the feedback should be minimal, ideally below the perception threshold of 80 ms.

This requires rethinking image processing algorithms. Typically, under HGMV, image processing algorithms are split into a pre-processing step and a real-time step. The pre-processing step should take minimal time, work over the entire image, and produce data (but not too much data) that allows the real-time step to take in the mouse pointer location and compute the feedback very quickly. This technique is most successful when the pre-processing step exploits some economy of scale from processing the entire image rather than local regions.

In the present project, the team will develop an algorithm to detect roads consistent with the HGMV paradigm. Fully automated detection of roads from aerial views remains elusive. A computer cannot always tell the difference between a road and various road-like structures. Furthermore, looking at junctions and intersections, a computer cannot necessarily tell which road went where. A human being has a much better grasp of such nuance, which is why humans still digitize road networks for mapping software. The idea of this project is to let humans bring their superior understanding to the table while still using the computer to achieve better speed and accuracy in the road digitization task. For example, finding the center of a road, finding the carriageway width, and tracing the road along obvious stretches can be done much faster by a computer. The project team has the following tasks:

  1. Create a script of how a computer and a human would interact to detect roads, while following the HGMV paradigm. Remember that one mainstay of this paradigm is real time feedback of what will happen IF the user clicked.
  2. Create image processing algorithms for human guided road detection.
  3. Split the algorithms into a preprocessing and real-time step.
  4. Create a working demonstration of the proposed algorithms, having real human interactions, real image processing, changeable images, but possibly a very simplistic UI.
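A classic way to get the pre-processing/real-time split is a summed-area table (integral image): one pass over the whole image lets the real-time step compute the sum or mean over any rectangular window in constant time, independent of the window size. The sketch uses the local mean intensity around the cursor as a hypothetical stand-in for whatever road feature (e.g., a homogeneity or edge measure) the team actually computes; all function names are illustrative.

```python
def build_integral_image(img):
    """Pre-processing step: summed-area table S with S[r][c] = sum of img[0:r][0:c]."""
    h, w = len(img), len(img[0])
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            S[r + 1][c + 1] = S[r][c + 1] + row_sum
    return S


def box_sum(S, r0, c0, r1, c1):
    """Real-time step: sum of img over rows r0..r1-1 and columns c0..c1-1 in O(1)."""
    return S[r1][c1] - S[r0][c1] - S[r1][c0] + S[r0][c0]


def mean_near_cursor(S, row, col, radius):
    """Mean intensity in a (2*radius+1)^2 window around the cursor, clipped to the image."""
    h, w = len(S) - 1, len(S[0]) - 1
    r0, c0 = max(row - radius, 0), max(col - radius, 0)
    r1, c1 = min(row + radius + 1, h), min(col + radius + 1, w)
    return box_sum(S, r0, c0, r1, c1) / ((r1 - r0) * (c1 - c0))


img = [[4 * r + c for c in range(4)] for r in range(4)]
S = build_integral_image(img)                # once per image
print(box_sum(S, 1, 1, 3, 3))                # 5 + 6 + 9 + 10 = 30
print(mean_near_cursor(S, 0, 0, radius=1))   # (0 + 1 + 4 + 5) / 4 = 2.5
```

The economy of scale is that building the table costs one pass over the image, after which every mouse movement costs only four lookups per window, comfortably inside an 80 ms budget.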

Team 3: Mathematical Prediction of Physician Triage of Asthma

  • Mentor Sumanth Swaminathan, Revon Systems
  • Mark Blumstein, Colorado State University
  • Lise Chlebak, Tufts University
  • Henry Kvinge, University of California, Davis
  • Camille Zerfas, Clemson University

Asthma is a lung condition that imposes a significant burden on patients’ daily lives. Escalations of this condition (or exacerbations) are a frequent trigger of physician and hospital visits, which are both costly and distressing to patients. The need for novel solutions that limit the impact of exacerbations on global health is abundantly apparent.

One emerging approach to addressing asthma exacerbation is early detection via mobile app technology. Many of these apps, however, use rule-based decision frameworks, which are hampered by the size of the variable space involved in triage and diagnosis.

We are interested in developing a mathematical model that predicts the appropriate triage (urgency of illness) for an asthma patient based on patient health characteristics. In particular, we hope to train a machine-learning model on physician-generated triage data and use it to make out-of-sample predictions. Some of our major goals and questions include:

  1. What are the most important patient health features or combination of features for predicting an accurate patient triage?
  2. Why do those particular features or combination of features matter the most?
    1. Can we understand the temporal component of these features (does the temporal change in these features matter or can we make predictions on features at a snapshot of time?)
  3. Why does the particular machine learning model selected perform better than alternatives?
  4. What insights can be drawn from the physician triage data itself? Are there nontrivial trends in physician diagnosis that can be brought to light?
  5. What data visualization techniques best represent the models, and how might you tune the visualization to convey different aspects of the features and functionality?
  6. How do you represent the accuracy of the predicted probabilities for the factors that affect the outcome, and instill the appropriate level of confidence that the results are trustworthy and of high quality?
  7. Can you suggest ways of feeding incorrect algorithmic triage decisions back into the current predictor to improve future performance (retraining protocols, real-time retraining, etc.)?
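For question 1, permutation importance is one model-agnostic way to rank features: shuffle one feature column at a time and measure how much the model's agreement with physician triage drops. The sketch below is a plain-Python version under stated assumptions: `predict` is any trained model's prediction function, and the toy data and labels are hypothetical. scikit-learn offers a fuller implementation as `sklearn.inspection.permutation_importance`.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average drop in accuracy when each feature column is randomly shuffled.

    predict: callable mapping a list of feature rows to a list of predicted triage labels.
    X: list of rows (lists of feature values); y: list of physician triage labels.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # replace feature j with its shuffled values, keeping the other features intact
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# toy data: feature 0 fully determines triage, feature 1 is irrelevant
X = [[0, 5], [1, 3], [0, 2], [1, 7], [0, 1], [1, 4], [0, 6], [1, 8]]
y = [0, 1, 0, 1, 0, 1, 0, 1]
imp = permutation_importance(lambda rows: [r[0] for r in rows], X, y)
print(imp)
```

Because correlated features can share importance, shuffling groups of related features together is a natural extension when the question is about combinations of features.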

Team 4: Universal Identifier

  • Mentor Jesse Berwald, Target Corporation
  • Niles Armstrong, Kansas State University
  • An Do, Claremont Graduate University
  • Arezou Ghesmati, Texas A & M University
  • Alex Happ, University of Kentucky
  • George Lankford, North Carolina State University
  • Ding Zhao, University of Kentucky

Uniquely identifying users or customers is a complex problem when people interact with businesses through multiple channels (store, website, coupon sites, etc.) and through multiple devices (desktop, mobile, phone). It often happens that retailers have multiple records for unique individuals. In trying to provide more personalized context, as well as understand the effectiveness of business strategies, it is useful to have a "Universal ID" which links together all of a user's identities.

To get a rough idea of how this would work, it is easiest to consider a simple example. Consider a small set of identifiers that a user might have:

  • Browser cookie
  • Email address
  • Credit card number (hashed)
  • Login ID

The user can also take certain actions that provide a link between identifiers e.g.:

  • Logging in to a website links a browser cookie to a login ID
  • Making a purchase links a credit card number to an email address, and possibly a login ID
  • Entering an email address in account settings links an email address to a login ID

The goal is to develop a technique to link all of these IDs together, especially those that don't have an action that provides a direct link (e.g. credit card number to browser cookie). We've approached this using graphs and network theory, but don't let that stop you from using other techniques.

Some additional complications you can add in if time allows:

  • A user may have multiple identities of the same type (cookies for mobile and desktop browser, multiple credit cards, etc.)
  • Some "link" actions provide better information than others. A user can log in to their account from another user's computer, which would provide a false cookie-to-login ID link.
  • The above suggests some sort of probabilistic model. It would be nice to have some sort of score that allows trading off precision for recall depending on the use case.
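In graph terms, identifiers are nodes, link actions are edges, and a universal ID is a connected component. A union-find (disjoint-set) structure computes the components incrementally; the sketch below also drops links whose confidence score falls below a threshold, a crude version of the precision/recall trade-off mentioned above. The scores and identifier strings are hypothetical.

```python
class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:        # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def link_identities(links, threshold):
    """Group identifiers into universal IDs.

    links: iterable of (id_a, id_b, score), where score in [0, 1] is a hypothetical
    confidence that both identifiers belong to the same person. Links below the
    threshold are ignored: a higher threshold favors precision, a lower one recall.
    """
    uf = UnionFind()
    for a, b, score in links:
        uf.find(a)
        uf.find(b)          # register both identifiers even if the link is dropped
        if score >= threshold:
            uf.union(a, b)
    groups = {}
    for x in uf.parent:
        groups.setdefault(uf.find(x), set()).add(x)
    return list(groups.values())


links = [
    ("cookie:A", "login:alice", 0.9),                # logged in from browser A
    ("card:1", "email:alice@example.com", 0.95),     # purchase receipt
    ("card:1", "login:alice", 0.8),                  # purchase while logged in
    ("cookie:B", "login:alice", 0.3),                # login from someone else's computer?
]
for group in link_identities(links, threshold=0.5):
    print(sorted(group))
```

Note that `card:1` ends up linked to `cookie:A` even though no single action connects them directly; lowering the threshold to 0.3 would also pull in `cookie:B`.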

Team 5: Examining and Resolving Issues found in Mean Variance Optimization

  • Mentor Christopher Bemis, Whitebox Advisors
  • Avary Kolasinski, University of Kansas
  • Kristina Martin, North Carolina State University
  • Brian Preskitt, University of California, San Diego
  • Jacob Shapiro, Purdue University
  • Stan Swierczek, University of Arizona

Mean variance optimization, while providing a foundation for modern portfolio theory, is rife with known issues. In recent years, attention has been paid to the distribution of eigenvalues of the sample covariance matrix, one of the central parameters of the problem, and random matrix theory has found application as a result.

We use market data to examine the pitfalls in the standard theory. With this groundwork laid, we proceed to examine several remedies, including shrinkage estimators, principal component analysis, and applications arising from random matrix theory. Our analyses will focus in large part on the effect of each remedy on the spectrum of the input covariance matrix, allowing a common discussion among methods and identifying exactly why the original formulation fails.
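The spectral view can be made concrete in a few lines. The sketch below shows (i) linear shrinkage toward a scaled identity, which maps each eigenvalue λ to (1 - δ)λ + δμ with μ the average eigenvalue, leaving eigenvectors unchanged; (ii) the fully invested minimum-variance portfolio, whose weights depend on the inverse covariance and are therefore most sensitive to the smallest eigenvalues; and (iii) the Marchenko-Pastur edges, which describe, asymptotically, where sample eigenvalues of pure noise fall. The shrinkage intensity δ is left as a free parameter here (estimators such as Ledoit-Wolf choose it from data), and the function names are illustrative.

```python
import numpy as np


def shrink_covariance(S, delta):
    """Linear shrinkage toward a scaled identity: (1 - delta) * S + delta * mu * I, mu = trace(S) / p."""
    p = S.shape[0]
    mu = np.trace(S) / p
    return (1 - delta) * S + delta * mu * np.eye(p)


def min_variance_weights(S):
    """Fully invested minimum-variance portfolio: w = S^{-1} 1 / (1' S^{-1} 1)."""
    ones = np.ones(S.shape[0])
    x = np.linalg.solve(S, ones)
    return x / x.sum()


def marchenko_pastur_bounds(p, n, sigma2=1.0):
    """Asymptotic eigenvalue edges for a p x p sample covariance of n i.i.d. observations
    with variance sigma2 (stated for p <= n)."""
    q = p / n
    return sigma2 * (1 - np.sqrt(q)) ** 2, sigma2 * (1 + np.sqrt(q)) ** 2


S = np.diag([1.0, 4.0])
print(shrink_covariance(S, 0.5))        # eigenvalues 1.75 and 3.25: spread reduced
print(min_variance_weights(S))          # [0.8, 0.2]
print(marchenko_pastur_bounds(100, 400))  # (0.25, 2.25)
```

A common heuristic built on the last function treats sample eigenvalues inside the Marchenko-Pastur interval as noise and replaces them (for example, by their average) before inverting the covariance matrix.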

Team 6: Inventory Demand Kaggle Competition

  • Mentor Natalie Durgin, Spiceworks
  • Luis Aguirre, Texas Christian University
  • Anthea Cheung, Boston University
  • Mela Hardin, Arizona State University
  • Ahmet Kabakulak, University of Wisconsin, Madison
  • Marcella Noorman, North Carolina State University

Data and project information can be found on the Kaggle website.

Many internet companies are monetized by serving ads on their websites. Their “inventory” comprises the ad slots available in front of their users. Their sales department might agree to run an ad campaign for the month and serve ads into available slots. Ads are often paid for in units of a thousand served. Internet traffic fluctuates: if there is a dearth of slots and a surplus of ads, revenue will be lost; alternatively, if not enough ads are booked, slots will run empty when money could have been made. Using historical data to predict user traffic and the availability and performance of ad slots is an important problem, and it has an obvious parallel to the Grupo Bimbo bakery demand problem. We will develop a model for the Grupo Bimbo Inventory Demand problem alongside the Kaggle data science community.
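Demand counts are non-negative and heavily skewed, so errors are often measured on a log scale. The sketch below defines root mean squared logarithmic error (RMSLE) and a simple per-key baseline, assuming the data have been reduced to (key, demand) pairs where the key might be, hypothetically, a product and client combination. Predicting the exponentiated per-key mean of log1p(demand) is the constant prediction that minimizes squared log error within each key on the training data.

```python
import math
from collections import defaultdict


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative values."""
    return math.sqrt(sum((math.log1p(p) - math.log1p(t)) ** 2
                         for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_log_mean_baseline(keys, demand):
    """Per-key mean of log1p(demand); unseen keys fall back to the global mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for k, d in zip(keys, demand):
        sums[k] += math.log1p(d)
        counts[k] += 1
    global_mean = sum(sums.values()) / sum(counts.values())
    table = {k: sums[k] / counts[k] for k in sums}

    def predict(new_keys):
        return [math.expm1(table.get(k, global_mean)) for k in new_keys]

    return predict


predict = fit_log_mean_baseline(["a", "a", "b"], [0, 3, 7])
print(predict(["a", "b"]))          # approximately [1.0, 7.0]
print(rmsle([0, 1, 3], [0, 1, 3]))  # 0.0
```

Any richer model (lagged demand, client and route features, and so on) can then be judged by how much it improves on this baseline under the same metric.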