Math-to-Industry Boot Camp III
Advisory: Application deadline is February 28, 2018
- Benjamin Brubaker, University of Minnesota, Twin Cities
- Fadil Santosa, University of Minnesota, Twin Cities
- Daniel Spirn, University of Minnesota, Twin Cities
The Math-to-Industry Boot Camp is an intense six-week session designed to provide graduate students with training and experience that is valuable for employment outside of academia. The program is targeted at Ph.D. students in pure and applied mathematics. The boot camp consists of courses in the basics of programming, data analysis, and mathematical modeling. Students work in teams on projects and are provided with training in resume and interview preparation as well as teamwork.
There are two group projects during the session: a small-scale project designed to introduce the concept of solving open-ended problems and working in teams, and a "capstone project" that is posed by industrial scientists.
Weekly seminars by industrial scientists provide the students with opportunities to learn about a variety of possible future careers.
Applicants must be current graduate students in a Ph.D. program at a U.S. institution during the period of the boot camp.
The program will take place at the IMA on the campus of the University of Minnesota. Students will be housed in a residence hall on campus and will receive a per diem and a travel budget, as well as an $800 stipend.
To apply, please supply the following materials through the link at the top of the page:
- Statement of reason for participation, career goals, and relevant experience
- Unofficial transcript and evidence of good standing and full-time status
- Letter of support from advisor, director of graduate studies, or department chair
Selection will be based on background and statement of interest, as well as geographic and institutional diversity. Women and minorities are especially encouraged to apply. Selected participants will be contacted in April.
|Christopher Bemis||Whitebox Advisors|
|Nitsan Ben-Gal||Software, Electronics and Mechanical Systems Laboratory||3M|
|Jesse Berwald||D-Wave Systems|
|Ariel Bowman||Department of Mathematics||University of Texas at Arlington|
|Chris Browne||Center for Applied Mathematics||Cornell University|
|Benjamin Brubaker||School of Mathematics||University of Minnesota, Twin Cities|
|Kate Brubaker||Department of Mathematics||Purdue University|
|Irfan Bulu||Department of Math and Modeling||Schlumberger-Doll Research|
|Shawn Burkett||Mathematics||University of Colorado|
|Olivia Cannon||Department of Mathematics||University of Minnesota, Twin Cities|
|Jared Catenacci||Diagnostic Research and Material Studies||National Security Technologies, LLC|
|Chirasree Chatterjee||Department of Mathematics and Statistics||Saint Louis University|
|Hua Chen||Department of Mathematical Sciences||University of Delaware|
|Aaron Cohen||Department of Mathematics||Indiana University|
|Mingchang Ding||Department of Mathematical Sciences||University of Delaware|
|Jasmine Foo||School of Mathematics||University of Minnesota, Twin Cities|
|Zhen Gao||Department of Mathematics||Vanderbilt University|
|Maria Gommel||Department of Mathematics||The University of Iowa|
|Hayley Guy||School of Mathematics||North Carolina State University|
|Qie He||Department of Industrial and Systems Engineering||University of Minnesota, Twin Cities|
|Thomas Hoft||Department of Mathematics||University of St. Thomas|
|Ruihao Huang||Department of Mathematical Sciences||Michigan Technological University|
|Jeffrey Humpherys||UnitedHealth Group|
|Laura Iosip||Department of Mathematics||University of Maryland|
|Melanie Jensen||Department of Mathematics||Tulane University|
|Alicia Johnson||Macalester College|
|Ekaterina Kryuchkova||Center for Applied Mathematics||Cornell University|
|Kevin Leder||Department of Industrial and Systems Engineering||University of Minnesota, Twin Cities|
|Philku Lee||Department of Mathematics and Statistics||Mississippi State University|
|SangJoon Lee||Department of Mathematics||University of Connecticut|
|Hengguang Li||Department of Mathematics||Wayne State University|
|Aaron Luttman||Diagnostic Research and Material Studies||National Security Technologies, LLC|
|Christopher Miller||School of Mathematics||University of California, Berkeley|
|Cristian Minoccheri||Department of Mathematics||State University of New York, Stony Brook (SUNY)|
|Sarah Miracle||Department of Computer and Information Sciences||University of St. Thomas|
|Shannon Negaard-Paper||University of Minnesota, Twin Cities|
|Elpiniki Nikolopoulou||Department of Applied Mathematics and Statistics||Arizona State University|
|Michelle Pinharry||School of Mathematics||University of Minnesota, Twin Cities|
|Iurii Posukhovskyi||Department of Mathematics||University of Kansas|
|Mrinal Raghupathi||USAA Asset Management Company|
|Michael Ramsey||Department of Applied Mathematics||University of Colorado|
|Eric Roberts||Department of Applied Mathematics||University of California, Merced|
|Tanushree Roy||School of Mathematics||University of Central Florida|
|Keith Rush||Department of Strategy and Analytics||Milwaukee Brewers|
|Fadil Santosa||School of Mathematics||University of Minnesota, Twin Cities|
|Chang Shu||Department of Applied Mathematics||University of California, Davis|
|Dallas Smith||School of Mathematics||Brigham Young University|
|Daniel Spirn||University of Minnesota, Twin Cities|
|Binh Tang||Department of Statistical Science||Cornell University|
|Elizabeth Wicks||School of Mathematics||University of Washington|
|Shiqiang Xia||University of Minnesota, Twin Cities|
|Yufei Yu||Department of Mathematics||University of Kansas|
|Sheng Zhang||Department of Mathematics||Purdue University|
Projects and teams
Team 1: Mathematical Models for Adaptive Multi-modal Sensing
- Mentor Aaron Luttman, National Security Technologies, LLC
- Mentor Jared Catenacci, National Security Technologies, LLC
- Ariel Bowman, University of Texas at Arlington
- Shawn Burkett, University of Colorado
- Hayley Guy, North Carolina State University
- Laura Iosip, University of Maryland
- Yufei Yu, University of Kansas
- Sheng Zhang, Purdue University
Scientific experiments are a natural source of data, usually from diagnostic systems fielded to collect information within the experiments themselves, but there has been a recent trend toward collecting data around big science experiments to understand whether we can detect and characterize the behaviors associated with them. The question is whether it is possible to determine what experiments are being conducted by analyzing human patterns, so-called “patterns of life,” in and around the experimental facilities. To measure patterns of life, we analyze many different types of data, from power grid load profiles to internet activity to sound and pressure signals from cars.
There are two primary challenges that must be addressed:
Mathematical Models for Adaptive Sensing – When should a sensor system turn on its sensors and transmit its data, given that these two activities take a lot of power?
Physics-based Multi-modal Feature Selection and Detection – How can one incorporate physics models for sensing into machine learning approaches to data analysis?
Real multi-sensor data will be provided for testing and validation.
Team 2: Quantum Computation and QUBO Slicing
- Mentor Jesse Berwald, D-Wave Systems
- Olivia Cannon, University of Minnesota, Twin Cities
- Tanushree Roy, University of Central Florida
- Chang Shu, University of California, Davis
- Dallas Smith, Brigham Young University
- Elizabeth Wicks, University of Washington
Quantum annealing computers have begun to enter the business and academic worlds. Over the past five years they have been used for a wide variety of (prototypical) applications, with evidence of differentiated performance in some cases.
A first step in utilizing these computers is to reformulate the problem in an energy minimization framework. This is typically cast as a Hamiltonian, or alternatively as a quadratic unconstrained binary optimization (QUBO), which can be represented as a matrix. These formulations are translated to the physical qubits on the quantum processing unit (QPU) through a process termed “embedding”. Embedding a given problem onto the QPU is handled through a number of different heuristics, one of which is described below, and is an active area of research in itself.
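To make the QUBO formulation concrete, here is a minimal sketch that evaluates the energy x^T Q x of a binary vector x and finds the minimizer by exhaustive search. The function names and the 3-variable matrix are hypothetical illustrations, not part of the project materials, and brute force is only feasible for very small problems; it stands in for the annealer here.

```python
from itertools import product


def qubo_energy(Q, x):
    """Return x^T Q x for a binary vector x and a square matrix Q (list of lists)."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_minimum(Q):
    """Enumerate all 2^n binary assignments and return (best_energy, best_x)."""
    n = len(Q)
    best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))
    return qubo_energy(Q, best), best


# Hypothetical 3-variable QUBO in upper-triangular form: negative diagonal
# entries reward switching a variable on, positive off-diagonal entries
# penalize switching on two neighbouring variables together.
Q_EXAMPLE = [
    [-1, 2, 0],
    [0, -1, 2],
    [0, 0, -1],
]

if __name__ == "__main__":
    print(brute_force_minimum(Q_EXAMPLE))
```

The diagonal of Q carries the linear terms because x_i^2 = x_i for binary variables, which is why a single matrix is enough to represent the whole objective.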
In this project we will investigate one proposed solution to the embedding problem:
The goal is to make the most efficient use of the qubit hardware by developing a parameterized transformation from the space spanned by physical qubits, “qubit space”, to the space spanned by problem variables, the “problem search space”. Our goal will be to define a linear transformation from qubit space to problem search space that allows for a more efficient use of available hardware.
Since the problem space is (in general) much larger than the qubit space, a fixed parameterization will succeed in mapping the qubit space into a proper subspace of the problem space. We term these subspaces “slices”. This reduced problem can then be solved with an optimal use of the available hardware. Using different parameterizations, we can define a series of linear transformations onto orthogonal subspaces of the problem space.
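As one simple illustration of a slice, the sketch below clamps some problem variables to fixed values and derives the smaller QUBO seen by the remaining free variables. This "coordinate slice" is an assumption chosen for clarity; the parameterized transformations investigated in the project may be more general, and the names `coordinate_slice` and `lift` are hypothetical.

```python
def qubo_energy(Q, x):
    """Return x^T Q x for a binary vector x and a square matrix Q (list of lists)."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def coordinate_slice(Q, fixed):
    """Restrict Q to the slice where the variables in `fixed` (index -> 0/1) are clamped.

    Returns (Q_slice, offset, free) such that for every binary y over the free
    variables, the full energy equals qubo_energy(Q_slice, y) + offset.
    """
    n = len(Q)
    free = [i for i in range(n) if i not in fixed]
    # Quadratic part among the free variables (T^T Q T for a selection matrix T).
    Q_slice = [[Q[i][j] for j in free] for i in free]
    # Cross terms with clamped variables are linear in y; because y_i^2 == y_i
    # for binary y_i, they can be folded into the diagonal.
    for a, i in enumerate(free):
        Q_slice[a][a] += sum((Q[i][j] + Q[j][i]) * c for j, c in fixed.items())
    # Terms involving only clamped variables contribute a constant.
    offset = sum(Q[i][j] * ci * cj
                 for i, ci in fixed.items() for j, cj in fixed.items())
    return Q_slice, offset, free


def lift(y, fixed, free, n):
    """Affine map x = T y + c from slice coordinates back to the problem space."""
    x = [0] * n
    for i, c in fixed.items():
        x[i] = c
    for a, i in enumerate(free):
        x[i] = y[a]
    return x
```

Solving the reduced problem for each choice of clamped values and keeping the best lifted solution covers the whole search space, at the cost of one hardware query per slice.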
There are many parameterizations to choose from, each of which raises a number of research questions. We will prioritize our investigation roughly as follows:
- Given a QUBO matrix defining the problem search space, is there an algorithm that produces the most efficient set of transformations (parameterizations) from qubit space to problem space?
- Is there a greedy algorithm that is best in practice, i.e., one that chooses a slice maximizing the use of the chip and then chooses successively smaller slices to query the entire search space?
- What is the role of sparsity in the choice of transformations?
- The QPU itself has a unique architecture. How does this architecture affect the choice of transformations?
References:
- Traffic flow optimization using a quantum annealer: https://arxiv.org/pdf/1708.01625.pdf
- A NASA Perspective on Quantum Computing: Opportunities and Challenges: https://arxiv.org/pdf/1704.04836.pdf
Team 3: Time Series Analysis of Gas Mixture Data
- Mentor Nicholas Asendorf, 3M
- Kate Brubaker, Purdue University
- Ruihao Huang, Michigan Technological University
- Philku Lee, Mississippi State University
- Elpiniki Nikolopoulou, Arizona State University
- Michelle Pinharry, University of Minnesota, Twin Cities
Sensor networks are ubiquitous in today’s Internet of Things, capable of collecting high-frequency data in a cost-efficient way. This results in mountains of time-series data that hopefully contain signals of interest buried in noise. As the number of deployed sensors grows, so does the dimensionality of the observed data, further increasing the complexity of the problem. 3M is interested in such large-scale time series analyses because many of our datasets can be framed in this way: manufacturing, sales, and chemical experiments, to name a few.
This publicly available dataset contains time series readings from chemical sensors over a period of 12 hours. The inputs to these sensors are known concentrations of various gases. The dataset contains timestamped measurements from 16 gas sensors and the input concentrations of the gases, so it is a labeled time series dataset. There are two different gas mixture measurement files, one for Ethylene and CO, and one for Ethylene and Methane. At 3M, we may have similar types of experimental data (perhaps using different sensors) where we would like to determine the interactions between materials or understand fundamental properties of materials. Being able to intelligently and efficiently mine these rich datasets for insights about material characteristics is critical.
Some interesting problems to consider:
- Develop an algorithm to estimate the concentration of each gas given sensor measurements. You might approach this problem using classical machine learning, splitting data into training, validation, and testing, while treating time series measurements as independent points.
- Develop algorithms to estimate the concentrations of each gas using time series based methods like windowing, tsfresh, or RNNs. In this approach, we don’t want to treat each measurement as independent. How do these algorithms compare to classical machine learning techniques?
- Can you use the fact that we have 4 replicates of each sensor at each time point to improve your algorithms? Can you use any clever data fusion techniques or outlier detection strategies?
- What can you tell about the importance or accuracy of the 4 types of sensors used?
- What happens when we purposely introduce missing data? Can we use the replicates of each sensor to overcome this? How robust are your algorithms to missing data?
- Since each dataset has measurements for Ethylene, can we use both datasets to develop a more robust estimation scheme for that gas?
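Two of the ideas in the list above, fusing replicate sensors and windowing the time series, can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: it assumes the replicates of each sensor type sit in adjacent channels, which may not match the actual file layout, and the function names are hypothetical.

```python
from statistics import median


def fuse_replicates(readings, group_size=4):
    """Collapse each consecutive group of `group_size` replicate channels into
    one value by taking the median, which tolerates a single outlier per group."""
    return [median(readings[k:k + group_size])
            for k in range(0, len(readings), group_size)]


def sliding_windows(series, width):
    """Turn a list of per-timestep feature vectors into overlapping windows.

    Window t concatenates the `width` most recent vectors ending at time t, so
    a downstream model sees recent history rather than one isolated reading.
    """
    return [
        [value for row in series[t - width + 1:t + 1] for value in row]
        for t in range(width - 1, len(series))
    ]


if __name__ == "__main__":
    # Hypothetical readings: two sensor types with four replicates each.
    raw = [[1, 1, 1, 9, 2, 2, 2, 2], [1, 2, 1, 1, 3, 3, 3, 3]]
    fused = [fuse_replicates(row) for row in raw]
    print(sliding_windows(fused, width=2))
```

The windowed rows can then be fed to any regression model. When evaluating, splitting the data chronologically rather than at random avoids leaking overlapping windows between training and test sets.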
Team 4: Structured Variational Auto Encoders
- Mentor Irfan Bulu, Schlumberger-Doll Research
- Hua Chen, University of Delaware
- Aaron Cohen, Indiana University
- Mingchang Ding, University of Delaware
- Melanie Jensen, Tulane University
- Christopher Miller, University of California, Berkeley
- Michael Ramsey, University of Colorado
Generative models such as Variational Auto Encoders (VAEs) and Generative Adversarial Networks (GANs) have been very successful in unsupervised learning settings. In a VAE setting, we would like to learn a set of latent variables that explain our data. Although this approach has been very successful as a generative model, interpreting the latent variables is still a challenge. Ideally, we would like to do unsupervised learning that identifies a number of classes (not specified in advance). Once a set of classes has been identified, we can label each class once instead of having to label the entire data set. Imagine you have a sample of handwritten digits without labels. If we can structure a VAE so that it identifies 10 classes, we can then label these classes as the relevant digits. This would be very helpful, as most of our data is unlabeled or poorly labeled.
Concepts that may be helpful to know: neural networks, generative models, graphical models, stochastic variational inference.
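For orientation, the standard VAE training objective is the evidence lower bound (ELBO). The notation here is conventional rather than taken from the project description: x is a data point, z the latent variables, q_φ(z | x) the encoder, p_θ(x | z) the decoder, and p(z) the prior.

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
\;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
```

One possible way to impose class structure, offered only as an assumption and not necessarily the approach the mentor intends, is to include a discrete latent variable with K categories among the latent variables, so that each data point is softly assigned to one of K clusters that can be labeled afterward.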
Team 5: Tailored Discovery in Stock Portfolios
- Mentor Christopher Bemis, Whitebox Advisors
- Chirasree Chatterjee, Saint Louis University
- Zhen Gao, Vanderbilt University
- Cristian Minoccheri, State University of New York, Stony Brook (SUNY)
- Shannon Negaard-Paper, University of Minnesota, Twin Cities
- Shiqiang Xia, University of Minnesota, Twin Cities
Modern portfolio theory has provided tools to identify systemic and idiosyncratic risks via models like Markowitz' Mean-Variance Optimization. In addition, a taxonomy of equities has emerged through feature identification, with one of the earliest and most impactful being Fama and French's three factor model.
In this project, we will leverage technical and fundamental data like return series and earnings information along with well understood equity features like exposure to so-called size, value, and market portfolios to develop tools for suggesting supplements (e.g., technology stocks when looking at Apple) and complements (e.g., energy stocks when looking at Delta Airlines) for individual equities and portfolios. These tools may be used in tailored discovery and research by analysts looking to either construct a portfolio based on a theme or to diversify. The work will ideally evolve from point estimates using simple norms in a predetermined feature space to applying machine learning techniques.
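The "point estimates using simple norms in a predetermined feature space" mentioned above could start as a plain nearest-neighbor ranking. The sketch below is a minimal illustration, assuming each equity is described by a vector of exposures to the market, size, and value portfolios; the tickers and numbers are made-up placeholders, not real estimates.

```python
from math import sqrt


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_by_distance(target, features):
    """Return the other tickers sorted from nearest to farthest from `target`."""
    return sorted(
        (name for name in features if name != target),
        key=lambda name: euclidean(features[target], features[name]),
    )


# Made-up exposures to (market, size, value); placeholders for illustration only.
FEATURES = {
    "AAA": (1.2, -0.3, -0.5),
    "BBB": (1.1, -0.2, -0.4),
    "CCC": (0.8, 0.4, 0.6),
    "DDD": (1.0, 0.1, 0.0),
}

if __name__ == "__main__":
    print(rank_by_distance("AAA", FEATURES))
```

The nearest names are natural candidates for supplements. Complements would likely need a different score, such as how well a candidate offsets the target's risk, which this sketch does not attempt.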
Data will be supplied from Quandl, and the preferred language for development will be Python.
Team 6: Sequence-to-sequence modeling for the business of baseball
- Mentor Keith Rush, Milwaukee Brewers
- Maria Gommel, The University of Iowa
- Ekaterina Kryuchkova, Cornell University
- SangJoon Lee, University of Connecticut
- Iurii Posukhovskyi, University of Kansas
- Eric Roberts, University of California, Merced
Each fan has a unique relationship to his or her favorite sports teams, and each has a different ideal every time they step into the stadium. When a team makes a big free-agent signing in February, the fan who follows the competition closely will be ecstatic, while the fan who primarily enjoys the communal aspects will only see this effect in the buzz generated in his or her social circles. In order to cherish their fans to the utmost, teams must have a global view of their business and be able to structure data from all sources and across all levels of granularity, creating one universe into which all inputs feed and from which all outputs flow.
This project is fundamentally a first step in that direction. The problem we are focusing on is roughly the following: conditioned on a vector representing a fan's history with the Club and the attributes of a particular game, how well can we ingest information in time and map it forward one time step? For this purpose, we will test the standard recurrent and convolutional network architectures, experiment with variants, and discuss the reasons for applying each and their limitations. Data will be provided by the Brewers, and development will take place in Python, utilizing cloud infrastructure for the computing power.
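To illustrate what "mapping forward one time step" conditioned on a fixed vector can look like, here is a minimal sketch of a single Elman-style recurrent update in plain Python. Everything in it is an assumption for illustration: the weight names, the dimensions, and the choice to add the context vector at every step are hypothetical, and a real model would use a deep learning library with trained weights.

```python
from math import tanh


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_step(h_prev, x_t, context, W_h, W_x, W_c, b):
    """One recurrent update: h_t = tanh(W_h h_{t-1} + W_x x_t + W_c c + b).

    h_prev  : previous hidden state
    x_t     : input observed at the current time step
    context : fixed conditioning vector (e.g. fan history and game attributes)
    """
    pre = [rec + inp + ctx + bias for rec, inp, ctx, bias in zip(
        matvec(W_h, h_prev), matvec(W_x, x_t), matvec(W_c, context), b)]
    return [tanh(p) for p in pre]


def run_sequence(xs, context, h0, params):
    """Fold rnn_step over a sequence of inputs and return the final hidden state."""
    h = h0
    for x_t in xs:
        h = rnn_step(h, x_t, context, *params)
    return h
```

The final hidden state would normally be passed through an output layer to predict the next time step. Comparing this recurrent form with a convolution over a fixed window of past inputs is one way to frame the architecture comparison described above.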