Math-to-Industry Boot Camp

Advisory: Application deadline is February 15, 2016

Organizers: 

Description

The Math-to-Industry Boot Camp is an intense six-week session designed to provide graduate students with training and experience that are valuable for employment outside of academia. The program is targeted at Ph.D. students in pure and applied mathematics. The Boot Camp consists of courses in the basics of programming, data analysis, and mathematical modeling. Students will work in teams on projects and will be provided with soft skills training.

There will be two group projects during the session: one "baby project" designed to introduce students to solving open-ended problems and working in teams, and one "capstone project" posed by industry scientists. Students will be able to interact with industry participants at various points in the program.

Eligibility

Applicants must be current graduate students in a Ph.D. program at a U.S. institution during the period of the boot camp.

Logistics

The program will take place at the IMA on the campus of the University of Minnesota. Students will be housed in a residence hall on campus and will receive a per diem and a travel budget, as well as an $800 stipend.

Applications

To apply, please supply the following materials through the link at the top of the page:

  • Statement of reason for participation, career goals, and relevant experience
  • Unofficial transcript and evidence of good standing and full-time status
  • Letter of support from advisor, director of graduate studies, or department chair

Selection will be based on background and statement of interest, as well as geographic and institutional diversity. Women and minorities are especially encouraged to apply. Selected participants will be contacted by March 15.

Participants

Name | Department | Affiliation
Luis Aguirre | Department of Mathematics | Texas Christian University
Kirsten Anderson | | Kirsten L Anderson, LLC
Niles Armstrong | Department of Mathematics | Kansas State University
Christopher Bemis | | Whitebox Advisors
Jesse Berwald | Enterprise Data Analytics & Business Intelligence | Target Corporation
Mark Blumstein | Department of Mathematics | Colorado State University
Marina Brockway | | VivaQuant
Anthea Cheung | Department of Mathematics and Statistics | Boston University
Lise Chlebak | Department of Mathematics | Tufts University
Kelsey DiPietro | Department of Applied Computational Mathematics and Statistics | University of Notre Dame
An Do | Department of Mathematics | Claremont Graduate University
Natalie Durgin | Department of Data Science | Spiceworks
Jasmine Foo | School of Mathematics | University of Minnesota, Twin Cities
Richard Frnka | Department of Mathematics | Louisiana State University
Arezou Ghesmati | Department of Mathematics | Texas A & M University
John Goes | School of Mathematics | University of Minnesota, Twin Cities
Rohit Gupta | Institute for Mathematics and Its Applications | University of Minnesota, Twin Cities
Alex Happ | Department of Mathematics | University of Kentucky
Mela Hardin | Department of Mathematics | Arizona State University
Lindsey Hiltner | Department of Mathematics | University of Minnesota, Twin Cities
Brian Hunter | Department of Mathematics | Texas A & M University
Ahmet Kabakulak | Department of Mathematics | University of Wisconsin, Madison
Julienne Kabre | | Illinois Institute of Technology
Katherine Kinnaird | | Brown University
Avary Kolasinski | Department of Mathematics | University of Kansas
Shaked Koplewitz | Department of Mathematics | Yale University
Henry Kvinge | Department of Mathematics | University of California, Davis
George Lankford | Department of Mathematics | North Carolina State University
Kevin Leder | Department of Industrial System and Engineering | University of Minnesota, Twin Cities
Gilad Lerman | School of Mathematics | University of Minnesota, Twin Cities
Alfonso Limon | | Oneirix Labs
Mike Makowesky | | MSM Investment Partners
Kristina Martin | Department of Mathematics | North Carolina State University
Sarah Miracle | Department of Computer and Information Sciences | University of St. Thomas
Yoichiro Mori | School of Mathematics | University of Minnesota, Twin Cities
Khanh Nguyen | Department of Mathematics | University of Houston
Marcella Noorman | Department of Mathematics | North Carolina State University
Dimitrios Ntogkas | Department of Mathematics | University of Maryland
Brian Preskitt | Department of Mathematics | University of California, San Diego
Mrinal Raghupathi | USAA Asset Management Company | USAA Asset Management Company
Analise Rodenberg | Department of Mathematics | University of Minnesota, Twin Cities
Keith Rush | Department of Mathematics | University of Wisconsin, Madison
Nathan Salazar | Department of Mathematics | The University of Iowa
Fadil Santosa | Institute for Mathematics and its Applications | University of Minnesota, Twin Cities
Jacob Shapiro | Department of Mathematics | Purdue University
Timothy Spencer | Department of Mathematics | Georgia Institute of Technology
Daniel Spirn | University of Minnesota | University of Minnesota, Twin Cities
Sumanth Swaminathan | | Revon Systems
Stan Swierczek | Program in Applied Mathematics | University of Arizona
Carlos Tolmasky | Institute for Mathematics and its Applications | University of Minnesota, Twin Cities
Katie Tucker | Department of Mathematics | University of Nebraska
Joshua Wilson | Department of Mathematics | University of Minnesota, Twin Cities
Yuhong Yang | Department of Statistics | University of Minnesota, Twin Cities
Camille Zerfas | Department of Mathematical Sciences | Clemson University
Ding Zhao | Department of Mathematics | University of Kentucky

 

Projects and teams

Team 1: Improving Accuracy of ECG Monitoring Using a Wearable Device

  • Mentor Marina Brockway, VivaQuant
  • Lindsey Hiltner, University of Minnesota, Twin Cities
  • Julienne Kabre, Illinois Institute of Technology
  • Dimitrios Ntogkas, University of Maryland
  • Nathan Salazar, The University of Iowa
  • Katie Tucker, University of Nebraska

Remote ECG monitoring through the use of wearable wireless devices is continuing to play an important role in health care by enabling better diagnostics and individualized medicine delivery at a lower cost. As the use of these devices increases, the cost of analyzing the large volume of data produced by these devices becomes more significant. Considerable progress has been made in rendering remote ECGs more resilient to the noise that is commonly encountered with these recordings. However, more work remains in order to achieve the goal of fully automated analysis of ambulatory ECGs. In order to accomplish this goal, high accuracy of beat detection is required despite the presence of noise and signal corruption as well as changes in ECG character due to the presence of cardiac arrhythmia. This project will concentrate on developing a pattern recognition algorithm capable of identifying errors in detection of normal and ectopic ventricular beats. 

Team 2: Human Guided Machine Vision (HGMV)

  • Mentor Alfonso Limon, Oneirix Labs
  • Kelsey DiPietro, University of Notre Dame
  • Richard Frnka, Louisiana State University
  • Brian Hunter, Texas A & M University
  • Shaked Koplewitz, Yale University
  • Khanh Nguyen, University of Houston
  • Timothy Spencer, Georgia Institute of Technology

In Human Guided Machine Vision (HGMV), humans and machines collaborate to achieve vision tasks that neither can achieve individually. Humans are smart, perceptive, and creative, while computers are fast and accurate. Bringing the best qualities of both participants to bear on a vision task requires rethinking human-computer interaction (HCI) paradigms. In HGMV frameworks, we endeavor to produce real-time feedback showing what would happen if the human participant took a particular action, before that action is taken.

Take the simple example of detecting a circle with human help. A non-HGMV version of such software would take in a mouse click from a user and produce a circle that is close to where the mouse was clicked. An HGMV version of the same software would produce fast and accurate feedback showing which circle would be detected IF the user were to click where the mouse pointer currently is, without the user actually clicking. This feedback is updated in real time as the user moves the mouse pointer. The lag between the user's mouse movement and the feedback should be minimal, ideally below the perception threshold of 80 ms.

This requires a rethinking of image processing algorithms. Typically, under HGMV, image processing algorithms are split into a pre-processing step and a real-time step. The pre-processing step should take minimal time, work over the entire image, and produce data (but not too much data) that allows the real-time step to take in the mouse pointer location and compute the feedback very quickly. This technique is most successful when the pre-processing step exploits some economy of scale from working over the entire image rather than locally.
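The split between a pre-processing step and a real-time step can be sketched on a deliberately simplified task: snapping the mouse pointer to the nearest feature pixel. This is only an illustration under stated assumptions, not the project's algorithm. The function names `preprocess` and `feedback`, the 0/1 `feature_mask` input, and the use of 4-connected (Manhattan) distance are all hypothetical choices made for the example.

```python
from collections import deque

def preprocess(feature_mask):
    """Pre-processing step: a single multi-source BFS over the whole image.

    feature_mask is a list of rows of 0/1 values (1 = feature pixel, e.g. an edge).
    Returns a table holding, for every pixel, the coordinates of a nearest
    feature pixel under 4-connected (Manhattan) distance, or None when the
    mask contains no feature pixels at all.
    """
    height, width = len(feature_mask), len(feature_mask[0])
    nearest = [[None] * width for _ in range(height)]
    queue = deque()
    for r in range(height):
        for c in range(width):
            if feature_mask[r][c]:
                nearest[r][c] = (r, c)
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < height and 0 <= nc < width and nearest[nr][nc] is None:
                nearest[nr][nc] = nearest[r][c]  # inherit the source that reached us first
                queue.append((nr, nc))
    return nearest

def feedback(nearest, pointer):
    """Real-time step: a constant-time table lookup for the pointer position."""
    r, c = pointer
    return nearest[r][c]

# Toy 5x5 image with two feature pixels in opposite corners.
mask = [[0] * 5 for _ in range(5)]
mask[0][0] = 1
mask[4][4] = 1
table = preprocess(mask)
```

The whole-image pass costs time proportional to the number of pixels but runs once; each pointer move afterwards costs a single lookup, which is what keeps the feedback lag small.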

In this project, the team will develop an algorithm to detect roads that is consistent with the HGMV paradigm. Fully automated detection of roads from aerial views remains elusive. A computer cannot always tell the difference between a road and various road-like structures. Furthermore, a computer cannot necessarily tell, looking at junctions and intersections, which road goes where. A human being has a much better understanding of such nuance, which is why humans still digitize road networks for mapping software. The idea of this project is to let humans bring their superior understanding to the table while still using the computer to improve the speed and accuracy of the road digitization task. For example, finding the center of a road, finding the carriageway width, and tracing the road (in obvious stretches) can be done much faster by a computer. The project team is tasked with the following:

  1. Create a script of how a computer and a human would interact to detect roads while following the HGMV paradigm. Remember that one mainstay of this paradigm is real-time feedback of what would happen IF the user clicked.
  2. Create image processing algorithms for human guided road detection.
  3. Split the algorithms into a preprocessing and real-time step.
  4. Create a working demonstration of the proposed algorithms, having real human interactions, real image processing, changeable images, but possibly a very simplistic UI.

Team 3: Mathematical Prediction of Physician Triage of Asthma

  • Mentor Sumanth Swaminathan, Revon Systems
  • Mark Blumstein, Colorado State University
  • Lise Chlebak, Tufts University
  • Henry Kvinge, University of California, Davis
  • Camille Zerfas, Clemson University

Asthma is a lung condition that imposes a significant burden on patients’ daily lives. Escalations of this condition (or exacerbations) are a frequent trigger of physician and hospital visits, which are both costly and distressing to patients. The need for novel solutions that limit the impact of exacerbations on global health is abundantly apparent.

One emerging approach to addressing asthma exacerbation is early detection by way of mobile app technology. Many of these apps, however, use rule-based decision frameworks, which are constantly hampered by the size of the variable space involved in triage and diagnosis.

We are interested in developing a mathematical model that predicts the appropriate triage (urgency of illness) for an asthma patient based on patient health characteristics. In particular, we hope to train a machine learning model on physician-generated triage data and use it to make out-of-sample predictions. Some of our major goals and questions include:

  1. What are the most important patient health features or combination of features for predicting an accurate patient triage?
  2. Why do those particular features or combination of features matter the most?
    1. Can we understand the temporal component of these features? Does the change in these features over time matter, or can we make predictions from features at a snapshot in time?
  3. Why does the particular machine learning model selected perform better than alternatives?
  4. What insights can be drawn from the physician triage data itself? Are there nontrivial trends in physician diagnosis that can be brought to light?
  5. What data visualization techniques best represent the models, and how might you tune the visualization to convey different aspects of the features and functionality?
  6. How do you represent the accuracy of the probabilities for the factors that affect the outcome, and instill the appropriate level of confidence that the results are trustworthy and of high quality?
  7. Can you suggest ways of feeding incorrect algorithm triage results back into the current predictor to improve future performance (retraining protocols, real-time retraining, etc.)?
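One possible way to approach the first question (which features matter most) is permutation importance: shuffle one feature column in held-out data and measure how much accuracy drops. The sketch below is only an illustration. The nearest-centroid classifier, the function names, the "urgent"/"routine" labels, and the data are all assumptions made for the example; the data is synthetic, with feature 0 deciding the label by construction and feature 1 being pure noise. It is not physician triage data.

```python
import random

def fit_centroids(rows, labels):
    """Nearest-centroid classifier: one mean feature vector per label."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((x - c) ** 2 for x, c in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, rows, labels):
    hits = sum(predict(centroids, row) == label for row, label in zip(rows, labels))
    return hits / len(rows)

def permutation_importance(centroids, rows, labels, rng):
    """Accuracy drop on held-out data when each feature column is shuffled in turn."""
    baseline = accuracy(centroids, rows, labels)
    drops = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(rows, column)]
        drops.append(baseline - accuracy(centroids, shuffled, labels))
    return drops

# Synthetic stand-in data: feature 0 determines the label, feature 1 is noise.
rng = random.Random(0)
rows = [[rng.random(), rng.random()] for _ in range(400)]
labels = ["urgent" if row[0] > 0.5 else "routine" for row in rows]
train_rows, test_rows = rows[:300], rows[300:]
train_labels, test_labels = labels[:300], labels[300:]

model = fit_centroids(train_rows, train_labels)
importances = permutation_importance(model, test_rows, test_labels, rng)
```

Evaluating on the held-out rows rather than the training rows keeps the importance estimate tied to out-of-sample performance, which is the quantity the project description cares about.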

Team 4: Universal Identifier

  • Mentor Jesse Berwald, Target Corporation
  • Niles Armstrong, Kansas State University
  • An Do, Claremont Graduate University
  • Arezou Ghesmati, Texas A & M University
  • Alex Happ, University of Kentucky
  • George Lankford, North Carolina State University
  • Ding Zhao, University of Kentucky

Uniquely identifying users or customers is a complex problem when people interact with businesses through multiple channels (store, website, coupon sites, etc.) and through multiple devices (desktop, mobile, phone). It often happens that retailers have multiple records for unique individuals. In trying to provide more personalized context, as well as understand the effectiveness of business strategies, it is useful to have a "Universal ID" which links together all of a user's identities.

To get a rough idea of how this would work, it is easiest to consider a simple example. Consider a small set of identifiers that a user might have:

  • Browser cookie
  • Email address
  • Credit card number (hashed)
  • Login ID

The user can also take certain actions that provide a link between identifiers, e.g.:

  • Logging in to a website links a browser cookie to a login ID
  • Making a purchase links a credit card number to an email address, and possibly a login ID
  • Entering an email address in account settings links an email address to a login ID

The goal is to develop a technique to link all of these IDs together, especially those that don't have an action that provides a direct link (e.g. credit card number to browser cookie). We've approached this using graphs and network theory, but don't let that stop you from using other techniques.
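As a minimal sketch of the graph view, identifiers can be treated as nodes and link actions as edges, so that a "Universal ID" corresponds to a connected component. The union-find structure below is one common way to maintain such components; the class name `IdentityLinker`, the tuple encoding of identifiers, and the sample values are hypothetical and are not taken from the mentor's approach.

```python
class IdentityLinker:
    """Union-find over identifiers; each connected component is one user."""

    def __init__(self):
        self._parent = {}

    def _find(self, identifier):
        # Unknown identifiers start out as their own single-element component.
        self._parent.setdefault(identifier, identifier)
        root = identifier
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path directly at the root.
        while self._parent[identifier] != root:
            next_id = self._parent[identifier]
            self._parent[identifier] = root
            identifier = next_id
        return root

    def link(self, a, b):
        """Record an action that ties identifier a to identifier b."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_user(self, a, b):
        return self._find(a) == self._find(b)

# Hypothetical identifiers, encoded as (type, value) pairs.
cookie = ("cookie", "c-123")
login = ("login", "alice")
card = ("card_hash", "9f2a")
email = ("email", "alice@example.com")

linker = IdentityLinker()
linker.link(cookie, login)  # logging in to the website
linker.link(card, email)    # making a purchase
linker.link(card, login)    # the purchase was made while logged in
```

After these three actions, `linker.same_user(card, cookie)` holds even though no single action linked the credit card to the browser cookie directly.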

Some additional complications you can add in if time allows:

  • A user may have multiple identities of the same type (cookies for mobile and desktop browser, multiple credit cards, etc.)
  • Some "link" actions provide better information than others. A user can log in to their account from another user's computer, which would provide a false cookie-to-login ID link.
  • The above suggests some sort of probabilistic model. It would be nice to have some sort of score that allows trading off precision for recall depending on the use case.
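One simple way to sketch the precision/recall trade-off is to attach a confidence score to every link and keep only links at or above a threshold before grouping identifiers. The scores, identifier strings, and the function name `linked_groups` below are assumptions made for illustration; a real model would have to estimate the scores from data.

```python
def linked_groups(links, threshold):
    """Group identifiers using only links whose score is at least `threshold`.

    links is an iterable of (identifier_a, identifier_b, score) triples.
    Returns a list of sets, one per connected component.
    """
    adjacency = {}
    for a, b, score in links:
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    groups, seen = [], set()
    for start in adjacency:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adjacency[node] - seen)
        groups.append(group)
    return groups

# Hypothetical scored links: Alice once logged in from a friend's browser,
# so that cookie-to-login link gets a low score.
links = [
    ("cookie:home", "login:alice", 0.9),
    ("cookie:friend", "login:alice", 0.3),
    ("cookie:friend", "login:bob", 0.9),
]
strict = linked_groups(links, threshold=0.5)  # favors precision
loose = linked_groups(links, threshold=0.2)   # favors recall
```

With the strict threshold the weak link is ignored and Alice and Bob stay separate; with the loose threshold all four identifiers merge into one group, which in this toy case is a false merge.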

Team 5: Examining and Resolving Issues found in Mean Variance Optimization

  • Mentor Christopher Bemis, Whitebox Advisors
  • Avary Kolasinski, University of Kansas
  • Kristina Martin, North Carolina State University
  • Brian Preskitt, University of California, San Diego
  • Jacob Shapiro, Purdue University
  • Stan Swierczek, University of Arizona

Mean variance optimization, while providing a foundation for modern portfolio theory, is rife with known issues. In recent years, attention has been paid to the distribution of eigenvalues of the sample covariance matrix, one of the central parameters of the problem. Random matrix theory has found some general level of application as a result.

We use market data to examine the pitfalls in the standard theory. With this groundwork laid, we proceed to examine several remedies, including shrinkage estimators, principal component analysis, and applications arising from random matrix theory. Our analyses will focus in large part on the effect of each remedy on the spectrum of the input covariance matrix, allowing a common discussion among methods and identifying exactly why the original formulation fails.
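To make the effect of a remedy on the spectrum concrete, here is a minimal sketch of one shrinkage estimator: linear shrinkage of the sample covariance toward a scaled identity matrix. The returns are made-up illustrative numbers, the intensity `delta=0.5` is hand-picked rather than estimated, and the function names are hypothetical; none of this comes from the project itself.

```python
import math

def sample_covariance(returns):
    """Unbiased sample covariance; each row of `returns` is one observation."""
    t, n = len(returns), len(returns[0])
    means = [sum(row[j] for row in returns) / t for j in range(n)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in returns) / (t - 1)
             for j in range(n)] for i in range(n)]

def shrink_toward_identity(cov, delta):
    """Linear shrinkage: (1 - delta) * cov + delta * mu * I, mu = mean variance."""
    n = len(cov)
    mu = sum(cov[i][i] for i in range(n)) / n
    return [[(1 - delta) * cov[i][j] + (delta * mu if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def eigenvalues_2x2(m):
    """Closed-form eigenvalues (smallest, largest) of a symmetric 2x2 matrix."""
    half_trace = (m[0][0] + m[1][1]) / 2
    radius = math.hypot((m[0][0] - m[1][1]) / 2, m[0][1])
    return half_trace - radius, half_trace + radius

# Made-up returns for two highly correlated assets (illustrative only).
returns = [
    [0.010, 0.012],
    [-0.020, -0.018],
    [0.015, 0.013],
    [0.005, 0.009],
    [-0.010, -0.011],
]
sample = sample_covariance(returns)
shrunk = shrink_toward_identity(sample, delta=0.5)
low, high = eigenvalues_2x2(sample)
low_s, high_s = eigenvalues_2x2(shrunk)
```

Shrinkage of this form leaves the trace (total variance) unchanged while pulling both eigenvalues toward their mean, so the smallest eigenvalue rises and the largest falls. Since the optimizer effectively inverts the covariance matrix, lifting the smallest eigenvalues is what tames the extreme portfolio weights that the unmodified sample estimate tends to produce.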

Team 6: Inventory Demand Kaggle Competition

  • Mentor Natalie Durgin, Spiceworks
  • Luis Aguirre, Texas Christian University
  • Anthea Cheung, Boston University
  • Mela Hardin, Arizona State University
  • Ahmet Kabakulak, University of Wisconsin, Madison
  • Marcella Noorman, North Carolina State University

Data and project information can be found on the Kaggle website.

Many internet companies are monetized by serving ads on their websites. Their “inventory” comprises the ad slots available in front of their users. Their sales department might agree to run an ad campaign for the month and serve ads into available slots. Ads are often paid for in units of a thousand served. Internet traffic fluctuates. If there is a dearth of slots and a surplus of ads, revenue will be lost. Alternatively, if not enough ads are booked, slots will run empty when money could have been made. Using historical data to predict user traffic and the availability and performance of ad slots is an important problem, and it has an obvious parallel to the Bimbo bakery problem. We will develop a model for the Grupo Bimbo Inventory Demand problem alongside the Kaggle data science community.

Start date
Monday, June 20, 2016, 8 a.m.
End date
Friday, July 29, 2016, 5 p.m.
Location

IMA, University of Minnesota
