# M.S. curriculum

The Data Science M.S. is a Plan B track program with a capstone project culminating in a final written report and oral presentation.

To satisfy all program requirements, admitted students must complete the following courses:

| Requirement | Credits |
| --- | --- |
| Statistics track credits | 6 credits |
| Algorithmics track credits | 6 credits |
| Infrastructure and large scale computing track credits | 6 credits |
| Elective credits (at least 3 credits of the 9 credits must be 8000 level) | 9 credits |
| Capstone credits (off-campus research must be approved by the Graduate Committee) | 3 credits |
| Colloquium credits. One credit of the Data Science Colloquium (or equivalent in a participating department) is mandatory and must appear on the student’s graduate degree plan form. | 1 credit |
| Total credits for the degree | 31 credits |
| Minimum course credits that must be taken at the University of Minnesota once admitted to the program | 20 credits |

See the sample graduate program outline for semester planning assistance.

Please note:

- All courses taken from a department affiliated with the Data Science program must be taken A-F if available.
- All credits listed on the Graduate Degree Plan must be 5000 level or above, with a GPA of at least 3.25. You must maintain an overall GPA of 3.0 while a graduate student in this program.
- This program may be completed with a minor.
- Use of 4xxx courses towards program requirements is not permitted (except as an elective by special petition).
- It is acceptable to take only 6 credits of electives and carry out a 6 credit capstone project spread over two semesters if your project advisor agrees on the scope of your project. This "6-6" plan is the one generally followed by students admitted before 2017. All other students will follow the "9-3" plan unless they explicitly opt for the "6-6" plan with their advisor's concurrence.
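
As an arithmetic check on the two plans described above, the following minimal Python sketch totals the credit requirements under the "9-3" and "6-6" plans. The names `BASE_CREDITS`, `PLANS`, and `total_credits` are illustrative and not part of any official tool. The sketch assumes the "6-6" plan leaves the track and colloquium requirements unchanged.

```python
# Illustrative sketch only: all names here are hypothetical.
# Credit values are taken from the requirements listed above.
BASE_CREDITS = {
    "statistics": 6,
    "algorithmics": 6,
    "infrastructure": 6,
    "colloquium": 1,
}

# Each plan splits 12 credits between electives and the capstone project.
PLANS = {
    "9-3": {"electives": 9, "capstone": 3},
    "6-6": {"electives": 6, "capstone": 6},
}


def total_credits(plan: str) -> int:
    """Return the total degree credits under the named plan."""
    return sum(BASE_CREDITS.values()) + sum(PLANS[plan].values())


for name in PLANS:
    print(name, total_credits(name))
```

Under these assumptions, both plans add up to the same 31-credit degree total.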

## Courses

### Statistics

**Take two courses (totaling 6-8 credits) from the following list of courses, at least one of which is a Tier I course:**

#### Tier I courses

- STAT 5101 - Theory of Statistics I / MATH 5651 - Basic Theory of Probability and Statistics (STAT 5101 and MATH 5651 are equivalent)
- STAT 5102 - Theory of Statistics II
- STAT 5302 - Applied Regression Analysis
- STAT 5401 - Applied Multivariate Methods
- STAT 5511 - Time Series Analysis
- STAT 8051 - Applied Statistical Methods 1: Computing and Generalized Linear Models
- STAT 8101 - Theory of Statistics I
- STAT 8102 - Theory of Statistics II
- PUBH 7401 - Fundamentals of Biostatistical Inference
- PUBH 7402 - Biostatistics Modeling and Methods
- PUBH 7440 - Introduction to Bayesian Analysis

#### Tier II courses

- AST/STAT 5731 - Bayesian Astrostatistics
- PUBH 7405 - Biostatistics Regression
- PUBH 7406 - Advanced Regression and Design
- PUBH 7407 - Analysis of Categorical Data
- PUBH 7430 - Statistical Methods for Correlated Data
- PUBH 7460 - Advanced Statistical Computing
- PUBH 7485 - Methods for Causal Inference
- PUBH 8401 - Linear Models
- PUBH 8432 - Probability Models for Biostatistics
- PUBH 8442 - Bayesian Decision Theory
- STAT 5052 - Statistical and Machine Learning
- STAT 5201 - Sampling Methodology in Finite Populations
- STAT 5303 - Designing Experiments
- STAT 5421 - Analysis of Categorical Data
- STAT 5601 - Nonparametric Methods
- STAT 5701 - Statistical Computing
- STAT 8112 - Mathematical Statistics II
- EE 5531 - Probability and Stochastic Processes
- EE 8581 - Detection and Estimation Theory
- Any course from a list of STAT/Biostat 5xxx/8xxx classes (but not STAT 5021) with advisor and DGS approval

### Algorithmics

**Take two courses (totaling 6 credits) from the following list of courses, at least one of which is a Tier I course:**

#### Tier I courses

- CSCI 5521 - Introduction to Machine Learning (formerly Pattern Recognition)
- CSCI 5523 - Introduction to Data Mining
- CSCI 5525 - Machine Learning
- EE 8591 - Predictive Learning from Data
- PUBH 7475 - Statistical Learning and Data Mining
- PUBH 8475/STAT 8056 - Statistical Learning and Data Mining

#### Tier II courses

- CSCI 5302 - Analysis of Numerical Algorithms
- CSCI 5304 - Computational Aspects of Matrix Theory
- CSCI 5511 - Artificial Intelligence I
- CSCI 5512 - Artificial Intelligence II
- CSCI 5527 - Deep Learning: Models, Computation, and Applications
- CSCI 5609 - Visualization (renumbered from CSCI 5109)
- CSCI 8314 - Sparse Matrix Computations
- CSCI 8581 - Big Data in Astrophysics
- EE 5239 - Introduction to Nonlinear Optimization
- EE 5251 - Optimal Filtering and Estimation
- EE 5389 - Introduction to Predictive Learning
- EE 5391 - Computing With Neural Networks
- EE 5542 - Adaptive Digital Signal Processing
- EE 5551 - Multiscale and Multirate Signal Processing
- EE 5561 - Image Processing and Applications
- EE 5581 - Information Theory and Coding
- EE 5585 - Data Compression
- EE 8231 - Optimization Theory
- IE 5531 - Engineering Optimization I
- IE 8521 - Optimization
- IE 8531 - Discrete Optimization
- Any advanced class in optimization, game theory, or topic related to the listed Algorithmics courses (with advisor and DGS approval)

### Infrastructure and Large Scale Computing

**Take two courses (totaling 6 credits) from the following list of courses, at least one of which is a Tier I course:**

#### Tier I courses

- CSCI 5105 - Introduction to Distributed Systems
- CSCI 5451 - Introduction to Parallel Computing: Architectures, Algorithms, and Programming
- CSCI 5707 - Principles of Database Systems
- CSCI 5708 - Architecture and Implementation of Database Management Systems
- EE 5351 - Applied Parallel Programming
- EE 8367/CSCI 8205 - Parallel Computer Organization

#### Tier II courses

- CSCI 5103 - Operating Systems
- CSCI 5211 - Data Communications and Computer Networks
- CSCI 5231 - Wireless and Sensor Networks
- CSCI 5271 - Introduction to Computer Security
- CSCI 5715 - From GPS and Virtual Globes to Spatial Computing
- CSCI 5751 - Big Data Engineering and Architecture
- CSCI 5801 - Software Engineering I
- CSCI 5802 - Software Engineering II
- CSCI 8102 - Foundations of Distributed Computing
- CSCI 8701 - Overview of Database Research
- CSCI 8715 - Spatial Databases and Applications
- CSCI 8725 - Databases for Bioinformatics
- CSCI 8735 - Advanced Database Systems
- CSCI 8801 - Advanced Software Engineering
- EE 5355 - Algorithmic Techniques for Scalable Many-core Computing
- EE 5371 - Computer Systems Performance Measurement and Evaluation
- EE 5381 - Telecommunications Networks
- EE 5501 - Digital Communication
- Any advanced class in large-scale data management or analysis, or topic related to the listed Infrastructure courses (with advisor and DGS approval)
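
Each of the three tracks above applies the same selection rule: two courses, at least one of which is a Tier I course. A minimal Python sketch of that check follows, using the Algorithmics Tier I list as an example. The function name and data layout are hypothetical. The sketch does not check credit totals, Tier II list membership, or advisor and DGS approvals.

```python
# Illustrative sketch only: the function name and data layout are hypothetical.
# Tier I course codes are copied from the Algorithmics list above.
ALGORITHMICS_TIER_I = {
    "CSCI 5521", "CSCI 5523", "CSCI 5525",
    "EE 8591", "PUBH 7475", "PUBH 8475", "STAT 8056",
}


def satisfies_track_rule(selected: list[str], tier_one: set[str]) -> bool:
    """Check for two distinct courses, at least one of them Tier I."""
    courses = set(selected)
    return len(courses) == 2 and any(c in tier_one for c in courses)


# One Tier I course plus one Tier II course satisfies the rule.
print(satisfies_track_rule(["CSCI 5525", "CSCI 5511"], ALGORITHMICS_TIER_I))
# Two Tier II courses do not.
print(satisfies_track_rule(["CSCI 5511", "CSCI 5512"], ALGORITHMICS_TIER_I))
```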

### Special topics courses

- CSCI 8980 - Topic: Cloud Computing/Big Data - **Infrastructure and Large Scale Computing Tier II**
- CSCI 5980 - Topic: Big Data Engineering and Analytics (now CSCI 5751) - **Infrastructure and Large Scale Computing Tier II**
- CSCI 8980 - Topic: Advanced Topics in Distributed Systems - **Infrastructure and Large Scale Computing Tier II**
- CSCI 5980/8980 - Topic: Think Deep Learning - **Elective**
- IE 8534 - Topic: Modern Nonconvex Nondifferentiable Optimization with Applications in Statistical Learning - **Elective**
- IE 8534 - Topic: - **Elective**
- IE 8534 - Topic: Multilevel Monte Carlo for Problems in Data Science - **Elective**

### Electives

**If you complete a one-semester Capstone project, take three courses (totaling 9 credits), at least 3 credits of which must be at the 8xxx level. If you complete a two-semester Capstone project, take 6 credits of electives. Students who complete a two-semester capstone (which means registering for the capstone course in two semesters) do not need to take an 8xxx level course.**

Suggestions and pre-approved options are listed below. This is a non-exclusive list. An elective course is a course that explores more deeply concepts or methodologies addressed in a regular track course listed above, or a course that addresses tools or methodologies needed to make the methods above work, or a course in an application area outside data science in which issues of data management, data analysis, or data mining are discussed in the context of that application area. There are potentially many such courses around the University. You may also use any course listed above if not used to satisfy a track requirement.

- CSCI 5106 - Programming Languages
- CSCI 5123 - Recommender Systems
- CSCI 5421 - Advanced Algorithms and Data Structures
- CSCI 5461 - Functional Genomics, Systems Biology, and Bioinformatics
- CSCI 5541 - Natural Language Processing
- CSCI 5561 - Computer Vision
- CSCI 5980 - Special Topics in Computer Science
- CSCI 8271 - Security and Privacy in Computing
- CSCI 8363 - Numerical Linear Algebra in Data Exploration
- CSCI 8980 - Special Advanced Topics in Computer Science
- EE 5393 - Circuits, Computation and Biology
- GEOG 8920 - Urban Mobility & Accessibility
- IE 8534 - Advanced Topics in Operations Research
- IE 8535 - Introduction to Network Science
- MATH 5467 - Introduction to the Mathematics of Image and Data Analysis
- PUBH 7445 - Statistics for Human Genetics and Molecular Biology
- PUBH 7461 - Exploring and Visualizing Data in R
- PUBH 8445 - Statistics for Human Genetics and Molecular Biology
- PUBH 8446 - Advanced Statistical Genetics and Genomics
- PUBH 8472 - Spatial Biostatistics
- 5xxx/8xxx topics classes in CSCI, EE, STAT (but not STAT 5021), PUBH, etc. (See advisor for approval)
- Any advanced class in advanced data driven applications or emerging applications (with advisor and DGS approval)

**Note: No PUBH 6xxx, 5xxx, or 4xxx level courses can be used as electives in Data Science**

To use a course not specifically listed as an elective, submit a short note to your advisor or the Director of Graduate Studies explaining how the course meets one of these criteria together with a syllabus or other details of the course which lists the topics and the level at which the topics are taught. The level is often indicated indirectly by the prerequisites.

## Colloquium

**Take 1 credit of the Data Science Colloquium (or equivalent in a participating department).**

A colloquium course is mandatory and must appear on the student’s graduate degree plan form.

- DSCI 8970 - Data Science M.S. Colloquium

## Capstone project

**Take 3 or 6 credits for the capstone project.**

One of the key features of the M.S. in Data Science curriculum is a capstone project that makes the theoretical knowledge gained in the program operational in realistic settings.

Under the supervision of a faculty member, students will go through the entire process of solving a real-world problem: from collecting and processing real-world data, to designing the best method to solve the problem, and finally, to implementing a solution. The problems and datasets that students engage with will come from real-world settings identical to those they might encounter in industry, academia, or government. Examples of projects and the wide variety of topics they cover can be found on the research page, though this is only a partial list. Data-driven capstone projects may also grow from temporary internships at companies, with DGS approval.

The Plan B project is completed under the guidance of a data science faculty member. Students are responsible for identifying and selecting their faculty advisor. As part of their degree requirements, Data Science M.S. students will present a final poster on their Plan B project at the annual Poster Fair. The Poster Fair takes place every spring semester, and there will also be an opportunity for students to present in fall as needed. Students should complete their written report as soon as possible after their poster presentation; if needed, the report can be completed in the subsequent semester. If a report is not submitted and a final decision is not recorded by the subsequent semester’s program-sponsored poster presentation opportunity, the student will be required to present a new poster reflecting updated work on their project. For example, a student who presents at the Spring poster fair should complete their written report by the poster presentation event in the subsequent Fall semester. Questions about this process can be directed to the Graduate Program Coordinator.

- DSCI 8760 - Data Science M.S. Plan B Project

### Committee requirements

Your M.S. degree committee must consist of three University of Minnesota-Twin Cities faculty members with formal graduate education responsibilities. Two committee members must be from the Data Science Program, including your advisor, who serves as committee chair, and one committee member must be from an outside program.

A qualified advisor from outside the data science faculty may be selected with DGS approval. You may be asked to provide a CV for that potential advisor. Your final project report will be approved by a committee of three faculty members, including your advisor and at least one member of the data science faculty. If not already on the data science faculty, your advisor may wish to join (they should send a short CV to the DGS); otherwise, you will need to find a current member to act as a co-advisor and approve the final report. In any case, the three committee members should represent at least two different home departments. In other words, the three committee members cannot all be from the same department. You will also be expected to give a short oral presentation on your project, open to faculty, students, and other interested parties.

## GRAD 999

GRAD 999 is a zero-credit, zero-tuition registration option intended for graduate students who have completed all coursework and (if applicable) thesis credit requirements, and who must maintain registration to meet the registration requirement. Students who wish to enroll in GRAD 999 **must** have program permission in order to do so. For more details on enrolling, please contact the Graduate Program Coordinator to see if this is the best option for you.

More details on the implications of enrolling in GRAD 999 can be found on the One Stop Student Services website.