Data Science and Machine Learning
Our faculty in this area are building the next generation of computing approaches and systems for harnessing the power of big data. Research questions center on developing scalable algorithms, databases, and new data mining methods for extracting meaning from large datasets. Faculty both contribute new, generalizable methods and apply existing and new data mining and machine learning methods to important application areas.
Work in this area draws from and contributes to multiple domains, including:
- Data Mining and Knowledge Discovery studies scalable and robust interest measures, algorithms, and methods for mining interesting, useful, non-trivial, and previously unknown patterns (e.g., anomalies, associations, classifiers, clusters) in big data for descriptive, predictive, prescriptive, and planning tasks in diverse applications;
- Machine Learning probes deep, supervised, unsupervised, self-supervised, semi-supervised, knowledge-guided and other generalizable training methods to leverage representative data samples for prediction, decision-making and other tasks (e.g., object recognition in images, spam detection in emails), where explicitly programmed algorithms are infeasible or ineffective;
- Database Management Systems studies data models (conceptual, logical, physical), query languages (e.g., SQL), query processing and optimization, storage and indexing, concurrency control, recovery from failure, privacy, scalability, and system structures to manage persistent, interrelated and shared databases for decision support, transaction processing, collaborative work, and scientific discovery;
- Spatial Data Science explores space-time concepts, context, relationships, patterns, and algorithms to collect, model, integrate and analyze location-aware data (e.g., census, geo-imagery, maps) to understand and/or design location-based services (e.g., delivery, navigation, ride-sharing), systems (e.g., GPS, GIS), and methods (e.g., spatial statistics) when generic methods are inaccurate or inadequate;
- Computational Biology develops algorithms and software for deriving insight from biological and medical data, including methods for analyzing genomic and metagenomic data, protein structure and function, and biological and metabolic networks;
- High Performance Computing and Parallel Computing designs approaches for aggregating large numbers of computing elements to solve complex, computationally intensive problems;
- Linear Algebra Methods develops algorithms for matrix computations and studies their theoretical properties. These algorithms can target large dense or sparse matrices, as well as problem-specific computations such as those related to graphs, computational statistics, or the analysis and characterization of large matrices of numerical data;
- Data Science for Digital Health applies digital technology across all aspects of healthcare. Given the availability of vast amounts of data, there is an opportunity to make practically all healthcare decisions more data-driven. This area addresses a wide range of problems in data-driven healthcare; and
- Applications of new and existing data mining methods to various areas of science including biology, biomedical informatics, chemical informatics, computational social science, digital health, epidemiology, climate modeling, sustainability, materials science, geographic information systems, and social network analysis.
Core Faculty
Professor, Distinguished University Teaching Professor
Associate Professor, Director of Graduate Studies for Data Science
5-191 Kenneth H. Keller Hall
Professor, Distinguished McKnight University Professor
Regents Professor, William Norris Land Grant Chair in Large-Scale Computing, Data Science Initiative Director
Professor, Distinguished McKnight University Professor, Director of Graduate Studies - Computer Science
Professor, Co-Director of Graduate Studies for Bioinformatics and Computational Biology
Professor, CSE Distinguished Professor, William Norris Land Grant Chair in Large-Scale Computing
Professor, Distinguished McKnight University Professor, Distinguished University Teaching Professor, ADC/CSE Chair, AI-CLIMATE Institute Director
Professor, Director of Undergraduate Studies for Data Science
Affiliated Faculty
Associate Professor, Dean's Fellow
Labs and selected projects
- Bioinformatics (George Karypis)
- Bioinformatics and Computational Biology (Vipin Kumar)
- Chemical Informatics (George Karypis)
- Computational Biology (Rui Kuang)
- Database and Spatial Data Mining Research Group (Shashi Shekhar)
- Data Mining (George Karypis)
- Discovery of Patterns in the Global Climate System using Data Mining (George Karypis, Vipin Kumar, Shashi Shekhar)
- DCSG: Distributed Computing Systems Group (Jon Weissman)
- Karypis Lab (George Karypis)
- Knights Lab (Dan Knights)
- pARMS: parallel Algebraic Recursive Multilevel Solvers (Yousef Saad)
- Principal Research Projects (Daniel Boley)
- Unsupervised Document Set Exploration Using Divisive Partitioning (Daniel Boley)
- UMN Machine Learning Seminar Series