Data Science and Machine Learning

The ReaLSAT dataset is a comprehensive global dataset of lakes and reservoirs that tracks changes over the last 30+ years.

Our faculty in this area are building the next generation of computing approaches and systems for harnessing the power of big data. Their research focuses on developing scalable algorithms, databases, and new data mining methods for extracting meaning from large datasets. They both contribute new, generalizable methods and apply existing and new data mining and machine learning methods to important application areas.

Work in this area draws from and contributes to multiple domains, including:

  • Data Mining and Knowledge Discovery studies scalable and robust interest measures, algorithms, and methods for mining interesting, useful, and non-trivial patterns (e.g., anomalies, associations, classifiers, clusters) in big data for descriptive, predictive, prescriptive, and planning tasks, as well as for discovering previously unknown knowledge, in diverse applications;
  • Machine Learning investigates deep, supervised, unsupervised, self-supervised, semi-supervised, knowledge-guided, and other generalizable training methods that learn from representative data samples for prediction, decision-making, and other tasks (e.g., object recognition in images, spam detection in emails) where explicitly programmed algorithms are infeasible or ineffective;
  • Database Management Systems studies data models (conceptual, logical, physical), query languages (e.g., SQL), query processing and optimization, storage and indexing, concurrency control, recovery from failure, privacy, scalability, and system structures to manage persistent, interrelated and shared databases for decision support, transaction processing, collaborative work, and scientific discovery; 
  • Spatial Data Science explores space-time concepts, context, relationships, patterns, and algorithms to collect, model, integrate, and analyze location-aware data (e.g., census, geo-imagery, maps) in order to understand and/or design location-based services (e.g., delivery, navigation, ride-sharing), systems (e.g., GPS, GIS), and methods (e.g., spatial statistics), particularly when generic (non-spatial) approaches are inaccurate or inadequate;
  • Computational Biology develops algorithms and software for deriving insight from biological and medical data, including methods for analyzing genomic and metagenomic data, protein structure and function, and biological and metabolic networks; 
  • High Performance Computing and Parallel Computing designs approaches for aggregating large numbers of computing elements to solve complex, computationally intensive problems; 
  • Linear Algebra Methods develops algorithms for matrix computations and studies their theoretical properties. These algorithms can target large dense or sparse matrices, as well as problem-specific computations such as those arising in graphs, in computational statistics, or in the analysis and characterization of large matrices of numerical data;
  • Data Science for Digital Health concerns the use of digital technology across all aspects of healthcare. The availability of vast amounts of data creates an opportunity to make practically all healthcare decisions more data-driven, and this area addresses a wide range of problems toward that goal; and
  • Applications of new and existing data mining methods to various areas of science, including biology, biomedical informatics, chemical informatics, computational social science, digital health, epidemiology, climate modeling, sustainability, materials science, geographic information systems, and social network analysis.

Core Faculty

Dan Boley
Professor, Distinguished University Teaching Professor, Data Science Director of Graduate Studies
Chad Myers
Professor, Co-Director of Graduate Studies for Bioinformatics and Computational Biology
Yousef Saad
Professor, CSE Distinguished Professor, William Norris Land Grant Chair in Large-Scale Computing
Shashi Shekhar
Professor, Distinguished McKnight University Professor, Distinguished University Teaching Professor

Affiliated Faculty

Ju Sun
Assistant Professor
Catherine Zhao
Associate Professor