Data Science and Machine Learning

The ReaLSAT dataset is a comprehensive global dataset of lakes and reservoirs, tracking changes over the last 30+ years

Our faculty in this area focus on building the next generation of computing approaches and systems for harnessing the power of big data. Research questions center on developing scalable algorithms, databases, and new data mining methods for extracting meaning from large datasets. Faculty both contribute new, generalizable methods and apply existing and new data mining and machine learning methods to important application areas.

Work in this area draws from and contributes to multiple domains, including:

  • Data Mining and Knowledge Discovery studies scalable and robust interest measures, algorithms, and methods for mining interesting, useful, and non-trivial patterns (e.g., anomalies, associations, classifiers, clusters) in big data, supporting descriptive, predictive, prescriptive, and planning tasks, as well as the discovery of previously unknown knowledge, in diverse applications;
  • Machine Learning investigates deep, supervised, unsupervised, self-supervised, semi-supervised, knowledge-guided, and other generalizable training methods that leverage representative data samples for prediction, decision-making, and other tasks (e.g., object recognition in images, spam detection in emails) where explicitly programmed algorithms are infeasible or ineffective;
  • Database Management Systems studies data models (conceptual, logical, physical), query languages (e.g., SQL), query processing and optimization, storage and indexing, concurrency control, recovery from failure, privacy, scalability, and system structures for managing persistent, interrelated, and shared databases for decision support, transaction processing, collaborative work, and scientific discovery;
  • Spatial Data Science explores space-time concepts, context, relationships, patterns, and algorithms to collect, model, integrate, and analyze location-aware data (e.g., census, geo-imagery, maps) in order to understand and/or design location-based services (e.g., delivery, navigation, ride-sharing), systems (e.g., GPS, GIS), and methods (e.g., spatial statistics) where generic, location-agnostic approaches are inaccurate or inadequate;
  • Computational Biology develops algorithms and software for deriving insight from biological and medical data, including methods for analyzing genomic and metagenomic data, protein structure and function, and biological and metabolic networks; 
  • High-Performance and Parallel Computing designs approaches for aggregating large numbers of computing elements to solve complex, computationally intensive problems;
  • Linear Algebra Methods develops algorithms for matrix computations and studies their theoretical properties. These algorithms can target large dense or sparse matrices, as well as problem-specific computations, such as those arising in graphs, computational statistics, or the analysis and characterization of large matrices of numerical data;
  • Data Science for Digital Health broadly concerns the use of digital technology across all aspects of healthcare. Given the availability of vast amounts of data, there is an opportunity to make nearly all healthcare decisions more data-driven, and this area addresses a wide range of problems toward that goal; and
  • Applications of new and existing data mining methods to various areas of science including biology, biomedical informatics, chemical informatics, computational social science, digital health, epidemiology, climate modeling, sustainability, materials science, geographic information systems, and social network analysis.

Core Faculty

Dan Boley
Professor, Distinguished University Teaching Professor, Director of Graduate Studies for Data Science
4-225C Kenneth H. Keller Hall
Yao-Yi Chiang
Associate Professor
5-191 Kenneth H. Keller Hall
Ge Chang
Assistant Professor
4-213 Kenneth H. Keller Hall
George Karypis
Professor, Distinguished McKnight University Professor
483 Walter Library
Dan Knights
Associate Professor
6-124 Molecular and Cellular Biology
Rui Kuang
6-187 Kenneth H. Keller Hall
Vipin Kumar
Regents Professor, William Norris Land Grant Chair in Large-Scale Computing, Data Science Initiative Director
5-225C Kenneth H. Keller Hall
Mohamed F. Mokbel
Professor, Distinguished McKnight University Professor
4-207 Kenneth H. Keller Hall
Chad Myers
Professor, Co-Director of Graduate Studies for Bioinformatics and Computational Biology
6-116 Molecular and Cellular Biology
Yousef Saad
Professor, CSE Distinguished Professor, William Norris Land Grant Chair in Large-Scale Computing
5-225B Kenneth H. Keller Hall
Shashi Shekhar
Professor, Distinguished McKnight University Professor, Distinguished University Teaching Professor, ADC/CSE Chair, AI-CLIMATE Institute Director
5-203 Kenneth H. Keller Hall
Jaideep Srivastava
Professor, Director of Undergraduate Studies for Data Science
5-209 Kenneth H. Keller Hall
Yogatheesan Varatharajah
Assistant Professor
4-203 Kenneth H. Keller Hall

Affiliated Faculty

Ravi Janardan
6-217 Kenneth H. Keller Hall
Ju Sun
Assistant Professor
6-213 Kenneth H. Keller Hall
Associate Professor, Dean's Fellow
5-213 Kenneth H. Keller Hall