CRAY Colloquium: Data Valuation in Machine Learning, AI, and Data (MAD) Systems

The computer science colloquium takes place on Mondays from 11:15 a.m. to 12:15 p.m. This week's speaker, Jian Pei (Duke University), will give a talk titled "Data Valuation in Machine Learning, AI, and Data (MAD) Systems".

Abstract

Valuation in data and AI markets is becoming a cornerstone of data science and artificial intelligence, studied in e-commerce, economics, machine learning, and data management. However, assessing the contribution of individual parties in a coalition completing a task, a problem known as data valuation, remains a significant technical challenge. In this talk, I will present our research on fast, scalable data valuation methods designed for massive machine learning, AI, and data (MAD) systems. Our approach is built on three key innovations: utility- and game-specific evaluation for enhanced accuracy, efficient sampling techniques to reduce computational cost, and Shapley value approximation with limited information to handle extreme scenarios. I will advocate for data valuation as a fundamental service in next-generation MAD systems and explore open challenges and future research directions in this rapidly evolving field.
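
For readers new to the topic, the idea of approximating Shapley values by sampling can be illustrated with a minimal sketch. This is the generic textbook permutation-sampling estimator, not the speaker's method; the function name `monte_carlo_shapley`, the three hypothetical data owners, and the toy square-root utility are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, Dict, FrozenSet, Sequence


def monte_carlo_shapley(
    players: Sequence[str],
    utility: Callable[[FrozenSet[str]], float],
    num_permutations: int = 2000,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate each player's Shapley value by averaging marginal
    contributions over randomly sampled orderings of the players."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = set()
        prev = utility(frozenset(coalition))
        for p in order:
            # Marginal gain from adding p to the players that precede it.
            coalition.add(p)
            cur = utility(frozenset(coalition))
            totals[p] += cur - prev
            prev = cur
    return {p: total / num_permutations for p, total in totals.items()}


# Hypothetical data owners and the number of records each contributes.
sizes = {"A": 100, "B": 50, "C": 50}


def utility(coalition: FrozenSet[str]) -> float:
    # Toy utility (an assumption): value grows with the square root of the
    # pooled data size, so extra data has diminishing returns.
    return math.sqrt(sum(sizes[p] for p in coalition))


estimates = monte_carlo_shapley(list(sizes), utility)
for owner, value in estimates.items():
    print(f"{owner}: {value:.3f}")
```

Exact Shapley values require evaluating the utility on every coalition, which grows exponentially with the number of parties; sampling orderings trades that cost for an estimate whose accuracy improves with more permutations. The estimates always sum to the utility of the full coalition, because each sampled ordering's marginal gains telescope.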

Biography

Dr. Jian Pei is the Arthur S. Pearse Distinguished Professor at Duke University, specializing in data science, big data, data mining, database systems, and applied machine learning. His research focuses on developing efficient data analysis techniques for data-intensive applications. A Fellow of the Royal Society of Canada, the Canadian Academy of Engineering, ACM, and IEEE, he has contributed algorithms adopted in industry and integrated into open-source software. He has also led the development of large-scale commercial systems. His work has been recognized with honors such as the ACM SIGKDD Innovation Award (2017) and Service Award (2015).

Start date
Monday, April 14, 2025, 11:15 a.m.
End date
Monday, April 14, 2025, 12:15 p.m.
Location
