Colloquium: Analyze and rebuild: Redesigning distributed computing systems for the next killer app

The computer science colloquium takes place on Mondays and Fridays from 11:15 a.m. to 12:15 p.m.

This week's speaker, Ali Anwar (IBM Research - Almaden), will give a talk titled "Analyze and rebuild: Redesigning distributed computing systems for the next killer app".

Abstract

Modern data applications such as distributed machine learning are revolutionizing all aspects of computing-based scientific discovery. As new applications, algorithms, and techniques are invented, the underlying distributed system platforms supporting these uses face fundamentally new challenges. One such challenge is workload dynamicity, which renders static, design-time system decisions impractical for supporting ever-changing application needs. Studying the workload characteristics of these applications and making informed design decisions can significantly improve the efficiency of the underlying distributed system or platform that enables such applications. Similarly, resource and data heterogeneity also play an important role in determining the overall performance of these applications.

This talk covers two of my projects in which workload and resource usage analysis enabled us to design better systems. First, I will show how studying the workload characteristics of Docker, the de facto standard for data center container management, at enterprise scale on IBM production systems enabled us to better handle workload dynamicity and to create a number of optimizations that improve application performance. Second, I will present how we enhanced the powerful Federated Learning approach in distributed machine learning by making it aware of underlying platform characteristics, such as resource and data heterogeneity, and show how this heterogeneity can affect the robustness of trained models under adversarial attacks. I will conclude with a discussion of plans for my future research.

Biography

Ali Anwar is a Research Staff Member at IBM Research - Almaden. He holds a Ph.D. in Computer Science from Virginia Tech. Earlier in his career, he worked as an open-source tools developer (GNU GDB) at Mentor Graphics. His research interests lie at the intersection of systems and machine learning. The overarching goal of his research is to enable efficient and flexible systems for the growing data demands of modern high-end applications running on existing as well as emerging computing platforms. His current work focuses on distributed machine/federated learning systems and platforms, serverless and microservice-based systems, and efficient storage for Docker containers.

His research has appeared in a number of premier conferences and workshops in computer systems, AI/ML, and high-performance computing, including USENIX FAST, ATC, HotStorage, ACM/IEEE SC, ACM HPDC, SoCC, AISec [Best Paper Award], and AAAI. He regularly performs professional community service and has served as a program committee member for conferences such as SC, HPDC, ICDCS, and CCGrid, and as a reviewer for journals such as ToS, TPDS, TKDE, TCC, and JPDC. He is also an associate editor for Neural Processing Letters. At IBM, he was recognized as a 2019 Outstanding Research Accomplishment winner for Advancing Adversarial Robustness in AI Models. In 2020, he received two Research Accomplishment awards for his research on Enterprise-Strength Federated Learning for Hybrid Cloud and Edge, and on Container Storage. He is also a recipient of the Pratt Fellowship awarded by the Department of Computer Science at Virginia Tech.

Start date
Friday, March 12, 2021, 11:15 a.m.
End date
Friday, March 12, 2021, 12:15 p.m.
Location

Online - Zoom Link

 
