CS&E Colloquium: Lossy computation done right: Scalable and Accessible LLM Fine-Tuning and Serving
The computer science colloquium takes place on Mondays and Fridays from 11:15 a.m. - 12:15 p.m. This week's speaker, Zirui Liu (Rice University), will be giving a talk titled "Lossy computation done right: Scalable and Accessible LLM Fine-Tuning and Serving".
Abstract
As model size grows, large language models (LLMs) have exhibited human-like conversational ability. This advancement opens the door to a wave of new applications, such as custom AI agents. Building such applications involves two essential steps: fine-tuning and serving. Fine-tuning is the process of adapting the LLM to a specific task, such as understanding and responding to domain-specific inquiries. The second step, serving, is about generating useful responses to questions in real time. However, both steps are difficult and expensive due to the massive model scale, which limits their accessibility for most users.
Our key idea to overcome this challenge is that LLMs are extremely robust to the noise from lossy computation, such as low numerical precision and randomized computation like Dropout. Following this insight, we will discuss some recent results in fine-tuning and serving LLMs with much more accessible hardware. First, I will share my research on using randomized matrix multiplication to make fine-tuning both faster and more memory-efficient. Following that, I will show that extremely low-bit model and KV cache quantization can reduce the cost of the LLM serving process while maintaining performance. Finally, I will discuss my broader research vision for LLM data-quality problems, on-device LLM deployment, and biomedical applications.
Biography
Zirui Liu is a final-year Ph.D. candidate in the Department of Computer Science at Rice University. His interests lie in the broad area of large-scale machine learning, particularly in algorithm-system co-design, randomized algorithms, and large-scale graph learning. He has published more than 20 papers in top venues such as ICLR, NeurIPS, ICML, and MLSys. Moreover, his research has been widely deployed in industrial applications such as the Meta recommendation system and the Samsung advertisement platform. Website: https://zirui-ray-liu.github.io/