CSE DSI Machine Learning Seminar with Ray Liu (CS&E, UMN)

Breaking Barriers: Advancing Long Context LLMs

LLMs have demonstrated impressive conversational abilities. However, scaling them to handle longer contexts, such as extracting information from lengthy articles (a critical task in healthcare, law, and finance applications), presents significant challenges. There are two main obstacles: first, LLMs struggle to process inputs longer than those they encountered during pre-training; second, even when information can be accurately extracted from extended contexts, deploying LLMs in real-world scenarios is limited by hardware capacity. I will discuss recent advances in serving long-context LLMs at scale. To address the first challenge, I will present our work on extending LLM context length 10x by coarsening the positional encoding. For the second challenge, I will highlight our recent success with 2-bit KV cache quantization. Lastly, I will discuss alternative architectures such as Mamba and benchmark them against transformers in long-context scenarios.
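
To give a sense of what "coarsening the positional encoding" can look like in practice, below is a minimal, hypothetical Python sketch of position-index interpolation for RoPE-style encodings. The function names and the specific rescaling scheme are illustrative assumptions, not necessarily the method presented in the talk: position indices of a long input are simply rescaled so they fall back inside the position range the model saw during pre-training.

import torch

def rope_inverse_frequencies(dim: int, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies for one attention-head dimension.
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

def coarsened_positions(seq_len: int, trained_len: int) -> torch.Tensor:
    # Rescale ("coarsen") position indices so a longer input is mapped back
    # into the position range seen during pre-training.
    scale = max(seq_len / trained_len, 1.0)
    return torch.arange(seq_len, dtype=torch.float32) / scale

def rope_angles(seq_len: int, dim: int, trained_len: int) -> torch.Tensor:
    # Rotation angles applied to query/key vectors; with coarsened positions,
    # the angles for a 10x longer input stay inside the trained range.
    pos = coarsened_positions(seq_len, trained_len)    # shape (seq_len,)
    freqs = rope_inverse_frequencies(dim)              # shape (dim // 2,)
    return torch.outer(pos, freqs)                     # shape (seq_len, dim // 2)

# Example: a model pre-trained on 4,096-token contexts handling a 40,960-token input.
angles = rope_angles(seq_len=40_960, dim=128, trained_len=4_096)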

Zirui Ray Liu is an Assistant Professor of Computer Science at the University of Minnesota. His interests lie in the broad areas of MLSys, LLMs, and GraphML. He regularly publishes papers in top venues such as NeurIPS, ICML, ICLR, and MLSys. His work has been integrated into widely used NLP tools like Keras, Llama.cpp, and Hugging Face Transformers, and has been highlighted in Google I/O sessions.

Start date
Tuesday, Feb. 4, 2025, 11 a.m.
End date
Tuesday, Feb. 4, 2025, Noon
Location

Keller 3-180 or via Zoom.