CSE DSI Machine Learning Seminar with Yiqiao Zhong

A Geometric Journey into the World of Large Language Models

Transformers are the neural networks that underpin the recent success of large language models. They are often used as black-box models and as building blocks of complex AI systems, yet it is unclear what information is processed through the layers of a transformer, which raises the issue of interpretability.

In this talk, I will present an empirical study of transformers based on an examination of a variety of pretrained transformer models. A surprisingly consistent geometric pattern emerges in the hidden states (intermediate-layer embeddings) across layers, models, and datasets. Our study (1) provides a structural characterization of the learned weight matrices and the self-attention mechanism, and (2) suggests that the smoothness of hidden states is essential to the success of transformers.
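For readers who want to inspect hidden states themselves, the sketch below shows one way to extract intermediate-layer embeddings from a pretrained transformer using the Hugging Face transformers library. The choice of GPT-2 and the consecutive-layer cosine-similarity probe are illustrative assumptions, not the analysis presented in the talk.

```python
# A minimal sketch (not the speaker's code) of extracting intermediate-layer
# embeddings from a pretrained transformer; GPT-2 is used purely for illustration.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

text = "Transformers process tokens through many layers."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# hidden_states: tuple of (num_layers + 1) tensors, each (batch, seq_len, dim):
# the embedding-layer output followed by every transformer block's output.
hidden_states = outputs.hidden_states

# One simple probe of how representations evolve across depth: the average
# cosine similarity between each token's embedding in consecutive layers.
for layer in range(len(hidden_states) - 1):
    a = hidden_states[layer][0]      # (seq_len, dim)
    b = hidden_states[layer + 1][0]  # (seq_len, dim)
    cos = torch.nn.functional.cosine_similarity(a, b, dim=-1).mean().item()
    print(f"layer {layer} -> {layer + 1}: mean cosine similarity {cos:.3f}")
```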

Yiqiao Zhong has been an assistant professor in the Department of Statistics at the University of Wisconsin-Madison since 2022. He obtained his Ph.D. from Princeton University under the supervision of Jianqing Fan and worked as a postdoc at Stanford University with Andrea Montanari and David Donoho. Yiqiao is interested in the analysis of deep learning (especially large language models) and its theoretical foundations. He is also interested in high-dimensional statistics, e.g., the generalization properties of overparametrized models.

Link to paper

Start date
Tuesday, Nov. 21, 2023, 11 a.m.
End date
Tuesday, Nov. 21, 2023, Noon
Location
Keller Hall 3-180 or Zoom.
