ISyE Seminar Series: Shihao Yang
"Towards Better Transformers Sequence Modeling: A Two-Way Exchange Between Time Series and Attention Mechanisms"
Shihao Yang
Harold E. Smalley Early Career Professor and Assistant Professor in the School of Industrial & Systems Engineering
Georgia Institute of Technology
About the Seminar:
Despite Transformers' success across AI domains, their results in time series forecasting remain lukewarm. For example, in infectious disease forecasting they have shown no clear advantage over classical statistical baselines, which motivates us to bridge attention mechanisms and classical time series principles.
We first develop a statistical account of when and why attention should work: a single linear attention layer behaves like a low-rank Vector Autoregression (VAR), while stacking layers induces higher-rank lag interactions, yielding an attention design aligned with VAR structure. We then observe that typical Transformers are autoregressive only, missing the moving-average component found in statistical time series models. By introducing an attention pathway over residuals, inspired by ARMA models, we improve forecasting accuracy. For efficiency and long contexts, we reinterpret linear attention as a truncated softmax and add dedicated pathways (beyond Q, K, V) that capture higher-order Taylor series terms, recovering expressivity in linear time. We also show how "prompt engineering" for time series enables zero-shot and few-shot forecasting through in-context learning, how auxiliary "scratch-paper" channels act as a chain-of-thought analogue for multivariate series, and how targeted training techniques stabilize and accelerate convergence.
Together, these contributions demonstrate how statistical thinking enables better Transformer designs for time series and, reciprocally, how time series insights yield improved attention mechanisms that transfer to vision and text domains. This two-way exchange between classical statistical principles and modern deep learning architectures opens new directions for both fields.
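For readers who want some intuition for the link between linear attention and VAR mentioned in the abstract, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the speaker's model: it uses a single causal attention head with no softmax and no normalization, and the names `Wq`, `Wk`, `Wv`, `d`, `d_head`, and `lag_coefficient` are hypothetical. It rewrites the attention output at time t as a sum over past inputs x_s, each multiplied by a data-dependent coefficient matrix whose rank is at most the head dimension.

```python
import numpy as np


def causal_linear_attention(X, Wq, Wk, Wv):
    """Causal attention without softmax: o_t = sum_{s<=t} (q_t . k_s) v_s."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = np.tril(Q @ K.T)  # keep only s <= t (causal mask)
    return scores @ V


def lag_coefficient(x_t, x_s, Wq, Wk, Wv):
    """Coefficient matrix applied to x_s when forming the output at time t.

    It is a scalar (q_t . k_s) times Wv^T, so its rank is at most d_head.
    """
    q_t, k_s = x_t @ Wq, x_s @ Wk
    return (q_t @ k_s) * Wv.T  # shape (d_head, d)


def var_form(X, Wq, Wk, Wv):
    """Same output, written as a VAR-style sum of coefficient matrices times past inputs."""
    T = X.shape[0]
    out = np.zeros((T, Wv.shape[1]))
    for t in range(T):
        for s in range(t + 1):
            out[t] += lag_coefficient(X[t], X[s], Wq, Wk, Wv) @ X[s]
    return out


# Toy data: T time steps of a d-dimensional series (illustrative values only).
rng = np.random.default_rng(0)
d, d_head, T = 4, 2, 6
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d_head)) for _ in range(3))

same = np.allclose(causal_linear_attention(X, Wq, Wk, Wv), var_form(X, Wq, Wk, Wv))
print("attention output matches VAR-style form:", same)
```

The two computations should agree up to floating-point error. Unlike a classical VAR, the coefficients here depend on the data through the query and key, which is one reason the sketch should be read as intuition rather than as the formal result presented in the talk.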
About the Speaker:
Dr. Shihao Yang is the Harold E. Smalley Early Career Professor and an Assistant Professor in the School of Industrial & Systems Engineering at Georgia Tech. He completed his PhD in statistics and a postdoc in biomedical informatics at Harvard University. Dr. Yang’s research focuses on data science, with special interest in time series, dynamical systems, and applications in infectious disease transmission forecasting.
If you wish to be added to the ISyE Graduate Seminar Series emailing list, please email Event Coordinator Emily Rice at [email protected].