CSE DSI Machine Learning Seminar with Aaron Molstad (Statistics, UMN)

A direct approach to tree-guided feature aggregation for high-dimensional regression

In high-dimensional linear models, sparsity is often exploited to reduce variability and achieve parsimony. Equi-sparsity, where one assumes that predictors can be aggregated into groups sharing the same effects, is an alternative parsimonious structure that can be more suitable in certain applications. Previous work has clearly demonstrated the benefits of exploiting equi-sparsity in the presence of “rare” features (Yan and Bien, 2021). In this work, we propose a new tree-guided regularization scheme for simultaneous estimation and feature aggregation. Unlike existing methods, our estimator avoids synthetic overparameterization and its detrimental effects. Novel techniques are developed to study the finite-sample error bound of this seminorm-induced regularizer under least squares and binomial deviance losses. Theoretically, compared to existing methods, the proposed method offers a faster or equivalent rate depending on the true equi-sparsity structure. Extensive simulation studies verify these findings. We show that our estimator can be computed very efficiently by exploiting special properties of our penalty. Finally, we illustrate the usefulness of the proposed method with an application to a microbiome dataset, where we conduct post-selection inference on the aggregated features’ effects.

Aaron Molstad is Associate Professor in the School of Statistics and faculty affiliate in Data Science at the University of Minnesota. Previously, he was a postdoctoral research fellow at the Fred Hutchinson Cancer Center in Seattle, Washington, and was Assistant Professor in Statistics at the University of Florida. His primary research interests are in multivariate analysis, numerical optimization, statistical genetics and genomics, and more broadly, statistical and machine learning.
Start date
Tuesday, Dec. 2, 2025, 11 a.m.
End date
Tuesday, Dec. 2, 2025, Noon
Location

Keller 3-180 or via Zoom.
