Machine Learning Seminar

Joint association and classification of multi-view structured data

by

Sandra Safo
Biostatistics
School of Public Health
University of Minnesota

Wednesday, October 28, 2020
3:30–4:30 pm

Online via Zoom (recording available)

Classification methods that simultaneously leverage the strengths of data from multiple sources (multi-view data) have enormous potential to yield more powerful findings than two-step methods, in which association is followed by classification. We propose two methods for joint association and classification studies: sparse integrative discriminant analysis (SIDA) and SIDA with incorporation of network information (SIDANet). Both methods consider the overall association between the views and the separation within each view when choosing discriminant vectors that are associated and that optimally separate subjects into different classes. SIDANet is among the first methods to incorporate prior structural information in joint association and classification studies. It uses the normalized Laplacian of a graph to smooth the coefficients of predictor variables, thus encouraging selection of predictors that are connected. We demonstrate the effectiveness of our methods on a set of synthetic and real datasets. Our findings underscore the benefit of joint association and classification methods when the goal is both to correlate multi-view data and to perform classification.
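
To make the graph-smoothing idea in the abstract concrete, here is a minimal sketch in Python. It is not the SIDANet implementation: it assumes a generic quadratic penalty b'Lb built from the normalized Laplacian L = I - D^(-1/2) A D^(-1/2), and the actual penalty used by SIDANet may take a different form. The function names, the toy four-variable path graph, and the coefficient vectors are all hypothetical and chosen only for illustration.

```python
import numpy as np


def normalized_laplacian(adj):
    """Return L = I - D^(-1/2) A D^(-1/2) for a symmetric adjacency matrix.

    Isolated nodes (degree 0) get a zero row and column by convention.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    connected = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    scaled = inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return np.diag(connected.astype(float)) - scaled


def smoothness_penalty(coef, lap):
    """Quadratic form b' L b (an assumed, generic penalty).

    Equals the sum over edges (i, j) of
    (b_i / sqrt(d_i) - b_j / sqrt(d_j))**2, so it is small when
    connected predictors have similar degree-scaled coefficients.
    """
    coef = np.asarray(coef, dtype=float)
    return float(coef @ lap @ coef)


# Hypothetical prior network: four predictors on a path 0 - 1 - 2 - 3.
adj = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
)
lap = normalized_laplacian(adj)
deg = adj.sum(axis=1)

# Coefficients that vary smoothly over the graph vs. ones that flip sign
# across every edge (both vectors are made up for this illustration).
smooth_coef = np.sqrt(deg)
rough_coef = np.sqrt(deg) * np.array([1.0, -1.0, 1.0, -1.0])

smooth = smoothness_penalty(smooth_coef, lap)
rough = smoothness_penalty(rough_coef, lap)
print(f"smooth penalty: {smooth:.4f}, rough penalty: {rough:.4f}")
```

Under this assumed penalty, coefficient vectors that change abruptly across edges of the prior network are penalized more heavily than vectors that vary smoothly, which is one way to encourage connected predictors to be selected together.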


My primary research focuses on developing and applying statistical and machine learning methods and computational tools for big biomedical data to advance clinical translational research and precision medicine. I have been developing methods for high-dimensional data in multivariate statistics, statistical learning (including classification, discriminant analysis, association studies, and biclustering), data integration, and feature selection. Currently, I develop methods for integrative analysis of “omics” data (including genomics, transcriptomics, and metabolomics) and clinical data to help elucidate the complex interactions among these multifaceted data types.