Disease classification using multi-view longitudinal data with Deep IDA

Abstract

Crohn's disease and ulcerative colitis are common inflammatory bowel diseases (IBD). In this work, we use the metabolomics, meta-transcriptomics, host-transcriptomics and clinical data of n=90 subjects (obtained from the Inflammatory Bowel Disease Multi-omics Database, IBDMDB) to classify them into "disease" and "healthy" groups. The goal of this work is to integrate longitudinal and cross-sectional data to classify subjects into the two groups, and to determine molecular profiles and signatures separating the two groups. The main contributions of this work are as follows: (i) Since the metabolomics and host-transcriptomics data are multivariate time series, we use two methods, based on the Euler Characteristic and on Functional Principal Component Analysis, to condense these time-series data into one-dimensional vectors while preserving the important characteristics of the time series; (ii) Since both transcriptomics datasets have several thousand variables, we use pre-filtering methods and linear mixed models to remove the insignificant features; (iii) For classification, we use the DeepIDA network, which uses deep neural networks followed by Integrative Discriminant Analysis to effectively combine the data from different sources, and we compare the performance of DeepIDA with traditional classification approaches such as SVM; (iv) We use a bootstrapping strategy along with DeepIDA to extract the top variables of each view, that is, those that contribute most to the classification performance. Through this work, we identify signatures and profiles that discriminate between healthy and diseased subjects and that can shed light on the etiology of IBD.
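
As an illustration of contribution (i), the following is a minimal sketch of one way an Euler Characteristic curve can condense a multivariate time series into a one-dimensional vector. It is not necessarily the construction used in this work: the function name `euler_characteristic_curve`, the use of absolute Pearson correlation between variables, and the choice of thresholds are all assumptions made for illustration. For each threshold, the variables are treated as vertices of a graph, an edge is kept when the absolute correlation reaches the threshold, and the Euler characteristic of that graph (vertices minus edges) is recorded.

```python
import numpy as np


def euler_characteristic_curve(series, thresholds):
    """Condense one subject's (T, p) time series into a vector.

    Hypothetical helper for illustration only. `series` holds T time
    points for p variables; the result has one entry per threshold.
    """
    series = np.asarray(series, dtype=float)
    # Pearson correlation between variables (columns), shape (p, p).
    corr = np.corrcoef(series, rowvar=False)
    p = corr.shape[0]
    # Each unordered pair of variables is a candidate edge.
    upper = np.triu_indices(p, k=1)
    weights = np.abs(corr[upper])
    # Euler characteristic of a graph = number of vertices - number of edges.
    return np.array([p - int(np.sum(weights >= t)) for t in thresholds])


# Toy example: x and y are perfectly correlated, z is uncorrelated with both.
x = np.array([1.0, 2.0, 3.0, 4.0])
toy = np.column_stack([x, 2.0 * x, np.array([1.0, -1.0, -1.0, 1.0])])
curve = euler_characteristic_curve(toy, thresholds=[0.0, 0.5, 0.9])
print(curve)
```

Each subject's curve has a fixed length (the number of thresholds) regardless of how many time points were observed, which is what allows it to be used alongside cross-sectional views in a downstream classifier.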