Chad Myers Harnesses AI to Improve Disease Prediction from Genetic Sequences

With the rise of services like 23andMe and Ancestry, individuals have more information about their genetics than ever before. As more people participate, researchers are able to study genetic data and start putting together the puzzle that explains why people do and don’t get diseases. Department of Computer Science & Engineering Professor Chad Myers is harnessing artificial intelligence (AI) to analyze genetic data and improve disease prediction. Myers is also the co-director of graduate studies for the Bioinformatics and Computational Biology program at the University of Minnesota.

“Surprisingly with all the information scientists have gathered, machine learning models still don’t perform anywhere near as well as they should to predict who will actually get a disease,” said Myers. “We can get the genes, we can build models, but still we're not able to explain why these diseases are inherited. That's the basic problem we're working on - can we build better models that use genetic data to predict and reach their full potential?”

The majority of genome-wide association studies focus on one-to-one relationships between genes and diseases. Myers’ work focuses on combinations of variants to better explain the relationship between a person’s genes and specific diseases.

“Combinations of mutations are important and that makes that problem really difficult,” said Myers. “If a person has one million genetic variants in their genome, we now have roughly one million squared features to think about, because every pair of those one million features is potentially informative. These models can get very complex very quickly as soon as you start to think about those combinations. My lab is trying to develop machine learning approaches that can leverage these combinations to improve disease predictions.”
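To make the scale in that quote concrete, here is a minimal sketch of the arithmetic, not a description of the lab's method. It counts the unordered pairs among n variants, which is n(n-1)/2 and so on the order of n squared, and builds pairwise product features for a toy genotype. The function names and the 0/1/2 genotype coding (copies of the alternate allele) are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def count_pairwise_features(n_variants: int) -> int:
    """Number of unordered pairs of distinct variants (n choose 2)."""
    return comb(n_variants, 2)


def pairwise_interaction_features(genotype):
    """Map each variant index pair (i, j), i < j, to the product of their values.

    Assumes a hypothetical 0/1/2 coding: copies of the alternate allele.
    """
    return {
        (i, j): genotype[i] * genotype[j]
        for i, j in combinations(range(len(genotype)), 2)
    }


if __name__ == "__main__":
    n = 1_000_000
    print(count_pairwise_features(n))  # 499999500000, about half of n squared
    print(pairwise_interaction_features([1, 0, 2]))
```

Even for a three-variant toy genotype the feature set already grows from three values to three pairs; at one million variants the pair count is roughly 5 x 10^11, which is why enumerating every pair directly is impractical.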

Myers’ lab draws on large public databases of genomic information from the NIH and the United Kingdom. The lab also collaborates with groups of scientists who are experts in diseases such as Parkinson’s, ALS and pediatric cancer.

“We can develop a model that could be applied to any of these diseases, but we need help from experts to interpret the results,” said Myers. “That collaboration is important, because it helps us understand where the models are failing and where they seem to be picking up on things that are consistent with the current disease research. The long term goal is to develop generalizable approaches that could be easily applied to other diseases.”

In the future, Myers hopes a successful model could help patients learn at their annual check-up whether they have a natural risk for various diseases and how lifestyle and environment might affect that risk. With that information, people could take steps to avoid or prepare for future health issues.

“We are leveraging the AI innovation that's happening right now and trying to make it work in the context of genomic data,” said Myers. “A lot of the latest developments in the AI community weren’t developed with biology and medical issues in mind. For example, graph neural networks are a powerful type of machine learning models being used on other problems, but we need to learn how to customize and adapt that approach to our problem. When we find an adapted version that works really well for one problem, that type of model might also benefit other people that are working in totally different domains. That's where applied fields like computational biology can contribute to the broader AI community.”

As the machine learning model finds relationships between combinations of gene variants, Myers can put those relationships to the test in another ongoing project in his lab based on CRISPR genome editing technology.

“My lab is working with experimental collaborators to use CRISPR to introduce specific combinations of mutations into model cell lines that can be grown in the lab and measure the impact of those mutations,” said Myers. “This approach is very complementary to studying natural variants in the human population, because we can introduce and study small numbers of mutations in isolation as opposed to the background of millions of genetic changes that appear together in a typical person’s genome. The combination of computational approaches applied to genome sequences from human populations and targeted experiments using CRISPR technology in cell lines will be very powerful.”

Learn more about Chad Myers’ work on his lab website.