Data Science

Ten CEMS research groups and three members of the CEMS graduate faculty use data-driven methods to advance fundamental and applied research in the design of chemicals, materials, and biological systems. These methods, which fall under the umbrella of artificial intelligence, machine learning, and data science, are rapidly transforming industry. In recent years, we have responded by integrating data-driven methods into our research, teaching, and outreach efforts. This page highlights ongoing research activities in this space. For more information on our other data science efforts, including our Master’s program in Data Science for Chemical Engineering and Materials Science, please see the CEMS Data Science Homepage. Below are descriptions of focus research areas that leverage data science, along with the associated CEMS faculty, affiliated campus resources, funding sources, and selected publications.

Materials for sustainability

The design and discovery of new materials are at the forefront of the fight to mitigate the effects of climate change. A rapidly changing energy sector demands new materials that are more efficient, are more sustainably sourced, and enable new technologies. CEMS researchers are leveraging data-driven approaches to shorten the timeline from materials discovery to implementation in next-generation devices. You can also read more generally about our work designing materials for sustainability in our overviews on Materials Theory and Sustainability.

Related Faculty and Research Groups:

Biological engineering

In biological engineering, designing a targeted therapeutic means navigating an immense design space. Machine learning is being used in concert with high-throughput experimental design of polymer, nanoparticle, and protein-based delivery vehicles, leading to new insights and faster design of improved carriers. See our research overview on Biological Engineering to learn more generally about this topic of research in CEMS.

Related Faculty and Research Groups:

Complex systems

From supply chains of fuels and chemicals to energy distribution and storage to metabolic processes in the human body, systems are all around us. CEMS faculty are pioneering the integration of data-driven approaches with mathematical modeling, simulation, process design, synthesis, optimization, control, and systems biology to understand these complex systems and to apply that understanding to novel approaches in systems engineering. See our research overview on Systems Engineering to learn more generally about this topic of research in CEMS.

Related Faculty and Research Groups:

Relevant Collaborative Partners and Core Facilities

The University of Minnesota offers a thriving ecosystem for applying data science methods to interdisciplinary problems, such as those being addressed in CEMS. The Minnesota Supercomputing Institute, located in Walter Library, a short walk from Amundson Hall, houses three high-performance computing clusters (supercomputers) that provide more than 80,000 CPU and GPU cores to research groups at UMN, along with extensive cloud computing, data storage, and dedicated staff to support the computational needs of UMN faculty.

Within CEMS, we also have an IT team that supports research groups with data collection, storage, and management to facilitate the systematic data retention needed for machine learning applications. 

The Data Science Initiative, housed within the College of Science & Engineering at UMN, brings together more than 75 faculty from more than 10 departments in the College (including CEMS) to collaborate on the application of data science to interdisciplinary problems of broad interest to the university (materials design, sustainability, environmental conservation, information security, etc.).

CEMS also works extensively with industrial partners on data science activities, including through the Data-Driven Discovery and Design (4D) program hosted by IPRIME. Affiliated companies are supporting graduate fellowships for students in the recently introduced M.S. in Data Science for CEMS, which was created to train the next generation of chemical and materials engineers on the application of data science within these disciplines. Our partnerships with industry are celebrated annually through the Peter O. Stahl Advanced Design Forum, where CEMS faculty and students gather with leaders from the chemicals, materials, and biotech industries to discuss new ideas and best practices as we work together to transform these industries through the application of data science. 

 

Major Funding Sources 

Publications and Patents 

Stabilizing a Double Gyroid Network Phase with 2 nm Feature Size by Blending of Lamellar and Cylindrical Forming Block Oligomers

Molecular dynamics simulations are used to study binary blends of an AB-type diblock and an AB2-type miktoarm triblock amphiphiles (also known as high-χ block oligomers) consisting of sugar-based (A) and hydrocarbon (B) blocks. In their pure form, the AB diblock and AB2 triblock amphiphiles ...

Related Faculty: 

Ilja Siepmann, Tim Lodge, Kevin Dorfman, Frank Bates, Mahesh Mahanthappa

Fingerprinting diverse nanoporous materials for optimal hydrogen storage conditions using meta-learning

Adsorptive hydrogen storage is a desirable technology for fuel cell vehicles, and efficiently identifying the optimal storage temperature requires modeling hydrogen loading as a continuous function of pressure and temperature. Using data obtained from ...

Related Faculty: 

Ilja Siepmann

Approach for statistical analysis of oxide- and sulfate-induced hot corrosion of advanced alloys

This work develops an automated image and statistical analysis protocol to characterize and compare the effect of three deposits on the accelerated non-uniform degradation of a model alloy (FeCrAlY) at 1025 °C in dry air. Compared to the sample tested without a deposit, the Ca-containing mixed oxide deposit increased ...

Related Faculty:

David Poerschke

New tolerance factor to predict the stability of perovskite oxides and halides

Predicting the stability of the perovskite structure remains a long-standing challenge for the discovery of new functional materials for many applications including photovoltaics and electrocatalysts. We developed an accurate, physically interpretable, and ...

Related Faculty:

Chris Bartel

Massively parallel pooled screening reveals genomic determinants of nanoparticle delivery

Nanoparticles are increasingly being tested as vehicles for delivering therapeutics, and some are already in clinical use for cancer chemotherapy. Nanoparticle-based treatments can offer various therapeutic advantages such as decreased toxicity, longer half-life, and improved drug delivery. However, there are ...

Related Faculty:

Natalie Boehnke

Combinatorial Polycation Synthesis and Causal Machine Learning Reveal Divergent Polymer Design Rules for Effective pDNA and Ribonucleoprotein Delivery

The development of polymers that can replace engineered viral vectors in clinical gene therapy has proven elusive despite the vast portfolios of multifunctional polymers generated by advances in polymer synthesis. Functional delivery of payloads such as plasmids (pDNA) and ribonucleoproteins (RNP) to various cellular populations and ...

Related Faculty:

Theresa Reineke

Model-guided engineering of DNA sequences with predictable site-specific recombination rates

Site-specific recombination (SSR) is an important tool in synthetic biology, but its applications are limited by the inability to predictably tune SSR reaction rates. Facile rate manipulation could be achieved by modifying the DNA substrate sequence; however, this approach lacks rational design principles. Here, we ...

Related Faculty:

Samira Azarin

High-throughput developability assays enable library-scale identification of producible protein scaffold variants

Poor protein developability is a critical hindrance to biologic discovery and engineering. Experimental capacity limits variant analysis. We demonstrate the ability of an on-yeast protease assay, a split green fluorescent protein assay, and a split β-lactamase assay to predict recombinant protein production yields in bacteria. The assays presented ...

Related Faculty:

Ben Hackel

Efficient learning of decision-making models: A penalty block coordinate descent algorithm for data-driven inverse optimization

Decision-making problems are commonly formulated as optimization problems, which are then solved to make optimal decisions. In this work, we consider the inverse problem where we use prior decision data to uncover the underlying decision-making process in the form of a mathematical optimization model. This statistical learning problem is ...

Related Faculty:

Qi Zhang

 

Dissipativity learning control (DLC): A framework of input–output data-driven control

The paper addresses data-driven control based on input–output data in the absence of an underlying dynamic model. It proposes a dissipativity learning control (DLC) framework which involves the data-based learning of the dissipativity property of the control system, followed by a ...

Related Faculty:

Prodromos Daoutidis