Data Science

Ten CEMS research groups and three members of the CEMS graduate faculty use data-driven methods to advance fundamental and applied research in the design of chemicals, materials, and biological systems. These methods, which fall under the umbrella of artificial intelligence, machine learning, and data science, are rapidly transforming many industries. In recent years, we have responded by integrating data-driven methods into our research, teaching, and outreach efforts. This page highlights some ongoing research activities in this space. For more information on our other data science efforts, including our Master’s program in Data Science for Chemical Engineering and Materials Science, please see the CEMS Data Science Homepage. Below are descriptions of focus research areas that leverage data science, along with the associated CEMS faculty, selected publications, funding sources, and affiliated campus resources.

Materials for sustainability

The design and discovery of new materials is at the forefront of the fight to mitigate the effects of climate change. A rapidly changing energy sector demands new materials that are more efficient, are more sustainably sourced, and enable new technologies. CEMS researchers are leveraging data-driven approaches to shorten the timeline from materials discovery to implementation in next-generation devices. You can also read more generally about our work designing materials for sustainability in our overviews on Materials Theory and Sustainability.

Related Faculty and Research Groups:

Biological engineering

In biological engineering, there is an immense design space to consider when engineering a targeted therapeutic. Machine learning is being used in concert with high-throughput experimental design of polymer, nanoparticle, and protein-based delivery vehicles, leading to new insights and faster design of improved carriers. See our research overview on Biological Engineering to learn more generally about this topic of research in CEMS.

Related Faculty and Research Groups:

Complex systems

From supply chains of fuels and chemicals to energy distribution and storage to metabolic processes in the human body, systems are all around us. CEMS faculty are pioneering the integration of data-driven approaches with mathematical modeling, simulation, process design, synthesis, optimization, control, and systems biology to understand these complex systems and to apply that understanding to novel approaches to systems engineering. See our research overview on Systems Engineering to learn more generally about this topic of research in CEMS.

Related Faculty and Research Groups:

Relevant Collaborative Partners and Core Facilities

The University of Minnesota offers a thriving ecosystem for applying data science methods to interdisciplinary problems, such as those being addressed in CEMS. The Minnesota Supercomputing Institute, located in Walter Library, a short walk from Amundson Hall, houses three high-performance computing clusters (supercomputers) that provide more than 80,000 CPU and GPU cores to research groups at UMN, along with extensive cloud computing, data storage, and dedicated staff to support the computational needs of UMN faculty.

Within CEMS, we also have an IT team that supports research groups with data collection, storage, and management to facilitate the systematic data retention needed for machine learning applications.

Within the College of Science & Engineering at UMN is the Data Science Initiative, which brings together more than 75 faculty from more than 10 departments in the College (including CEMS) to collaborate on the application of data science to interdisciplinary problems of broad interest to the university (materials design, sustainability, environmental conservation, information security, etc.).

CEMS also works extensively with industrial partners on data science activities, including through the Data-Driven Discovery and Design (4D) program hosted by IPRIME. Affiliated companies are supporting graduate fellowships for students in the recently introduced M.S. in Data Science for CEMS, which was created to train the next generation of chemical and materials engineers on the application of data science within these disciplines. Our partnerships with industry are celebrated annually through the Peter O. Stahl Advanced Design Forum, where CEMS faculty and students gather with leaders from the chemicals, materials, and biotech industries to discuss new ideas and best practices as we work together to transform these industries through the application of data science.

Publications and Patents

Stabilizing a Double Gyroid Network Phase with 2 nm Feature Size by Blending of Lamellar and Cylindrical Forming Block Oligomers

Fingerprinting diverse nanoporous materials for optimal hydrogen storage conditions using meta-learning

Approach for statistical analysis of oxide- and sulfate-induced hot corrosion of advanced alloys

New tolerance factor to predict the stability of perovskite oxides and halides

Massively parallel pooled screening reveals genomic determinants of nanoparticle delivery

Combinatorial Polycation Synthesis and Causal Machine Learning Reveal Divergent Polymer Design Rules for Effective pDNA and Ribonucleoprotein Delivery

Model-guided engineering of DNA sequences with predictable site-specific recombination rates

High-throughput developability assays enable library-scale identification of producible protein scaffold variants

Efficient learning of decision-making models: A penalty block coordinate descent algorithm for data-driven inverse optimization

Dissipativity learning control (DLC): A framework of input–output data-driven control