Data Science
Ten CEMS research groups and three members of the CEMS graduate faculty use data-driven methods to advance fundamental and applied research in the design of chemicals, materials, and biological systems. These methods, which fall under the umbrella of artificial intelligence, machine learning, and data science, are rapidly transforming industry. In recent years, we have responded by integrating data-driven methods into our research, teaching, and outreach efforts. This page highlights some ongoing research activities in this space. For more information on our other data science efforts, including our Master’s program in Data Science for Chemical Engineering and Materials Science, please see the CEMS Data Science Homepage. Below are descriptions of focus research areas that leverage data science, along with the associated CEMS faculty, selected publications, funding sources, and affiliated campus resources.
Materials for sustainability
The design and discovery of new materials is at the forefront of the fight to mitigate the effects of climate change. A rapidly changing energy sector demands new materials that are more efficient and more sustainably sourced, and that enable new technologies. CEMS researchers are leveraging data-driven approaches to shorten the timeline from materials discovery to implementation in these next-generation devices. You can also read more generally about our work designing materials for sustainability in our overviews on Materials Theory and Sustainability.
Related Faculty and Research Groups:
Biological engineering
In the area of biological engineering, there is an immense design space to consider when engineering a targeted therapeutic. Machine learning is being used in concert with high-throughput experimental design of polymer, nanoparticle, and protein-based delivery vehicles, leading to new insights and quicker design of improved carriers. See our research overview on Biological Engineering to learn more generally about this topic of research in CEMS.
Related Faculty and Research Groups:
Complex systems
From supply chains of fuels and chemicals to energy distribution and storage to metabolic processes in the human body, complex systems are all around us. CEMS faculty are pioneering the integration of data-driven approaches with mathematical modeling, simulation, process design, synthesis, optimization, control, and systems biology to understand these systems and to apply that understanding in novel approaches to systems engineering. See our research overview on Systems Engineering to learn more generally about this topic of research in CEMS.
Related Faculty and Research Groups:
Relevant Collaborative Partners and Core Facilities
The University of Minnesota offers a thriving ecosystem for applying data science methods to interdisciplinary problems, such as those being addressed in CEMS. The Minnesota Supercomputing Institute, located in Walter Library, a short walk from Amundson Hall, houses three high-performance computing clusters (supercomputers) that provide more than 80,000 CPU and GPU cores to research groups at UMN, along with extensive cloud computing, data storage, and dedicated staff to support the computational needs of UMN faculty.
Within CEMS, we also have an IT team that supports research groups with data collection, storage, and management to facilitate the systematic data retention needed for machine learning applications.
Within the College of Science & Engineering at UMN is the Data Science Initiative, which brings together more than 75 faculty from more than 10 departments in the College (including CEMS) to collaborate on the application of data science to interdisciplinary problems of broad interest to the university (materials design, sustainability, environmental conservation, information security, etc.).
CEMS also works extensively with industrial partners on data science activities, including through the Data-Driven Discovery and Design (4D) program hosted by IPRIME. Affiliated companies are supporting graduate fellowships for students in the recently introduced M.S. in Data Science for CEMS, which was created to train the next generation of chemical and materials engineers on the application of data science within these disciplines. Our partnerships with industry are celebrated annually through the Peter O. Stahl Advanced Design Forum, where CEMS faculty and students gather with leaders from the chemicals, materials, and biotech industries to discuss new ideas and best practices as we work together to transform these industries through the application of data science.
CEMS also leads a National Science Foundation Research Traineeship (NRT) program that bridges chemical, biological, and materials engineering with data science and systems engineering through convergent education and research and through industry-university collaboration. Learn more about the Data-Driven Discovery and Engineering from Atoms to Processes (3DEAP) program at the program's website.
Major Funding Sources
Publications and Patents
Stabilizing a Double Gyroid Network Phase with 2 nm Feature Size by Blending of Lamellar and Cylindrical Forming Block Oligomers
Molecular dynamics simulations are used to study binary blends of an AB-type diblock and an AB2-type miktoarm triblock amphiphile (also known as high-χ block oligomers) consisting of sugar-based (A) and hydrocarbon (B) blocks. In their pure form, the AB diblock and AB2 triblock amphiphiles ... Read the full article at the ACS Publications website.
Related Faculty:
Ilja Siepmann, Tim Lodge, Kevin Dorfman, Frank Bates, Mahesh Mahanthappa
Fingerprinting diverse nanoporous materials for optimal hydrogen storage conditions using meta-learning
Adsorptive hydrogen storage is a desirable technology for fuel cell vehicles, and efficiently identifying the optimal storage temperature requires modeling hydrogen loading as a continuous function of pressure and temperature. Read the full article at Science's website.
Related Faculty:
Approach for statistical analysis of oxide- and sulfate-induced hot corrosion of advanced alloys
This work develops an automated image and statistical analysis protocol to characterize and compare the effect of three deposits on the accelerated non-uniform degradation of a model alloy (FeCrAlY) at 1025 °C in dry air. Read the full article at ScienceDirect's website.
Related Faculty:
New tolerance factor to predict the stability of perovskite oxides and halides
Predicting the stability of the perovskite structure remains a long-standing challenge for the discovery of new functional materials for many applications including photovoltaics and electrocatalysts. Read the full article at Science's website.
Related Faculty:
Combinatorial Polycation Synthesis and Causal Machine Learning Reveal Divergent Polymer Design Rules for Effective pDNA and Ribonucleoprotein Delivery
The development of polymers that can replace engineered viral vectors in clinical gene therapy has proven elusive despite the vast portfolios of multifunctional polymers generated by advances in polymer synthesis. Read the full article at the ACS Publications website.
Related Faculty:
Model-guided engineering of DNA sequences with predictable site-specific recombination rates
Site-specific recombination (SSR) is an important tool in synthetic biology, but its applications are limited by the inability to predictably tune SSR reaction rates. Facile rate manipulation could be achieved by modifying the DNA substrate sequence; however, this approach lacks rational design principles. Read the full publication at Nature Communications' website.
Related Faculty:
High-throughput developability assays enable library-scale identification of producible protein scaffold variants
Poor protein developability is a critical hindrance to biologic discovery and engineering. Experimental capacity limits variant analysis. We demonstrate the ability of an on-yeast protease assay, a split green fluorescent protein assay, and a split β-lactamase assay to predict recombinant protein production yields in bacteria. Read the full article at the PNAS website.
Related Faculty:
Efficient learning of decision-making models: A penalty block coordinate descent algorithm for data-driven inverse optimization
Decision-making problems are commonly formulated as optimization problems, which are then solved to make optimal decisions. In this work, we consider the inverse problem where we use prior decision data to uncover the underlying decision-making process in the form of a mathematical optimization model. Read the full article at ScienceDirect's website.
Related Faculty:
Dissipativity learning control (DLC): A framework of input–output data-driven control
The paper addresses data-driven control based on input–output data in the absence of an underlying dynamic model. It proposes a dissipativity learning control (DLC) framework which involves the data-based learning of the dissipativity property of the control system. Read the full article at ScienceDirect's website.
Related Faculty: