Rusack part of effort that could help improve AI models

School of Physics and Astronomy Professor Roger Rusack is part of an effort to make data usable across multiple disciplines. It is hoped that doing so will allow scientists outside of Rusack’s area of experimental particle physics to build better algorithms and solve problems using artificial intelligence. 

Physicists in the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider at CERN in Switzerland are repackaging images of Higgs boson decays and quark and gluon background to be in line with a set of standards to make them more findable, accessible, interoperable, and reusable, or FAIR, for both people and machines.

Physicists use artificial intelligence algorithms and modeling, but because their area of research is so specialized, these programs have necessarily been created by and for particle physicists. Adopting a more universal set of best practices should allow their end product to be used to train new, more powerful tools in the future.

FAIR principles date back to 2016, but researchers are still figuring out how they apply to particular datasets. In a new study, researchers from the U.S. Department of Energy’s Argonne National Laboratory, MIT, and the Universities of California San Diego, Minnesota, and Illinois Urbana-Champaign have developed new practices to apply FAIR principles to the field of particle physics.

“The FAIR principles were created to serve as goals for data producers and publishers to improve data management and stewardship practices,” said Argonne computational scientist Eliu Huerta, an author of the study. “The community expects that adhering to these principles will enhance the capabilities of machines to automate the finding and use of data, thereby streamlining the reuse of data for humans.”

In addition to building FAIR datasets, the physicists also sought to understand the FAIRness of AI models. “To have a FAIR AI model, we believe you need to have a FAIR dataset to train it on,” said Yifan Chen, the first author of the paper and a graduate student at the University of Illinois Urbana-Champaign and Argonne’s Data Science and Learning division. “Applying the FAIR principles to AI models will automate and streamline the design and use of those models for scientific discovery.”

“Our goal is to shed new light into the interplay of AI models and experimental data and help create a rigorous framework for the development of AI tools to address the biggest challenges in science,” Huerta added.

The scientists hope that using the agreed-upon set of best practices will improve algorithms and modeling throughout the sciences. “We’re looking at the entire discovery cycle, from data production and curation, design and deployment of smart and modern computing environments and scientific data infrastructures, and the combination of these to create AI frameworks that power disruptive advances in our understanding of scientific phenomena,” he said.