Applications of Machine Learning Methods by CMS Experiment at the CERN Large Hadron Collider

Machine Learning Seminar Series


Rajdeep Chatterjee
School of Physics and Astronomy
University of Minnesota

The Compact Muon Solenoid (CMS) experiment is one of the two large, general-purpose detectors at the Large Hadron Collider (LHC) of the European Organization for Nuclear Research (CERN). The CMS Collaboration is a multinational scientific collaboration, supported by 44 funding agencies from around the world. The LHC began operation in 2010 and will continue to run with significant upgrades into the next decade. The CMS experiment has been operational and collecting data for seven of the past ten years.

The data are collected from approximately 130 million detector channels, at about 2 MByte per event and a collision rate of 40 MHz. After two stages of real-time online selection, which keep only those events that are of direct interest to the physics program, one thousand events, or 2 GBytes of data, are stored per second for future offline processing. Using these data, the properties of the recently discovered Higgs boson and many other physics processes at the very highest particle energies are studied. Managing and analyzing the tens of PetaBytes of data and metadata is a major challenge for the CMS Collaboration, and it is expected to become more challenging later in the decade, when the High-Luminosity LHC becomes operational with a collision rate that is a factor of five larger.
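
The rates quoted above can be cross-checked with simple arithmetic. The short sketch below assumes that 2 MByte is the size of a single event (consistent with one thousand stored events amounting to 2 GBytes) and uses decimal units; the variable and function names are illustrative only, and the pre-selection figure is a nominal value that would apply only if every collision were read out in full.

```python
# Back-of-the-envelope check of the data rates quoted in the abstract.
# Assumptions (not stated explicitly in the text): 2 MByte is the size of
# one event, and units are decimal (1 MByte = 1e6 bytes).
EVENT_SIZE_BYTES = 2e6      # ~2 MByte per event
COLLISION_RATE_HZ = 40e6    # 40 MHz collision rate
STORED_RATE_HZ = 1e3        # one thousand events stored per second


def data_rate(event_size_bytes, rate_hz):
    """Return the data rate in bytes per second."""
    return event_size_bytes * rate_hz


# Nominal rate if every collision were read out in full.
raw_rate = data_rate(EVENT_SIZE_BYTES, COLLISION_RATE_HZ)

# Rate actually written to storage after the two-stage online selection.
stored_rate = data_rate(EVENT_SIZE_BYTES, STORED_RATE_HZ)

# Factor by which the online selection reduces the event rate.
reduction = COLLISION_RATE_HZ / STORED_RATE_HZ

print(f"Nominal rate before selection: {raw_rate / 1e12:.0f} TByte/s")
print(f"Stored rate after selection:   {stored_rate / 1e9:.0f} GByte/s")
print(f"Event-rate reduction factor:   {reduction:.0f}")
```

Under these assumptions, the online selection keeps roughly one event in every forty thousand.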

In order to cope with these challenges, the CMS Collaboration has relied on an extensive and diverse set of supervised and unsupervised machine learning methods at every stage of the experiment, from the online selection of events to the offline analysis of the recorded data. This has resulted both in more efficient detector operations and in a significant improvement in the quality of the physics results delivered. One example is the use of Boosted Decision Trees (BDTs) implemented in Field Programmable Gate Arrays (FPGAs) to select only those events that have the characteristics of the physics processes under study, with the requisite decision made within hundreds of nanoseconds. Autoencoders have been used to monitor the quality of the data being recorded and to flag anomalous detector channels. In order to optimize the identification and reconstruction of physics objects like electrons, photons, and jets of particles, a variety of BDT and Deep Neural Network based algorithms have been developed. Many of the flagship physics results reported by the CMS Collaboration, like the measurement of the properties of the recently discovered Higgs boson, have been made possible by the use of dedicated machine learning algorithms.

In this seminar the authors will review the machine learning paradigm employed by the CMS Collaboration through specific illustrative examples and also indicate directions for the future. As the LHC inches towards the close of the two-year shutdown and prepares for Run 3, the collaboration has a golden opportunity to further explore novel machine learning methods that will continue to revolutionize the physics output of the CMS experiment.


Start date
Thursday, April 1, 2021, 10 a.m.
End date
Thursday, April 1, 2021, 11 a.m.

Online via Zoom