GANs for BDSim - Summer Student Project

Description: The NA62 experiment at CERN searches for rare kaon decay modes to reveal new physics and probe the limits of the Standard Model. Monte Carlo simulations are indispensable to the success of any physics experiment, from the initial design through to the analysis of the collected data, where they are used to calculate acceptances. It is therefore very important that the physics, the detector model and response, and all contributing details are accurately implemented in the simulation software. This requires rigorous comparisons between experimental data and Monte Carlo (MC) simulations, a process called MC Validation.

The NA62 Monte Carlo simulations use BDSim for the muon halo overlay. To study changes in the model and their impact on the comparison with data, MC Validation requires a large overlay sample, especially since many muons are "lost" during reconstruction, selections, trigger cuts, etc. The question is whether one can use GANs to generate new samples based on an original BDSim sample, e.g. events taken from the BDSim output at a given z plane (e.g. CEDAR, z = 69657 mm). Naively, this would be similar to a parametrisation in x, y, px, py, pz for each particle (muons in this case), but more sophisticated. Such an "extension" of the BDSim dataset with GANs would make it much easier to study multiple configurations of the model, since the necessary statistics could be generated relatively quickly without running full BDSim simulations.
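
To make the "naive parametrisation" concrete, here is a minimal sketch of such a baseline: fit a single multivariate Gaussian to the (x, y, px, py, pz) columns of a sample and draw a larger sample from it. Everything in it is an assumption made for illustration: the toy input stands in for a real BDSim output, the numerical values are placeholders rather than NA62 beam parameters, and the helper names fit_gaussian and generate are hypothetical. A GAN would replace the Gaussian with a learned generator that can also capture non-Gaussian shapes and tails.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Toy stand-in for a BDSim sample at one z plane.
# Columns: x, y, px, py, pz. Values are placeholders, not NA62 parameters.
n_events = 5000
true_mean = np.array([0.0, 0.0, 0.0, 0.0, 75.0])
true_cov = np.diag([100.0, 100.0, 0.01, 0.01, 1.0])
true_cov[0, 2] = true_cov[2, 0] = 0.5  # assumed x-px correlation
original = rng.multivariate_normal(true_mean, true_cov, size=n_events)


def fit_gaussian(sample):
    """Return the mean vector and covariance matrix of an (N, 5) sample."""
    return sample.mean(axis=0), np.cov(sample, rowvar=False)


def generate(mean, cov, n, rng):
    """Draw n new 5-dimensional events from the fitted Gaussian."""
    return rng.multivariate_normal(mean, cov, size=n)


mean, cov = fit_gaussian(original)
extended = generate(mean, cov, 10 * n_events, rng)  # 10x the input statistics
print(extended.shape)
```

A baseline like this keeps the means and linear correlations of the input but, by construction, cannot reproduce asymmetric or heavy-tailed distributions, which is where a more sophisticated generator is expected to help.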

The student should be familiar with programming in C/C++ and Python. Some knowledge of ROOT would be desirable.

Literature:

  1. Fast simulation of muons produced at the SHiP experiment using Generative Adversarial Networks (SHiP Collaboration, 2019)
  2. Uncertainties associated with GAN-generated datasets in high energy physics, K.T. Marchev, et al. (2021)
  3. A biased MC for muon production for beam-dump experiments, S. Ghinescu, et al. (2021)
  4. Book on GANs: Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, A. Geron, 2019
  5. The Beam and detector of the NA62 experiment at CERN on arXiv
  6. BDSim (Beamline Simulation Tool) documentation
  7. ROOT - an open-source data analysis framework for high energy physics documentation

Progress Tracker

How to use this table: Keep track of things to do by comparing the table below with our email communication. Email me the description (from this table) and a short summary (from your logbook) of any completed task, to mark it as done.

Task description | Status
M1. Zoom meeting (July 5) | Intro; discussed status and next steps.
T1. Read through articles/documentation and make notes (July 5-20) | Literature review more or less done by July 20.
T2. Keep a log book (July 5-) | Ongoing
T3. (optional) Install ROOT, have a look at the structure of BDSim converted outputs | Done (July 8)
M2. Zoom meeting (July 13, 10:00am) | Discussed progress and next steps. Student will continue focusing on literature (GANs, NA62).
T4. Investigate the applicability and uses of GANs in other physics experiments, especially the use of GANs to extrapolate statistics. Investigate also alternative approaches/algorithms. | Done (see comments from M3).
M3. Zoom meeting (July 20, 10:00am) | Read eight papers on GANs; found nothing to rule out the use of GANs for our type of application. Also investigated alternative approaches and comparison methods. Next step: think about implementation.
T5. Read about PyROOT | Student read about Python use with ROOT, including uproot4.
M4. Zoom meeting (August 3, 10:00am) | Excellent progress: first implementation ran on the student's own laptop; will arrange access to PPE resources. Next step: implement a GAN for (x, y, Px, Py, Pz) and a single particle type, and run it on a PPE VM.
T6. Implement GAN for (x, y, Px, Py, Pz) and one single particle type. | In progress, using Jupyter notebooks on a PPE VM. The GAN training runs, but visualisation of the results is not implemented yet.
M5. Zoom meeting (August 10, 10:00am) | Good progress. Next steps: get the histograms working to see the distributions of the generated muons, and start tracking the loss function to see when training has progressed far enough. Depending on the results, this would lead on to tuning the parameters of the GAN and using a figure of merit for its evaluation.
M6. Zoom meeting (August 31, 10:00am) | Excellent progress. Implemented GANs for the 5-dim case (x, y, Px, Py, Pz), tested various NN parameters (learning_rate, beta, batch_size) and found an optimum (run 2). Code, plots and notes uploaded online. See example here. Will download more BDSim data for NN training.
T7. Re-train model(s) on 10×-100× more events. | Trained the pretrained GAN on 140× more statistics. The generated distributions appear to oscillate around the target distributions instead of settling on them, and the distribution tails continue to be underestimated.
T8. Implement a figure-of-merit to quantify the GAN performance. | Implemented gradient-boosted decision trees (xgboost) as a figure of merit.
M7. Zoom meeting (August 14, 11:55am) | Excellent progress. Implemented GANs for anti-muons and possibly solved the convergence issue; need to train (~48h) on a larger set to confirm. Posted detailed documentation on Google Colab.
A1. Project presentation: Sep 24 | The talk can be found here. Plots and code can be found here.
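
The histogram comparison from M5 and the tail underestimation noted in T7 can also be quantified one variable at a time. The sketch below is a minimal illustration, assuming NumPy and toy one-dimensional samples in place of real BDSim and GAN outputs. It uses a two-sample Kolmogorov-Smirnov distance and a tail fraction as simple stand-ins; it is not the xgboost-based figure of merit from T8. The function names ks_distance and tail_fraction are hypothetical.

```python
import numpy as np


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance: largest gap between empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def tail_fraction(sample, threshold):
    """Fraction of entries whose absolute value exceeds the threshold."""
    return float(np.mean(np.abs(sample) > threshold))


rng = np.random.default_rng(seed=2)
n = 20000

# Toy "reference": narrow core plus a 5% wide component that forms the tails.
is_tail = rng.random(n) < 0.05
reference = np.where(is_tail, rng.normal(0.0, 4.0, n), rng.normal(0.0, 1.0, n))
# Toy "generated": reproduces the core only, so its tails are too light.
generated = rng.normal(0.0, 1.0, n)

# Tail region defined by the 99th percentile of |reference|.
threshold = np.quantile(np.abs(reference), 0.99)
ks = ks_distance(reference, generated)
ref_tail = tail_fraction(reference, threshold)
gen_tail = tail_fraction(generated, threshold)

print(f"KS distance: {ks:.3f}")
print(f"tail fraction: reference {ref_tail:.4f}, generated {gen_tail:.4f}")
```

In this toy case the KS distance stays small because the cores agree, while the tail fraction exposes the deficit directly. This is one reason a per-variable check of this kind would complement, rather than replace, a multivariate figure of merit.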

Page last updated on September 30, 2021