GANs for BDSim - Student Project (S1 2021/22)

Description: The NA62 experiment at CERN searches for rare kaon decay modes to reveal new physics and probe the limits of the Standard Model. Monte Carlo simulations are indispensable to the success of any physics experiment, from the initial design of the experiment all the way to the analysis of the collected data, where they are used to calculate acceptances. It is therefore very important that the physics, the detector model and response, and all contributing details are accurately implemented in the simulation software. This requires rigorous comparisons between experimental data and Monte Carlo (MC) simulations, a process called MC Validation.

The NA62 MC simulations use BDSim for the muon halo overlay. To study changes in the model and their impact in comparison to data, MC Validation requires a large overlay sample, especially since many muons are "lost" during reconstruction, selections, trigger cuts, etc. The question is whether Generative Adversarial Networks (GANs) can generate new samples based on an original BDSim sample, e.g. events taken from the BDSim output at a given z plane (e.g. CEDAR, z = 69657 mm). Extending the BDSim dataset with GANs would allow multiple configurations of the model to be studied more quickly, since the necessary statistics could be generated much faster than by running BDSim.

This project will build upon work done during this summer project, focusing primarily on improving the GAN performance.

The student should be familiar with programming in C/C++ and Python. Some knowledge of ROOT is desirable.

Literature:

  1. Fast simulation of muons produced at the SHiP experiment using Generative Adversarial Networks (SHiP Collaboration, 2019)
  2. Uncertainties associated with GAN-generated datasets in high energy physics, K.T. Marchev, et al. (2021)
  3. GANplifying Event Samples, A. Butter, et al. (2021)
  4. Book: Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, A. Geron, 2019
  5. The Beam and detector of the NA62 experiment at CERN on arXiv
  6. BDSim (Beamline Simulation Tool) documentation
  7. ROOT - an open-source data analysis framework for high energy physics documentation

Progress Tracker

How to use this table: Keep track of things to do by comparing the table below with our email communication. Email me the description (from this table) and a short summary (from your logbook) of any completed task, to mark it as done.

Task description | Status
T1. Read project background literature, and make notes. Prepare questions. | -
T2. Keep a logbook for the length of the project. It will be valuable input for your report and presentation. | -
M1. Zoom meeting - Oct 7, 11:05am | Intro
T3. Familiarise yourself with the work done in Part 1 of this project, read this presentation, and then study the code and read the howtos. The first (and crucial!) step is to be able to rerun the code and reproduce previous results. | Ran test with default GAN on 02/11. See also T5.
M2. Zoom meeting - Oct 18, 12:00am | Discussed progress and code structure; showed demo on running Python code on ppevm07.
T4. Read about ROOT [7], uproot4 and the Linux command 'screen' (type 'man screen' on ppevm07 to learn how to use it). | Done.
M3. Zoom meeting - Nov 2, 11:00am | Discussed reading/progress status, example script, use of the 'screen' command, and next tasks.
T5. Rerun the code while tweaking various functions and/or parameters (e.g. number of nodes/layers in the NNs, activation functions, number of epochs, etc.), with the goal of improving performance. Consult literature to inform this task. | In progress. Tested other NN geometries, activation functions, and loss functions. Plots can be found here.
T7. Implement a switch from {Px, Py, Pz} to spherical coordinates {P, Theta, Phi}. Implement reading/saving data from/to numpy files. | -
R1. Interim report | Submitted on 05/11 via Moodle.
M4. Zoom meeting - Nov 9, 11:00am | Discussed progress and next steps. Better planning is needed in order to make best use of the remaining time and resources.
M5. Zoom meeting - Nov 16, 12:00pm | Discussed progress and next steps. Student focused mainly on T5 (see comments above).
M6. Zoom meeting - Nov 23, 11:00am | Discussed progress and next steps. Continuing with T5: wrap up the loss function, batch size, learning rate and number-of-neurons tests. Investigate changes in the number of hidden layers and other types of GAN (e.g. WGAN). Also looking into T7.
M7. Zoom meeting - Dec 2, 11:00am | Date moved from 30/11 to 02/12 at student's request. Good progress on T5; working on WGAN implementation.
A1. Oral presentation | Tuesday 11th January, 2022. Talk can be found on Moodle.
A2. Project report | Deadline: 5pm on Wednesday 19th January, 2022.
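
As a starting point for T7, the coordinate switch and the numpy file handling could look roughly like the sketch below. It is not taken from the project code. It assumes that z is the beam axis, that Theta is the polar angle measured from +z and Phi the azimuthal angle in the x-y plane (both in radians), and that one event is stored per row. The function names, the file name and the momentum values are hypothetical.

```python
import os
import tempfile

import numpy as np


def cartesian_to_spherical(px, py, pz):
    """Convert {Px, Py, Pz} to {P, Theta, Phi} (angles in radians).

    Assumed convention: Theta is measured from the +z (beam) axis,
    Phi is the azimuthal angle in the x-y plane.
    """
    p = np.sqrt(px**2 + py**2 + pz**2)
    # arctan2 stays well defined when pz = 0 or when the momentum is zero
    theta = np.arctan2(np.hypot(px, py), pz)
    phi = np.arctan2(py, px)
    return p, theta, phi


def spherical_to_cartesian(p, theta, phi):
    """Inverse transform, e.g. to map GAN output back to {Px, Py, Pz}."""
    px = p * np.sin(theta) * np.cos(phi)
    py = p * np.sin(theta) * np.sin(phi)
    pz = p * np.cos(theta)
    return px, py, pz


def save_momenta(path, p, theta, phi):
    """Store one event per row with columns (P, Theta, Phi) in a .npy file."""
    np.save(path, np.column_stack([p, theta, phi]))


def load_momenta(path):
    """Read back the three columns written by save_momenta."""
    data = np.load(path)
    return data[:, 0], data[:, 1], data[:, 2]


# Illustrative values only; these are not BDSim output.
px = np.array([0.10, -0.20, 0.00])
py = np.array([0.05, 0.30, 0.00])
pz = np.array([75.0, 74.0, 10.0])

p, theta, phi = cartesian_to_spherical(px, py, pz)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "muons_spherical.npy")  # hypothetical file name
    save_momenta(path, p, theta, phi)
    p_in, theta_in, phi_in = load_momenta(path)

# The round trip should reproduce the original components.
px_back, py_back, pz_back = spherical_to_cartesian(p_in, theta_in, phi_in)
```

Keeping the inverse transform next to the forward one makes it easy to check the round trip before feeding the new variables to the GAN. Because Phi is periodic, values near +pi and -pi are physically adjacent, which may matter when comparing generated and original distributions.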

Page last updated on December 2, 2021