Title
Big Data Tools and Cloud Services for High Energy Physics Analysis in TOTEM Experiment
Abstract
The High Energy Physics community has been developing dedicated solutions for processing experiment data for decades. With recent advances in Big Data and Cloud Services, however, the question of whether such technologies can be applied to physics data analysis has become relevant. In this paper, we present our initial experience with a system that combines public cloud infrastructure (Helix Nebula Science Cloud), storage and processing services developed by CERN, and off-the-shelf Big Data frameworks. The system is completely decoupled from CERN's main computing facilities and provides an interactive web interface based on Jupyter Notebooks as the main entry point for users. We run a sample analysis on 4.7 TB of data from the TOTEM experiment, rewriting the analysis code to use PyROOT and the RDataFrame model and to take full advantage of the parallel processing capabilities offered by Apache Spark. We report on the experience gained by adopting this new analysis model: preliminary scalability results show that the processing time for our dataset can be reduced from 13 hours on a single core to 7 minutes on 248 cores.
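The scalability figures quoted in the abstract can be turned into a speedup and a parallel efficiency with a few lines of arithmetic. The sketch below is not taken from the paper; it uses only the three numbers stated above (13 hours, 7 minutes, 248 cores), and the variable names are illustrative. Since the quoted times are rounded, the results should be read as approximate.

```python
# Figures quoted in the abstract (rounded values, so results are approximate).
single_core_minutes = 13 * 60   # 13 hours on a single core
parallel_minutes = 7            # 7 minutes on 248 cores
cores = 248

# Speedup: how many times faster the parallel run is than the single-core run.
speedup = single_core_minutes / parallel_minutes

# Parallel efficiency: fraction of the ideal (linear) speedup actually achieved.
efficiency = speedup / cores

print(f"speedup: {speedup:.1f}x")
print(f"parallel efficiency: {efficiency:.1%}")
```

Under these numbers the speedup is roughly 111x, which corresponds to a parallel efficiency of about 45% on 248 cores.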
Year
2018
DOI
10.1109/UCC-Companion.2018.00018
Venue
2018 IEEE/ACM International Conference on Utility and Cloud Computing Companion (UCC Companion)
Keywords
Apache Spark, High Energy Physics, Data Analysis, Scalability
Field
Single-core, Large Hadron Collider, Totem, Spark (mathematics), Rewriting, Big data, Particle physics, Cloud computing, Scalability
DocType
Conference
ISSN
2373-6860
ISBN
978-1-7281-0360-0
Citations
0
PageRank
0.34
References
1
Authors
17