TY - JOUR
T1 - Scalable ATLAS pMSSM computational workflows using containerised REANA reusable analysis platform
AU - Donadoni, Marco
AU - Feickert, Matthew
AU - Heinrich, Lukas
AU - Liu, Yang
AU - Mečionis, Audrius
AU - Moisieienkov, Vladyslav
AU - Šimko, Tibor
AU - Stark, Giordon
AU - Vidal García, Marco
N1 - Publisher Copyright:
© The Authors, published by EDP Sciences.
PY - 2024/05/06
Y1 - 2024/05/06
AB - In this paper we describe the development of a streamlined framework for large-scale ATLAS pMSSM reinterpretations of LHC Run-2 analyses using containerised computational workflows. The project aims to assess the global coverage of BSM physics and requires running O(5k) computational workflows representing pMSSM model points. Following ATLAS Analysis Preservation policies, many analyses have been preserved as containerised Yadage workflows, and after validation they were added to a curated selection for the pMSSM study. To run the workflows at scale, we utilised the REANA reusable analysis platform. We describe how the REANA platform was enhanced with internal service scheduling changes to ensure the best concurrent throughput. We discuss the scalability of the approach on Kubernetes clusters ranging from 500 to 5000 cores. Finally, we demonstrate the possibility of using additional ad-hoc public cloud infrastructure resources by running the same workflows on the Google Cloud Platform.
UR - http://www.scopus.com/inward/record.url?scp=85212205790&partnerID=8YFLogxK
U2 - 10.1051/epjconf/202429504035
DO - 10.1051/epjconf/202429504035
M3 - Conference article
AN - SCOPUS:85212205790
SN - 2101-6275
VL - 295
JO - EPJ Web of Conferences
JF - EPJ Web of Conferences
M1 - 04035
T2 - 26th International Conference on Computing in High Energy and Nuclear Physics, CHEP 2023
Y2 - 8 May 2023 through 12 May 2023
ER -