TY - GEN
T1 - Dynamic Resource Management for In-Situ Techniques Using MPI-Sessions
AU - Ju, Yi
AU - Huber, Dominik
AU - Perez, Adalberto
AU - Ulbl, Philipp
AU - Markidis, Stefano
AU - Schlatter, Philipp
AU - Schulz, Martin
AU - Schreiber, Martin
AU - Laure, Erwin
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
AB - The computational power of High-Performance Computing (HPC) systems increases continuously and rapidly. Data-intensive applications are designed to leverage the high computational capacity of HPC resources and typically generate large amounts of data for traditional post-processing data analytics. However, the input/output (IO) subsystems of HPC systems develop relatively slowly, and storage capacity is limited. This can limit both the achievable performance and scientific discovery. In-situ techniques partially remedy these problems by reducing or avoiding the data flow through the IO subsystem to and from storage. However, in current practice, asynchronous in-situ techniques with static resource management often allocate separate computing resources for executing in-situ task(s), and these resources remain idle when no in-situ work is at hand. In the present work, we aim to improve the efficiency of computing resource usage by launching and releasing the additional computing resources needed for in-situ task(s). Our approach is based on extensions for MPI Sessions that enable the required dynamic resource management. In this paper, we propose a basic and an advanced in-situ technique with dynamic resource management enabled by MPI Sessions, their implementations in two real-world use cases, and a critical analysis of the experimental results.
KW - Dynamic resource management
KW - HPC
KW - In-situ
KW - MPI Session
UR - http://www.scopus.com/inward/record.url?scp=85206070581&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-73370-3_7
DO - 10.1007/978-3-031-73370-3_7
M3 - Conference contribution
AN - SCOPUS:85206070581
SN - 9783031733697
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 105
EP - 120
BT - Recent Advances in the Message Passing Interface - 31st European MPI Users’ Group Meeting, EuroMPI 2024, Proceedings
A2 - Blaas-Schenner, Claudia
A2 - Niethammer, Christoph
A2 - Haas, Tobias
PB - Springer Science and Business Media Deutschland GmbH
T2 - 31st European MPI Users’ Group Meeting, EuroMPI 2024
Y2 - 25 September 2024 through 27 September 2024
ER -