TY - GEN
T1 - A Case Study on PMIx-Usage for Dynamic Resource Management
AU - Huber, Dominik
AU - Schreiber, Martin
AU - Schulz, Martin
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - With the increasing scale of HPC supercomputers, efficient resource utilization on such systems becomes even more important. In this context, dynamic resource management is a very active research field, as it is expected to improve several metrics of resource utilization on HPC systems, such as job throughput and energy efficiency. However, dynamic resource management is complex and requires significant changes to various layers of the software stack, including resource and process management, programming models, and applications. So far, approaches for resource management are often specific to a particular implementation of the resource management and process management software, thus hindering interoperability, composability, and comparability of such approaches. In this paper, we discuss the usage of the Process Management Interface - Exascale (PMIx) Standard for interactions between the process manager and the resource manager. We describe an architecture that allows the resource manager to connect to the process manager as a PMIx tool, giving it access to a set of PMIx services useful for resource management. In a concrete case study, we connect a Python- and PMIx-based resource manager to PRRTE and assess the applicability of this architecture for debugging and exploration of dynamic resource management techniques. We conclude that a PMIx-based architecture can simplify the process of exploring new dynamic and disruptive resource management mechanisms while improving composability.
AB - With the increasing scale of HPC supercomputers, efficient resource utilization on such systems becomes even more important. In this context, dynamic resource management is a very active research field, as it is expected to improve several metrics of resource utilization on HPC systems, such as job throughput and energy efficiency. However, dynamic resource management is complex and requires significant changes to various layers of the software stack, including resource and process management, programming models, and applications. So far, approaches for resource management are often specific to a particular implementation of the resource management and process management software, thus hindering interoperability, composability, and comparability of such approaches. In this paper, we discuss the usage of the Process Management Interface - Exascale (PMIx) Standard for interactions between the process manager and the resource manager. We describe an architecture that allows the resource manager to connect to the process manager as a PMIx tool, giving it access to a set of PMIx services useful for resource management. In a concrete case study, we connect a Python- and PMIx-based resource manager to PRRTE and assess the applicability of this architecture for debugging and exploration of dynamic resource management techniques. We conclude that a PMIx-based architecture can simplify the process of exploring new dynamic and disruptive resource management mechanisms while improving composability.
KW - Dynamic Resource Management
KW - PMIx
KW - Process Management
UR - http://www.scopus.com/inward/record.url?scp=85171364126&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-40843-4_4
DO - 10.1007/978-3-031-40843-4_4
M3 - Conference contribution
AN - SCOPUS:85171364126
SN - 9783031408427
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 42
EP - 55
BT - High Performance Computing - ISC High Performance 2023 International Workshops, Revised Selected Papers
A2 - Bienz, Amanda
A2 - Weiland, Michèle
A2 - Baboulin, Marc
A2 - Kruse, Carola
PB - Springer Science and Business Media Deutschland GmbH
T2 - 38th International Conference on High Performance Computing, ISC High Performance 2023
Y2 - 21 May 2023 through 25 May 2023
ER -