TY - GEN
T1 - On the Convergence of Malleability and the HPC PowerStack
T2 - 37th International Conference on High Performance Computing, ISC High Performance 2022
AU - Arima, Eishi
AU - Comprés, A. Isaías
AU - Schulz, Martin
N1 - Publisher Copyright: © 2022, Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Recent High-Performance Computing (HPC) systems are facing important challenges, such as massive power consumption, while at the same time significantly under-utilized system resources. Given the power consumption trends, future systems will be deployed in an over-provisioned manner where more resources are installed than they can afford to power simultaneously. In such a scenario, maximizing resource utilization and energy efficiency, while keeping a given power constraint, is pivotal. Driven by this observation, in this position paper we first highlight the recent trends of resource management techniques, with a particular focus on malleability support (i.e., dynamically scaling resource allocations/requirements for a job), co-scheduling (i.e., co-locating multiple jobs within a node), and power management. Second, we consider putting them together, assess their relationships/synergies, and discuss the functionality requirements in each software component for future over-provisioned and power-constrained HPC systems. Third, we briefly introduce our ongoing efforts on the integration of software tools, which will ultimately lead to the convergence of malleability and power management, as it is designed in the HPC PowerStack initiative.
AB - Recent High-Performance Computing (HPC) systems are facing important challenges, such as massive power consumption, while at the same time significantly under-utilized system resources. Given the power consumption trends, future systems will be deployed in an over-provisioned manner where more resources are installed than they can afford to power simultaneously. In such a scenario, maximizing resource utilization and energy efficiency, while keeping a given power constraint, is pivotal. Driven by this observation, in this position paper we first highlight the recent trends of resource management techniques, with a particular focus on malleability support (i.e., dynamically scaling resource allocations/requirements for a job), co-scheduling (i.e., co-locating multiple jobs within a node), and power management. Second, we consider putting them together, assess their relationships/synergies, and discuss the functionality requirements in each software component for future over-provisioned and power-constrained HPC systems. Third, we briefly introduce our ongoing efforts on the integration of software tools, which will ultimately lead to the convergence of malleability and power management, as it is designed in the HPC PowerStack initiative.
KW - Co-scheduling
KW - Dynamic resource management
KW - Heterogeneity
KW - Malleability
KW - Over-provisioning
KW - Power management
UR - http://www.scopus.com/inward/record.url?scp=85148684545&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-23220-6_14
DO - 10.1007/978-3-031-23220-6_14
M3 - Conference contribution
AN - SCOPUS:85148684545
SN - 9783031232190
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 206
EP - 217
BT - High Performance Computing. ISC High Performance 2022 International Workshops - Revised Selected Papers
A2 - Anzt, Hartwig
A2 - Bienz, Amanda
A2 - Luszczek, Piotr
A2 - Baboulin, Marc
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 29 May 2022 through 2 June 2022
ER -