TY - GEN
T1 - Analyzing resource trade-offs in hardware overprovisioned supercomputers
AU - Sakamoto, Ryuichi
AU - Patki, Tapasya
AU - Cao, Thang
AU - Kondo, Masaaki
AU - Inoue, Koji
AU - Ueda, Masatsugu
AU - Ellsworth, Daniel
AU - Rountree, Barry
AU - Schulz, Martin
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/8/3
Y1 - 2018/8/3
N2 - Hardware overprovisioned systems have recently been proposed as a viable alternative for a power-efficient design of next-generation supercomputers. A key challenge for such systems is to determine the degree of overprovisioning, which refers to the number of extra nodes that need to be installed under a given power constraint. In this paper, we first show that the degree of overprovisioning depends on dynamic parameters, such as the job mix as well as the global power constraint, and that static decisions can result in limited system throughput. We then study an exhaustive combination of adaptive resource management strategies that span three job scheduling algorithms, four power capping techniques, and three node boot-up mechanisms to understand the trade-off space involved. We then draw conclusions about how these strategies can adaptively control the degree of overprovisioning and analyze their impact on job throughput and power utilization.
AB - Hardware overprovisioned systems have recently been proposed as a viable alternative for a power-efficient design of next-generation supercomputers. A key challenge for such systems is to determine the degree of overprovisioning, which refers to the number of extra nodes that need to be installed under a given power constraint. In this paper, we first show that the degree of overprovisioning depends on dynamic parameters, such as the job mix as well as the global power constraint, and that static decisions can result in limited system throughput. We then study an exhaustive combination of adaptive resource management strategies that span three job scheduling algorithms, four power capping techniques, and three node boot-up mechanisms to understand the trade-off space involved. We then draw conclusions about how these strategies can adaptively control the degree of overprovisioning and analyze their impact on job throughput and power utilization.
KW - Overprovisioned
KW - Power characteristics of HPC system
KW - Power-constrained HPC system
UR - http://www.scopus.com/inward/record.url?scp=85052198530&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2018.00062
DO - 10.1109/IPDPS.2018.00062
M3 - Conference contribution
AN - SCOPUS:85052198530
SN - 9781538643686
T3 - Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018
SP - 526
EP - 535
BT - Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 32nd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018
Y2 - 21 May 2018 through 25 May 2018
ER -