TY - GEN
T1 - Extending SLURM for Dynamic Resource-Aware Adaptive Batch Scheduling
AU - Chadha, Mohak
AU - John, Jophin
AU - Gerndt, Michael
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/12
Y1 - 2020/12
N2 - With the growing constraints on power budget and increasing hardware failure rates, the operation of future exascale systems faces several challenges. Towards this, resource awareness and adaptivity by enabling malleable jobs has been actively researched in the HPC community. Malleable jobs can change their computing resources at runtime and can significantly improve HPC system performance. However, due to the rigid nature of popular parallel programming paradigms such as MPI and lack of support for dynamic resource management in batch systems, malleable jobs have been largely unrealized. In this paper, we extend the SLURM batch system to support the execution and batch scheduling of malleable jobs. The malleable applications are written using a new adaptive parallel paradigm called Invasive MPI which extends the MPI standard to support resource-adaptivity at runtime. We propose two malleable job scheduling strategies to support performance-aware and power-aware dynamic reconfiguration decisions at runtime. We implement the strategies in SLURM and evaluate them on a production HPC system. Results for our performance-aware scheduling strategy show improvements in makespan, average system utilization, average response, and waiting times as compared to other scheduling strategies. Moreover, we demonstrate dynamic power corridor management using our power-aware strategy.
AB - With the growing constraints on power budget and increasing hardware failure rates, the operation of future exascale systems faces several challenges. Towards this, resource awareness and adaptivity by enabling malleable jobs has been actively researched in the HPC community. Malleable jobs can change their computing resources at runtime and can significantly improve HPC system performance. However, due to the rigid nature of popular parallel programming paradigms such as MPI and lack of support for dynamic resource management in batch systems, malleable jobs have been largely unrealized. In this paper, we extend the SLURM batch system to support the execution and batch scheduling of malleable jobs. The malleable applications are written using a new adaptive parallel paradigm called Invasive MPI which extends the MPI standard to support resource-adaptivity at runtime. We propose two malleable job scheduling strategies to support performance-aware and power-aware dynamic reconfiguration decisions at runtime. We implement the strategies in SLURM and evaluate them on a production HPC system. Results for our performance-aware scheduling strategy show improvements in makespan, average system utilization, average response, and waiting times as compared to other scheduling strategies. Moreover, we demonstrate dynamic power corridor management using our power-aware strategy.
KW - Dynamic resource-management
KW - SLURM
KW - malleability
KW - performance-aware
KW - power-aware scheduling
UR - http://www.scopus.com/inward/record.url?scp=85105537471&partnerID=8YFLogxK
U2 - 10.1109/HiPC50609.2020.00036
DO - 10.1109/HiPC50609.2020.00036
M3 - Conference contribution
AN - SCOPUS:85105537471
T3 - Proceedings - 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics, HiPC 2020
SP - 223
EP - 232
BT - Proceedings - 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics, HiPC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 27th IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2020
Y2 - 16 December 2020 through 18 December 2020
ER -