TY - GEN
T1 - Optimizing Hardware Resource Partitioning and Job Allocations on Modern GPUs under Power Caps
AU - Arima, Eishi
AU - Kang, Minjoon
AU - Saba, Issa
AU - Weidendorfer, Josef
AU - Trinitis, Carsten
AU - Schulz, Martin
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/8/29
Y1 - 2022/8/29
N2 - CPU-GPU heterogeneous systems are now commonly used in HPC (High-Performance Computing). However, improving the utilization and energy efficiency of such systems remains one of the most critical issues. As a single program typically cannot fully utilize all resources within a node/chip, co-scheduling (or co-locating) multiple programs with complementary resource requirements is a promising solution. Meanwhile, as power consumption has become a first-class design constraint for HPC systems, such co-scheduling techniques should be well-tailored for power-constrained environments. To this end, the industry recently started supporting hardware-level resource partitioning features on modern GPUs to enable efficient co-scheduling, which can operate alongside existing power capping features. For example, NVIDIA's MIG (Multi-Instance GPU) partitions a single GPU into multiple instances at the granularity of a GPC (Graphics Processing Cluster). In this paper, we explicitly target the combination of hardware-level GPU partitioning features and power capping for power-constrained HPC systems. We provide a systematic methodology to optimize the combination of chip partitioning, job allocations, and power capping based on our scalability/interference modeling, while taking a variety of aspects into account, such as compute/memory intensity and utilization of heterogeneous computational resources (e.g., Tensor Cores). Experimental results indicate that our approach successfully selects a near-optimal combination across multiple different workloads.
AB - CPU-GPU heterogeneous systems are now commonly used in HPC (High-Performance Computing). However, improving the utilization and energy efficiency of such systems remains one of the most critical issues. As a single program typically cannot fully utilize all resources within a node/chip, co-scheduling (or co-locating) multiple programs with complementary resource requirements is a promising solution. Meanwhile, as power consumption has become a first-class design constraint for HPC systems, such co-scheduling techniques should be well-tailored for power-constrained environments. To this end, the industry recently started supporting hardware-level resource partitioning features on modern GPUs to enable efficient co-scheduling, which can operate alongside existing power capping features. For example, NVIDIA's MIG (Multi-Instance GPU) partitions a single GPU into multiple instances at the granularity of a GPC (Graphics Processing Cluster). In this paper, we explicitly target the combination of hardware-level GPU partitioning features and power capping for power-constrained HPC systems. We provide a systematic methodology to optimize the combination of chip partitioning, job allocations, and power capping based on our scalability/interference modeling, while taking a variety of aspects into account, such as compute/memory intensity and utilization of heterogeneous computational resources (e.g., Tensor Cores). Experimental results indicate that our approach successfully selects a near-optimal combination across multiple different workloads.
KW - Co-Scheduling
KW - GPUs
KW - MIG
KW - Power Capping
UR - http://www.scopus.com/inward/record.url?scp=85147442342&partnerID=8YFLogxK
U2 - 10.1145/3547276.3548630
DO - 10.1145/3547276.3548630
M3 - Conference contribution
AN - SCOPUS:85147442342
T3 - ACM International Conference Proceeding Series
BT - 51st International Conference on Parallel Processing, ICPP 2022 - Workshop Proceedings
PB - Association for Computing Machinery
T2 - 51st International Conference on Parallel Processing, ICPP 2022
Y2 - 29 August 2022 through 1 September 2022
ER -