TY - GEN
T1 - Non-Blocking GPU-CPU Notifications to Enable More GPU-CPU Parallelism
AU - Elis, Bengisu
AU - Pearce, Olga
AU - Boehme, David
AU - Burmark, Jason
AU - Schulz, Martin
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2023/10/20
Y1 - 2023/10/20
N2 - GPUs are increasingly popular in HPC systems, and more applications are adopting GPUs each day. However, the control synchronization of GPUs with CPUs is suboptimal and only possible after GPU kernel termination points, resulting in serialized host and device tasks. In this paper, we propose a novel CPU-GPU notification method that enables non-blocking in-kernel control synchronization of device and host tasks in combination with persistent GPU kernels. Using this notification method, we increase the overlap of CPU and GPU execution and, with it, parallelism. We present the concept and structure of the proposed notification mechanism together with in-kernel GPU-CPU control synchronization, using halo-exchange as an example. We analyze the performance of the halo-exchange pattern using our new notification method, as well as the interference between CPU and GPU operations due to the execution overlap. Finally, we verify our results using a performance model covering the halo-exchange pattern with the new notification method.
KW - GPU
KW - Halo-exchange
KW - MPI
KW - Synchronization
UR - http://www.scopus.com/inward/record.url?scp=85184522554&partnerID=8YFLogxK
U2 - 10.1145/3635035.3635036
DO - 10.1145/3635035.3635036
M3 - Conference contribution
AN - SCOPUS:85184522554
T3 - ACM International Conference Proceeding Series
SP - 1
EP - 11
BT - BDSIC2023 - 2023 5th International Conference on Big-data Service and Intelligent Computation
PB - Association for Computing Machinery
T2 - 5th International Conference on Big-data Service and Intelligent Computation
Y2 - 20 October 2023 through 22 October 2023
ER -