Non-Blocking GPU-CPU Notifications to Enable More GPU-CPU Parallelism

Bengisu Elis, Olga Pearce, David Boehme, Jason Burmark, Martin Schulz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

GPUs are increasingly popular in HPC systems, and more applications are adopting GPUs each day. However, the control synchronization of GPUs with CPUs is suboptimal and only possible after GPU kernel termination points, resulting in serialized host and device tasks. In this paper, we propose a novel CPU-GPU notification method that enables non-blocking in-kernel control synchronization of device and host tasks in combination with persistent GPU kernels. Using this notification method, we increase the overlap of CPU and GPU execution and, with it, parallelism. We present the concept and structure of the proposed notification mechanism together with in-kernel GPU-CPU control synchronization, using halo-exchange as an example. We analyze the performance of the halo-exchange pattern using our new notification method, as well as the interference between CPU and GPU operations due to the execution overlap. Finally, we verify our results using a performance model covering the halo-exchange pattern with the new notification method.
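
The control flow described in the abstract can be illustrated with a minimal host-only sketch. This is not the authors' implementation: a Python thread stands in for the persistent GPU kernel, a lock-protected counter stands in for the notification channel, and appending to a list stands in for the MPI send of halo data. All names (`NotificationFlag`, `persistent_kernel`, `run_halo_exchange`) are hypothetical and chosen only for illustration.

```python
import threading


class NotificationFlag:
    """Monotonic counter: the 'device' bumps it, the host reads it without waiting."""

    def __init__(self):
        self._count = 0
        self._lock = threading.Lock()

    def post(self):
        # Called by the device stand-in after a halo buffer is ready.
        with self._lock:
            self._count += 1

    def poll(self):
        # Non-blocking read by the host: returns how many notifications exist so far.
        with self._lock:
            return self._count


def persistent_kernel(flag, steps, halo):
    # Stand-in for a persistent GPU kernel: it never "terminates" between steps.
    for step in range(steps):
        halo.append(step)  # pack boundary (halo) data for this step
        flag.post()        # notify the host, then keep running
        # interior computation for this step would continue here


def run_halo_exchange(steps=4):
    flag = NotificationFlag()
    halo = []       # buffers written by the device stand-in
    sent = []       # buffers the host has "sent" (stand-in for MPI)
    host_work = 0   # counts host-side work overlapped with the kernel

    worker = threading.Thread(target=persistent_kernel, args=(flag, steps, halo))
    worker.start()

    consumed = 0
    while consumed < steps:
        ready = flag.poll()              # check for notifications, do not block
        while consumed < ready:
            sent.append(halo[consumed])  # handle each notified buffer in order
            consumed += 1
        host_work += 1                   # unrelated host work proceeds regardless

    worker.join()
    return sent, host_work
```

The point of the sketch is only the structure: the producer keeps running after each notification, and the host checks for notifications between units of its own work instead of waiting for the producer to finish. On a real system the counter would live in memory visible to both host and device, and the visibility and ordering guarantees would come from the platform's memory model rather than a Python lock.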

Original language: English
Title of host publication: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2024
Publisher: Association for Computing Machinery
Pages: 1-11
Number of pages: 11
ISBN (Electronic): 9798400708893
DOIs
State: Published - 18 Jan 2024
Event: 7th International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2024 - Nagoya, Japan
Duration: 25 Jan 2024 → 27 Jan 2024

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 7th International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2024
Country/Territory: Japan
City: Nagoya
Period: 25/01/24 → 27/01/24

Keywords

  • GPU
  • Halo-exchange
  • MPI
  • Synchronization
