TY - GEN
T1 - GPUscout
T2 - 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
AU - Sen, Soumya
AU - Vanecek, Stepan
AU - Schulz, Martin
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/11/12
Y1 - 2023/11/12
AB - GPUs pose an attractive opportunity for delivering high-performance applications. However, GPU codes are often limited due to memory contention, resulting in overall performance degradation. Since GPU scheduling is transparent to the user, and GPU memory architectures are very complex compared to ones on CPUs, finding such bottlenecks is a very cumbersome process. In this paper, we present a novel method of systematically detecting the root cause of frequent memory performance bottlenecks on NVIDIA GPUs that we call GPUscout. It connects three approaches to analyzing performance - static CUDA SASS code analysis, sampling warp stalls, and kernel performance metrics. Connecting these approaches, GPUscout can identify the problem, locate the code segment where it originates, and assess its importance. This paper illustrates the capabilities and the design of our implementation of GPUscout. We show its applicability based on three commonly-used kernels, yielding promising results in terms of accuracy, efficiency, and usability.
KW - CUDA
KW - Data-movement
KW - GPU
KW - High performance computing
KW - NVIDIA
KW - Performance analysis
KW - Profiler
KW - SASS
UR - http://www.scopus.com/inward/record.url?scp=85178149053&partnerID=8YFLogxK
U2 - 10.1145/3624062.3624208
DO - 10.1145/3624062.3624208
M3 - Conference contribution
AN - SCOPUS:85178149053
T3 - ACM International Conference Proceeding Series
SP - 1392
EP - 1402
BT - Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
PB - Association for Computing Machinery
Y2 - 12 November 2023 through 17 November 2023
ER -