Efficient Performance Estimation and Work-Group Size Pruning for OpenCL Kernels on GPUs

Xiebing Wang, Xuehai Qian, Alois Knoll, Kai Huang

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Graphics Processing Units (GPUs) play a vital role in state-of-the-art high-performance scientific computing, and research on their performance analysis is crucial but nontrivial. Existing GPU performance models are far from practical use, while fine-grained GPU simulation incurs a considerable time cost. Moreover, the massive number of designs arising from various program inputs and parameter settings poses a challenge for efficient performance estimation and tuning of parallel GPU applications. To this end, this article presents a hybrid framework for the efficient performance estimation and work-group size pruning of OpenCL workloads on GPUs. The framework contains a static module that extracts the kernel execution trace from the high-level source code and a dynamic module that mimics the kernel execution flow to estimate the runtime performance. For design space pruning, an extra analysis filters out redundant work-group sizes with duplicated execution traces and inferior pipelines. The proposed framework does not require any program runs to estimate the performance and find the optimal or near-optimal designs. Experiments on four Commercial Off-The-Shelf (COTS) Nvidia GPUs show that the framework predicts the runtime performance with an average error of 17.04 percent and reduces the program design space by an average of 78.47 percent.
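The idea of filtering out work-group sizes with duplicated execution traces can be illustrated with a minimal Python sketch. This is not the framework from the article: `toy_trace_signature`, `prune_duplicate_traces`, and the 32-lane scheduling width are hypothetical names and assumptions, and the toy signature stands in for a real execution trace. The sketch also assumes uniform work-groups, so candidates that do not divide the global size are skipped.

```python
def toy_trace_signature(global_size, wg_size, lanes=32):
    """Hypothetical stand-in for an execution trace.

    Summarises a launch as (total scheduling units, active lanes per unit).
    `lanes` is an assumed hardware scheduling width, not a value from the article.
    """
    num_groups = global_size // wg_size
    units_per_group = -(-wg_size // lanes)  # ceiling division
    active_lanes = min(wg_size, lanes)
    return (num_groups * units_per_group, active_lanes)


def prune_duplicate_traces(global_size, candidates, trace_fn=toy_trace_signature):
    """Keep the first work-group size seen for each distinct trace signature."""
    kept = {}
    for wg_size in candidates:
        if global_size % wg_size != 0:
            continue  # assume uniform work-groups: size must divide the global size
        signature = trace_fn(global_size, wg_size)
        kept.setdefault(signature, wg_size)  # later duplicates are dropped
    return list(kept.values())


if __name__ == "__main__":
    print(prune_duplicate_traces(1024, [8, 16, 32, 48, 64, 128, 256]))
```

Under this toy signature, sizes 32, 64, 128, and 256 collapse into one representative for a global size of 1024, so only one of them would need to be evaluated. A real analysis would compare actual kernel execution traces, and the article additionally filters designs with inferior pipelines, which this sketch does not model.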

Original language: English
Article number: 8928962
Pages (from-to): 1089-1106
Number of pages: 18
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 31
Issue number: 5
State: Published - 1 May 2020

Keywords

  • GPU
  • OpenCL
  • performance estimation
  • performance tuning
  • work-group size
