Fused GEMMs towards an efficient GPU implementation of the ADER-DG method in SeisSol

Ravil Dorozhinskii, Gonzalo Brito Gadeschi, Michael Bader

Publication: Contribution to journal › Article › Peer-reviewed

Abstract

This study shows how the GPU performance of the ADER discontinuous Galerkin method in SeisSol (an earthquake simulation software) can be further improved while preserving the original design that ensures high CPU performance. We introduce a new code generator (“ChainForge”) that fuses subsequent batched matrix multiplications (“GEMMs”) into a single GPU kernel, holding intermediate results in shared memory for as long as necessary. The generator operates as an external module linked against SeisSol's domain-specific language YATeTo; as a result, the original SeisSol source code remains largely unchanged. In this paper, we discuss several challenges related to the automatic fusion of GPU kernels and provide solutions to them. Overall, we gain approximately 60% in performance of SeisSol's wave propagation solver using Fused-GEMMs compared to the original GPU implementation. We demonstrate this on benchmarks as well as on a real production scenario simulating the 1994 Northridge earthquake.
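The fusion idea described in the abstract can be illustrated with a minimal CPU-side sketch. It is not ChainForge output and not SeisSol code: the matrix size `N`, the type `Matrix`, and the functions `gemm_accumulate`, `chain_unfused`, and `chain_fused` are hypothetical names chosen for illustration, and a small per-element local buffer merely stands in for GPU shared memory. The sketch computes the chain `(A_e * B) * C` for every element `e` of a batch, once with the intermediate product stored for the whole batch and once with the two multiplications fused into a single pass.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical fixed size of the small dense matrices in the batch.
constexpr std::size_t N = 4;
using Matrix = std::array<double, N * N>;  // row-major N x N matrix

// c += a * b for a single small matrix.
inline void gemm_accumulate(const Matrix& a, const Matrix& b, Matrix& c) {
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t k = 0; k < N; ++k)
      for (std::size_t j = 0; j < N; ++j)
        c[i * N + j] += a[i * N + k] * b[k * N + j];
}

// Unfused chain: two passes over the batch. The intermediate products for
// ALL elements are stored in a batch-sized buffer, which on a GPU would
// correspond to a round trip through global memory between two kernels.
std::vector<Matrix> chain_unfused(const std::vector<Matrix>& a,
                                  const Matrix& b, const Matrix& c) {
  std::vector<Matrix> tmp(a.size(), Matrix{});
  for (std::size_t e = 0; e < a.size(); ++e)
    gemm_accumulate(a[e], b, tmp[e]);

  std::vector<Matrix> out(a.size(), Matrix{});
  for (std::size_t e = 0; e < a.size(); ++e)
    gemm_accumulate(tmp[e], c, out[e]);
  return out;
}

// Fused chain: one pass over the batch. The intermediate product lives in a
// small local buffer (an assumption standing in for shared memory) and is
// consumed immediately by the second multiplication.
std::vector<Matrix> chain_fused(const std::vector<Matrix>& a,
                                const Matrix& b, const Matrix& c) {
  std::vector<Matrix> out(a.size(), Matrix{});
  for (std::size_t e = 0; e < a.size(); ++e) {
    Matrix tmp{};  // zero-initialised intermediate, never written to the batch buffer
    gemm_accumulate(a[e], b, tmp);
    gemm_accumulate(tmp, c, out[e]);
  }
  return out;
}
```

Both functions perform the same floating-point operations in the same order, so they return identical results; the only difference is where the intermediate product is kept. On a GPU, the fused variant would avoid writing and re-reading that intermediate through global memory, which is the kind of saving the abstract attributes to Fused-GEMMs.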

Original language: English
Article number: e8037
Journal: Concurrency and Computation: Practice and Experience
Volume: 36
Issue number: 12
Publication status: Published - 30 May 2024
