Fused GEMMs towards an efficient GPU implementation of the ADER-DG method in SeisSol

Ravil Dorozhinskii, Gonzalo Brito Gadeschi, Michael Bader

Research output: Contribution to journal › Article › peer-review


This study shows how the GPU performance of the ADER discontinuous Galerkin method in SeisSol (an earthquake simulation software) can be further improved while preserving its original design, which ensures high CPU performance. We introduce a new code generator (“ChainForge”) that fuses subsequent batched matrix multiplications (“GEMMs”) into a single GPU kernel, holding intermediate results in shared memory for as long as necessary. The generator operates as an external module linked against SeisSol's domain-specific language YATeTo; as a result, the original SeisSol source code remains largely unchanged. In this paper, we discuss several challenges related to the automatic fusion of GPU kernels and provide solutions to them. By and large, we gain about 60% in performance of SeisSol's wave propagation solver using Fused-GEMMs compared to the original GPU implementation. We demonstrate this on benchmarks as well as on a real production scenario simulating the 1994 Northridge earthquake.
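The fusion idea described in the abstract can be illustrated with a minimal CPU-side sketch. It is not ChainForge output, SeisSol code, or a GPU kernel. It only contrasts the data flow of two separate batched GEMMs with a fused pass over the same chain. All function names and the toy matrices are hypothetical, chosen for illustration.

```python
# Illustrative CPU-side sketch only; not ChainForge output and not SeisSol code.
# All names (matmul, unfused_chain, fused_chain) are hypothetical.

def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def unfused_chain(a_batch, b, c):
    """Two separate batched GEMMs: the whole intermediate batch is materialized.

    On a GPU this would correspond to two kernel launches, with the
    intermediate batch written to and re-read from global memory.
    """
    tmp_batch = [matmul(a, b) for a in a_batch]   # first batched GEMM
    return [matmul(t, c) for t in tmp_batch]      # second batched GEMM


def fused_chain(a_batch, b, c):
    """One pass per batch element: the intermediate lives in a local scratch.

    On a GPU the scratch would be a per-block shared-memory buffer inside a
    single kernel; here it is just a short-lived local variable.
    """
    out = []
    for a in a_batch:
        scratch = matmul(a, b)          # intermediate for this element only
        out.append(matmul(scratch, c))  # consumed immediately, then discarded
    return out


# Toy batch of two 2x2 matrices and two shared 2x2 operators.
a_batch = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]]
b = [[1.0, 0.0], [0.0, 2.0]]
c = [[1.0, 1.0], [0.0, 1.0]]

result_unfused = unfused_chain(a_batch, b, c)
result_fused = fused_chain(a_batch, b, c)
```

Both variants compute the same products. The difference is where the intermediate result lives, which is the property a fused GPU kernel exploits to avoid round trips through global memory.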

Original language: English
Article number: e8037
Journal: Concurrency and Computation: Practice and Experience
Issue number: 12
State: Published - 30 May 2024


  • code generation
  • discontinuous Galerkin
  • earthquake simulation
  • fusion
  • GEMM
  • GPU
  • SeisSol

