Practical applicability of optimizations and performance models to complex stencil-based loop kernels in CFD

Karl Robert Wichmann, Martin Kronbichler, Rainald Löhner, Wolfgang A. Wall

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

This work investigates the application and interaction of optimization techniques and performance models in a computational fluid dynamics (CFD) approach employing an OpenMP-parallelized, explicit, weakly compressible, finite difference–based solver for the incompressible Navier–Stokes equations using a five-point wide stencil. The presented loop and stencil optimizations lead to a 6.8× increase in per-core throughput. To verify optimal CPU utilization, performance models are applied to the tuned code. Three performance models are considered: a roofline-based model using purely theoretical figures, a roofline-based model enhanced by measurements, and the execution-cache-memory (ECM) model. It is shown that the models provide reliable estimates for simple benchmarks, such as seven-point stencils for scalar Laplacians, but the estimate quality is significantly worse for the complex, tuned stencil. While it is possible to include even more detail in a model, doing so eventually leads to a state in which the model merely reproduces the benchmarks from which it was derived. Thus, the applied general-purpose performance models are found to predict the actual performance inaccurately: they overestimate the achievable performance by roughly 97% or more for highly tuned code. Through further code tuning, 66% of the predicted performance could be achieved.
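
As an illustration of the roofline-based estimate mentioned in the abstract, the following minimal sketch computes the classic roofline bound, min(peak compute rate, arithmetic intensity × memory bandwidth), for a seven-point scalar Laplacian stencil. It is not the authors' model. The function names, the hardware figures (40 GFLOP/s peak, 50 GB/s bandwidth) and the per-update costs (8 floating-point operations, 24 bytes of memory traffic in double precision) are all illustrative assumptions and are not taken from the article.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, flops_per_update, bytes_per_update):
    """Classic roofline bound: the attainable rate is limited either by the
    peak compute rate or by memory bandwidth times arithmetic intensity."""
    intensity = flops_per_update / bytes_per_update  # flop per byte
    return min(peak_gflops, intensity * bandwidth_gbs)


def updates_per_second(gflops, flops_per_update):
    """Convert a GFLOP/s bound into lattice-site updates per second."""
    return gflops * 1e9 / flops_per_update


if __name__ == "__main__":
    # Hypothetical hardware figures, chosen only for illustration.
    PEAK_GFLOPS = 40.0    # assumed peak compute rate
    BANDWIDTH_GBS = 50.0  # assumed sustained memory bandwidth

    # Assumed cost of one seven-point Laplacian update in double precision:
    # 8 flops, and 24 bytes of traffic (one 8-byte load of the source array
    # with ideal cache reuse, one 8-byte store, one 8-byte write-allocate).
    FLOPS_PER_UPDATE = 8
    BYTES_PER_UPDATE = 24

    bound = roofline_gflops(PEAK_GFLOPS, BANDWIDTH_GBS,
                            FLOPS_PER_UPDATE, BYTES_PER_UPDATE)
    regime = "memory-bound" if bound < PEAK_GFLOPS else "compute-bound"
    print(f"roofline bound: {bound:.2f} GFLOP/s ({regime})")
    print(f"updates per second: {updates_per_second(bound, FLOPS_PER_UPDATE):.3e}")
```

Under these assumed figures the stencil is limited by memory bandwidth rather than by compute. A measurement-enhanced variant of the same bound would replace the theoretical bandwidth and peak values with benchmarked ones.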

Original language: English
Pages (from-to): 602-618
Number of pages: 17
Journal: International Journal of High Performance Computing Applications
Volume: 33
Issue number: 4
State: Published - 1 Jul 2019

Keywords

  • Performance modeling
  • finite difference
  • performance optimization
  • stencil
