On the achievable speeds of finite difference solvers on CPUs and GPUs

Rainald Löhner, Andrew Corrigan, Karl Robert Wichmann, Wolfgang Wall

Research output: Contribution to conference › Paper › peer-review

18 Scopus citations

Abstract

A finite difference code for the weakly compressible Navier-Stokes equations has been developed. The code was then ported to the graphics processing unit (GPU) using the automatic FORTRAN-to-CUDA translator F2CUDA. Detailed analysis revealed that the original 'chunky' single loop over the points required more registers than the GPU could handle. The right-hand-side (RHS) loop was therefore split by spatial dimension, and fluxes were computed 'on the fly' to minimize register usage. The final code, although less transparent and tidy than the original, achieved the expected performance on the GPU. The timing studies showed that, at present, performance on both CPU and GPU hardware is dominated by memory transfer rates: the theoretically achievable speeds, estimated solely from the memory bandwidth ratings of the CPU and the GPU and without accounting for any floating point operations (FLOPs), are within a factor of 1.5 of the measured timings.
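The memory-bound estimate mentioned in the abstract amounts to simple arithmetic: if a sweep over the grid is limited by memory traffic, its minimum time is the number of bytes moved divided by the rated bandwidth. The Python sketch below assumes each array value crosses the memory bus exactly once per sweep (perfect caching) and ignores all floating point work. The function name `min_time_memory_bound`, its parameters, and the example grid size, variable counts, and 200 GB/s rating are hypothetical illustrations, not values taken from the paper. The `sweeps` parameter is an assumed, rough way to reflect that splitting the RHS loop by dimension makes several passes over the data.

```python
def min_time_memory_bound(n_points, arrays_read, arrays_written,
                          bytes_per_value=8, bandwidth_gb_s=100.0, sweeps=1):
    """Lower bound (seconds) on the time for a memory-bound grid update.

    Hypothetical model: every value of every array is read or written
    exactly once per sweep, no floating point cost is counted, and the
    hardware sustains its rated bandwidth (GB/s, 1 GB = 1e9 bytes).
    """
    bytes_per_sweep = n_points * (arrays_read + arrays_written) * bytes_per_value
    total_bytes = sweeps * bytes_per_sweep
    return total_bytes / (bandwidth_gb_s * 1e9)


if __name__ == "__main__":
    # Hypothetical example: 256^3 points, 5 arrays read and 5 written
    # in double precision, on a device rated at 200 GB/s.
    t_single = min_time_memory_bound(256**3, 5, 5, 8, 200.0)
    # Same data, but with the update split into 3 per-dimension passes.
    t_split = min_time_memory_bound(256**3, 5, 5, 8, 200.0, sweeps=3)
    print(f"single loop bound: {t_single * 1e3:.2f} ms")  # 6.71 ms
    print(f"3-pass split bound: {t_split * 1e3:.2f} ms")
```

Dividing a measured sweep time by this bound gives the kind of ratio the abstract reports; a ratio close to 1 means the kernel is running close to the limit set by memory bandwidth, so further arithmetic optimizations would gain little.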

Original language: English
State: Published - 2013
Event: 21st AIAA Computational Fluid Dynamics Conference - San Diego, CA, United States
Duration: 24 Jun 2013 to 27 Jun 2013

Conference

Conference: 21st AIAA Computational Fluid Dynamics Conference
Country/Territory: United States
City: San Diego, CA
Period: 24/06/13 to 27/06/13
