Performance tuning on GPU for granular simulation based on discrete element method saving memory usage

Seiya Watanabe, Takayuki Aoki, Satori Tsuzuki

Research output: Contribution to journal › Article › peer-review

Abstract

Granular particle simulations based on the Discrete Element Method (DEM) suffer from long computation times. Using a GPU (Graphics Processing Unit) is one option for accelerating the computation, thanks to its high floating-point performance. Because the on-board high-speed memory of a GPU card is limited to several GB, algorithms must be chosen to save memory. Four speed-up techniques are proposed: a highly efficient neighbor-particle search method, sorting of the particle order by position, memory-efficient storage of the tangential spring, and fusion of GPU kernel functions to reduce memory access. A benchmark test of a three-dimensional dam-breaking problem is used to evaluate the performance and memory usage of each of the four techniques. With all four techniques applied, the code runs 14.86 times faster than the original while requiring only a 6% increase in memory usage. The four speed-up techniques are thus shown to be well suited to GPU computing, achieving higher performance with little additional memory usage for DEM computation. We also demonstrate a large-scale dam-breaking test using 15,728,640 particles on an NVIDIA Tesla K20X; the simulation completed within 5.5 hours.
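The abstract does not give implementation details, so the following is only an illustrative sketch of the general idea behind two of the listed techniques: a cell-based neighbor-particle search combined with sorting particles by position. It is written as serial Python for readability and is not the authors' GPU code. All names (`build_cell_list`, `find_contacts`, `cell_size`, `grid_dims`) are hypothetical, and the sketch assumes equal-radius particles, positions inside the grid, and a cell size of at least one particle diameter. Storing one sorted index array plus one start offset per cell is a common way to avoid fixed-size per-cell neighbor lists, which is the kind of memory saving the abstract alludes to.

```python
from itertools import product
from math import dist, floor


def cell_of(p, cell_size, grid_dims):
    """Integer cell coordinates of point p, clamped to the grid."""
    return tuple(min(max(floor(x / cell_size), 0), n - 1)
                 for x, n in zip(p, grid_dims))


def flat_id(cell, grid_dims):
    """Flatten (i, j, k) cell coordinates into a single cell index."""
    nx, ny, _ = grid_dims
    i, j, k = cell
    return (k * ny + j) * nx + i


def build_cell_list(pos, cell_size, grid_dims):
    """Return particle indices sorted by cell, plus per-cell start offsets."""
    n_cells = grid_dims[0] * grid_dims[1] * grid_dims[2]
    ids = [flat_id(cell_of(p, cell_size, grid_dims), grid_dims) for p in pos]
    # Count particles per cell, then prefix-sum the counts into offsets:
    # particles of cell c occupy order[cell_start[c]:cell_start[c + 1]].
    cell_start = [0] * (n_cells + 1)
    for c in ids:
        cell_start[c + 1] += 1
    for c in range(n_cells):
        cell_start[c + 1] += cell_start[c]
    # Stable sort of particle indices by cell index.
    order = sorted(range(len(pos)), key=ids.__getitem__)
    return order, cell_start


def find_contacts(pos, radius, cell_size, grid_dims):
    """Find overlapping particle pairs by scanning the 27 surrounding cells.

    Assumes cell_size >= 2 * radius so that every contact partner lies
    in the particle's own cell or in one of its direct neighbors.
    """
    order, cell_start = build_cell_list(pos, cell_size, grid_dims)
    contacts = set()
    for i, p in enumerate(pos):
        ci = cell_of(p, cell_size, grid_dims)
        for offset in product((-1, 0, 1), repeat=3):
            cell = tuple(c + d for c, d in zip(ci, offset))
            if any(c < 0 or c >= n for c, n in zip(cell, grid_dims)):
                continue  # neighbor cell lies outside the grid
            cid = flat_id(cell, grid_dims)
            for j in order[cell_start[cid]:cell_start[cid + 1]]:
                # j > i records each pair once and skips self-contact.
                if j > i and dist(p, pos[j]) < 2.0 * radius:
                    contacts.add((i, j))
    return contacts
```

Calling `find_contacts(pos, radius, cell_size, grid_dims)` with a list of `(x, y, z)` tuples returns the set of index pairs whose centers are closer than one diameter. On a GPU, the per-particle loop would typically become one thread per particle and the sort would be a parallel key-value sort, but those details are assumptions rather than something stated in the abstract.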

Original language: English
Journal: Transactions of the Japan Society for Computational Engineering and Science
Volume: 2016
State: Published - 27 May 2016
Externally published: Yes

Keywords

  • Discrete Element Method
  • GPU
  • High-performance computing
  • Neighbor-particle searching method
  • Particle sorting
