TY - JOUR
T1 - Performance tuning on GPU for granular simulation based on discrete element method saving memory usage
AU - Watanabe, Seiya
AU - Aoki, Takayuki
AU - Tsuzuki, Satori
N1 - Publisher Copyright: © 2016 by the Japan Society for Computational Engineering and Science.
PY - 2016/5/27
Y1 - 2016/5/27
N2 - Granular particle simulations based on the Discrete Element Method (DEM) suffer from long computation times. Using a GPU (Graphics Processing Unit) is one option for accelerating the computation, thanks to its high floating-point performance. Since the on-board high-speed memory of GPU cards is limited to several GB, algorithms must be chosen to save memory. Four speed-up techniques are proposed: a highly efficient neighbor-particle search method, sorting of particles by position, memory-efficient storage for the tangential spring, and fusion of GPU kernel functions to reduce memory access. A three-dimensional dam-breaking benchmark is used to evaluate the performance and memory usage of each of the four techniques. With all four techniques applied, the computational performance of the code is 14.86 times higher than that of the original, with only a 6% increase in memory usage. The results show that the four speed-up techniques are effective for GPU computing, achieving higher performance with little additional memory usage for DEM computation. We have also demonstrated a large-scale dam-breaking test using 15,728,640 particles on an NVIDIA Tesla K20X, and the simulation completed within 5.5 hours.
AB - Granular particle simulations based on the Discrete Element Method (DEM) suffer from long computation times. Using a GPU (Graphics Processing Unit) is one option for accelerating the computation, thanks to its high floating-point performance. Since the on-board high-speed memory of GPU cards is limited to several GB, algorithms must be chosen to save memory. Four speed-up techniques are proposed: a highly efficient neighbor-particle search method, sorting of particles by position, memory-efficient storage for the tangential spring, and fusion of GPU kernel functions to reduce memory access. A three-dimensional dam-breaking benchmark is used to evaluate the performance and memory usage of each of the four techniques. With all four techniques applied, the computational performance of the code is 14.86 times higher than that of the original, with only a 6% increase in memory usage. The results show that the four speed-up techniques are effective for GPU computing, achieving higher performance with little additional memory usage for DEM computation. We have also demonstrated a large-scale dam-breaking test using 15,728,640 particles on an NVIDIA Tesla K20X, and the simulation completed within 5.5 hours.
KW - Discrete Element Method
KW - GPU
KW - High-performance computing
KW - Neighbor-particle searching method
KW - Particle sorting
UR - http://www.scopus.com/inward/record.url?scp=84970005827&partnerID=8YFLogxK
U2 - 10.11421/jsces.2016.20160013
DO - 10.11421/jsces.2016.20160013
M3 - Article
AN - SCOPUS:84970005827
SN - 1344-9443
VL - 2016
JO - Transactions of the Japan Society for Computational Engineering and Science
JF - Transactions of the Japan Society for Computational Engineering and Science
ER -