Abstract
An aerodynamics simulation code based on the lattice Boltzmann method (LBM) with forest-of-octrees-based, block-structured adaptive mesh refinement (AMR) and temporally fixed (static) refinement was implemented, and its performance was evaluated on GPU-based supercomputers. Although space-filling-curve (SFC) based domain partitioning for octree-based AMR is widely used on conventional CPU-based supercomputers, accelerated computation on GPU-based supercomputers revealed a bottleneck caused by costly halo data communication. Our new tree-cutting approach adopts a hybrid domain partitioning that combines a coarse structured block decomposition with SFC partitioning inside each block. This hybrid approach improved the locality and topology of the partitioned sub-domains and reduced the amount of halo communication to one-third of that of the original SFC approach. In the strong scaling test, the code achieved a maximum speedup of ×1.82, reaching 2207 MLUPS (mega-lattice updates per second) on 128 GPUs (NVIDIA® Tesla® V100). In the weak scaling test, the code achieved 9620 MLUPS on 128 GPUs with 4.473 billion grid points, while maintaining a parallel efficiency of 93.4% from 8 to 128 GPUs.
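The hybrid partitioning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes equally sized leaves on a uniform grid (no refinement levels), uses a Morton (Z-order) curve as the SFC, and gives every coarse block the same number of ranks. All function names and parameters (`morton3d`, `split_evenly`, `hybrid_partition`, `halo_faces`, `block_size`, `ranks_per_block`) are hypothetical and chosen only for illustration.

```python
from itertools import product

def morton3d(x, y, z, bits=10):
    """Interleave the bits of (x, y, z) into a Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

def split_evenly(items, parts):
    """Cut an ordered list into `parts` contiguous chunks of near-equal size."""
    n = len(items)
    return [items[n * p // parts: n * (p + 1) // parts] for p in range(parts)]

def hybrid_partition(leaves, block_size, ranks_per_block):
    """Assign each leaf (x, y, z) to a rank.

    Step 1 (assumed): group leaves into coarse structured blocks.
    Step 2 (assumed): order the leaves of each block along a Morton curve
    and cut that curve into `ranks_per_block` contiguous pieces.
    """
    blocks = {}
    for (x, y, z) in leaves:
        b = (x // block_size, y // block_size, z // block_size)
        blocks.setdefault(b, []).append((x, y, z))

    owner = {}
    for block_id, b in enumerate(sorted(blocks)):
        ordered = sorted(blocks[b], key=lambda c: morton3d(*c))
        for local, chunk in enumerate(split_evenly(ordered, ranks_per_block)):
            for leaf in chunk:
                owner[leaf] = block_id * ranks_per_block + local
    return owner

def halo_faces(owner):
    """Count leaf faces shared by two different ranks (a proxy for halo volume)."""
    count = 0
    for (x, y, z), rank in owner.items():
        for nb in ((x + 1, y, z), (x, y + 1, z), (x, y, z + 1)):
            if nb in owner and owner[nb] != rank:
                count += 1
    return count

# Toy domain: 8 x 4 x 4 leaves, two coarse blocks, two ranks per block.
leaves = list(product(range(8), range(4), range(4)))
owner = hybrid_partition(leaves, block_size=4, ranks_per_block=2)
print("ranks:", sorted(set(owner.values())))
print("inter-rank faces:", halo_faces(owner))
```

Because an SFC is cut only inside a coarse block, no rank's sub-domain can stretch across block boundaries, which is the locality property the abstract attributes to the hybrid scheme. The `halo_faces` count is only a rough stand-in for real halo traffic, which in an AMR code also depends on refinement levels and the LBM stencil.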
Original language | English |
---|---|
Article number | 102851 |
Journal | Parallel Computing |
Volume | 108 |
State | Published - Dec 2021 |
Externally published | Yes |
Keywords
- Adaptive mesh refinement (AMR)
- GPU
- Lattice Boltzmann method
- Static AMR