TY - CHAP
T1 - EXAHD
T2 - A massively parallel fault tolerant sparse grid approach for high-dimensional turbulent plasma simulations
AU - Lago, Rafael
AU - Obersteiner, Michael
AU - Pollinger, Theresa
AU - Rentrop, Johannes
AU - Bungartz, Hans-Joachim
AU - Dannert, Tilman
AU - Griebel, Michael
AU - Jenko, Frank
AU - Pflüger, Dirk
N1 - Publisher Copyright: © The Author(s) 2020.
PY - 2020
Y1 - 2020
AB - Plasma fusion is one of the promising candidates for an emission-free energy source and is heavily investigated with high-resolution numerical simulations. Unfortunately, these simulations suffer from the curse of dimensionality due to the five-plus-one-dimensional nature of the equations. Hence, we propose a sparse grid approach based on the sparse grid combination technique which splits the simulation grid into multiple smaller grids of varying resolution. This enables us to increase the maximum resolution as well as the parallel efficiency of the current solvers. At the same time we introduce fault tolerance within the algorithmic design and increase the resilience of the application code. We base our implementation on a manager-worker approach which computes multiple solver runs in parallel by distributing tasks to different process groups. Our results demonstrate good convergence for linear fusion runs and show high parallel efficiency up to 180k cores. In addition, our framework achieves accurate results with low overhead in faulty environments. Moreover, for nonlinear fusion runs, we show the effectiveness of the combination technique and discuss existing shortcomings that are still under investigation.
UR - http://www.scopus.com/inward/record.url?scp=85089618004&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-47956-5_11
DO - 10.1007/978-3-030-47956-5_11
M3 - Chapter
AN - SCOPUS:85089618004
T3 - Lecture Notes in Computational Science and Engineering
SP - 301
EP - 329
BT - Lecture Notes in Computational Science and Engineering
PB - Springer
ER -