TY - GEN
T1 - Towards a fault-tolerant, scalable implementation of GENE
AU - Hinojosa, Alfredo Parra
AU - Kowitz, C.
AU - Heene, M.
AU - Pflüger, D.
AU - Bungartz, H. J.
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - We consider the HPC challenge of fault tolerance in the context of plasma physics simulations using the sparse grid combination technique. In the combination technique formalism, one breaks down a single, highly expensive simulation into many, considerably cheaper independent simulations that are propagated in time and then combined to approximate the results of the full solution. This introduces a new level of parallelism from which various fault tolerance approaches can be deduced. We investigate two such approaches, corresponding to two different simulation modes of the plasma physics code GENE: the simulation of a time-dependent, 5-dimensional PDE, and the computation of certain eigenvalues of the spectrum of a problem-specific linear operator. This paper makes two main contributions to the field of fault tolerance with the combination technique. First, we show that the recently developed fault-tolerant combination technique performs well even for highly complex simulation codes, i.e., beyond the usual Poisson or advection problems; and second, we demonstrate a new way to use the optimized combination technique (OptiCom) in the context of fault tolerance when dealing with eigenvalue computations. This work is a building block of the project EXAHD within the DFG’s Priority Programme “Software for Exascale Computing” (SPPEXA).
UR - http://www.scopus.com/inward/record.url?scp=84951262647&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-22997-3_3
DO - 10.1007/978-3-319-22997-3_3
M3 - Conference contribution
AN - SCOPUS:84951262647
SN - 9783319229966
T3 - Lecture Notes in Computational Science and Engineering
SP - 47
EP - 65
BT - Recent Trends in Computational Engineering - CE2014 - Optimization, Uncertainty, Parallel Algorithms, Coupled and Complex Problems
A2 - Bischoff, Manfred
A2 - Mehl, Miriam
A2 - Schäfer, Michael
PB - Springer Verlag
T2 - 3rd International Workshop on Computational Engineering, CE 2014
Y2 - 6 October 2014 through 10 October 2014
ER -