Towards a fault-tolerant, scalable implementation of GENE

Alfredo Parra Hinojosa, C. Kowitz, M. Heene, D. Pflüger, H. J. Bungartz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

We consider the HPC challenge of fault tolerance in the context of plasma physics simulations using the sparse grid combination technique. In the combination technique formalism, a single, highly expensive simulation is broken down into many considerably cheaper, independent simulations that are propagated in time and then combined to approximate the result of the full solution. This introduces a new level of parallelism from which various fault tolerance approaches can be derived. We investigate two such approaches, corresponding to two different simulation modes of the plasma physics code GENE: the simulation of a time-dependent, 5-dimensional PDE, and the computation of certain eigenvalues of the spectrum of a problem-specific linear operator. This paper makes two main contributions to fault tolerance with the combination technique. First, we show that the recently developed fault-tolerant combination technique performs well even for highly complex simulation codes, i.e., beyond the usual Poisson or advection problems; second, we demonstrate a new way to use the optimized combination technique (OptiCom) for fault tolerance in eigenvalue computations. This work is a building block of the project EXAHD within the DFG’s Priority Programme “Software for Exascale Computing” (SPPEXA).
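To make the combination idea concrete, here is a minimal Python sketch that is not taken from the paper or from GENE. It assumes the common formulation in which each cheap component simulation runs on a grid indexed by a level vector l ≥ 1 and the grids form a downward-closed index set I; the coefficient of grid l is then c_l = Σ (-1)^{|z|_1} over all z ∈ {0,1}^d with l + z ∈ I. The fault-tolerance step shown (dropping a failed grid together with every grid that dominates it, then recomputing coefficients on what remains) is a simplified stand-in for the fault-tolerant combination technique; the method used in the paper may choose the replacement combination differently. All function names are hypothetical.

```python
from itertools import product


def classical_index_set(n, d):
    """Hypothetical helper: classical downward-closed set {l >= 1 : |l|_1 <= n + d - 1}."""
    return {l for l in product(range(1, n + 1), repeat=d) if sum(l) <= n + d - 1}


def combination_coefficients(index_set):
    """Inclusion-exclusion coefficients c_l = sum of (-1)^|z|_1 over z in {0,1}^d
    with l + z in the (downward-closed) index set. Grids with c_l = 0 are omitted."""
    if not index_set:
        return {}
    d = len(next(iter(index_set)))
    coeffs = {}
    for l in index_set:
        c = 0
        for z in product((0, 1), repeat=d):
            if tuple(li + zi for li, zi in zip(l, z)) in index_set:
                c += (-1) ** sum(z)
        if c != 0:
            coeffs[l] = c
    return coeffs


def remove_failed(index_set, failed):
    """Simplified recovery (assumption): drop each failed grid and every grid that
    dominates it componentwise, so the remaining set stays downward closed."""
    return {
        l for l in index_set
        if not any(all(li >= fi for li, fi in zip(l, f)) for f in failed)
    }


if __name__ == "__main__":
    grids = classical_index_set(3, 2)
    print(sorted(combination_coefficients(grids).items()))
    # Suppose the simulation on grid (2, 2) is lost to a hardware fault.
    survivors = remove_failed(grids, [(2, 2)])
    print(sorted(combination_coefficients(survivors).items()))
```

For the classical 2D case with n = 3, this gives +1 for grids (1, 3), (2, 2), (3, 1) and -1 for (1, 2), (2, 1). If grid (2, 2) fails, the recomputed combination uses +1 for (1, 3) and (3, 1) and -1 for (1, 1); in both cases the coefficients sum to 1, so the combined result remains a consistent approximation, only on a coarser set of grids.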

Original language: English
Title of host publication: Recent Trends in Computational Engineering - CE2014 - Optimization, Uncertainty, Parallel Algorithms, Coupled and Complex Problems
Editors: Manfred Bischoff, Miriam Mehl, Michael Schäfer
Publisher: Springer Verlag
Pages: 47-65
Number of pages: 19
ISBN (Print): 9783319229966
State: Published - 2015
Event: 3rd International Workshop on Computational Engineering, CE 2014 - Stuttgart, Germany
Duration: 6 Oct 2014 → 10 Oct 2014

Publication series

Name: Lecture Notes in Computational Science and Engineering
Volume: 105
ISSN (Print): 1439-7358

Conference

Conference: 3rd International Workshop on Computational Engineering, CE 2014
Country/Territory: Germany
City: Stuttgart
Period: 6/10/14 → 10/10/14

