Influence of a-posteriori subcell limiting on fault frequency in higher-order DG schemes

Anne Reinarz, Jean Mathieu Gallard, Michael Bader

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Soft error rates are increasing as modern architectures require increasingly small features at low voltages. Due to the large number of components used in HPC architectures, these systems are particularly vulnerable to soft errors. Hence, when designing applications that run for long time periods on large machines, algorithmic resilience must be taken into account. In this paper we analyse the inherent resilience of a-posteriori limiting procedures in the context of the explicit ADER-DG hyperbolic PDE solver ExaHyPE. The a-posteriori limiter checks element-local high-order DG solutions for physical admissibility, and can thus be expected to also detect hardware-induced errors. Algorithmically, it can be interpreted as element-local checkpointing and restarting of the solver with a more robust finite volume scheme on a fine subgrid. We show that the limiter indeed increases the resilience of the DG algorithm, detecting and correcting in particular those faults that would otherwise lead to a fatal failure.
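The "element-local checkpoint and restart" reading of the limiter can be illustrated with a deliberately small toy model. The sketch below is not ExaHyPE code and does not implement ADER-DG: as a stand-in, it uses a second-order Lax-Wendroff update as the unlimited candidate and a first-order upwind update as the robust fallback, for 1D linear advection on a periodic grid. All function names, the tolerance, and the fault-injection interface are hypothetical and chosen only for illustration.

```python
import math
import struct


def flip_bit(x, bit):
    """Flip one bit of a float64 to mimic a soft error (illustrative only)."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def candidate_update(u, i, c):
    """Unlimited candidate: Lax-Wendroff update of cell i at CFL number c."""
    n = len(u)
    um, u0, up = u[(i - 1) % n], u[i], u[(i + 1) % n]
    return u0 - 0.5 * c * (up - um) + 0.5 * c * c * (up - 2.0 * u0 + um)


def robust_update(u, i, c):
    """Robust fallback: first-order upwind update of cell i (valid for 0 <= c <= 1)."""
    return u[i] - c * (u[i] - u[(i - 1) % len(u)])


def admissible(candidate, u, i, tol=1e-3):
    """Assumed admissibility test: finite, and within the old neighbourhood
    range up to a fixed tolerance (a relaxed discrete maximum principle)."""
    if not math.isfinite(candidate):
        return False
    n = len(u)
    neighbours = (u[(i - 1) % n], u[i], u[(i + 1) % n])
    return min(neighbours) - tol <= candidate <= max(neighbours) + tol


def limited_step(u_old, c, fault=None):
    """One time step with a-posteriori limiting.

    u_old acts as the checkpoint: a rejected cell is recomputed from it with
    the robust scheme. fault=(cell, bit) optionally corrupts one candidate.
    Returns the new state and the list of cells that were recomputed.
    """
    u_new, troubled = [], []
    for i in range(len(u_old)):
        cand = candidate_update(u_old, i, c)
        if fault is not None and fault[0] == i:
            cand = flip_bit(cand, fault[1])
        if admissible(cand, u_old, i):
            u_new.append(cand)
        else:
            troubled.append(i)
            u_new.append(robust_update(u_old, i, c))
    return u_new, troubled


if __name__ == "__main__":
    n = 32
    u0 = [math.sin(2.0 * math.pi * i / n) for i in range(n)]
    for bit in (62, 0):  # top exponent bit, then lowest mantissa bit
        _, troubled = limited_step(u0, 0.5, fault=(5, bit))
        print("flipped bit", bit, "-> recomputed cells:", troubled)
```

In this toy setting, flipping the top exponent bit of one candidate produces a huge value that fails the admissibility test, so only that cell is recomputed from the old state. Flipping the lowest mantissa bit changes the value by about one unit in the last place and passes unnoticed, which mirrors the abstract's point that the limiter mainly catches faults that would otherwise be fatal.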

Original language: English
Title of host publication: Proceedings of FTXS 2018
Subtitle of host publication: 8th Workshop on Fault Tolerance for HPC at eXtreme Scale, Held in conjunction with SC18: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 79-86
Number of pages: 8
ISBN (Electronic): 9781728102221
State: Published - 5 Dec 2018
Event: 8th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, FTXS 2018 - Dallas, United States
Duration: 11 Nov 2018 to 16 Nov 2018

Publication series

Name: Proceedings of FTXS 2018: 8th Workshop on Fault Tolerance for HPC at eXtreme Scale, Held in conjunction with SC18: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 8th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, FTXS 2018
Country/Territory: United States
City: Dallas
Period: 11/11/18 to 16/11/18

Keywords

  • numerical-methods
  • reliability
  • soft-errors
