TY - GEN
T1 - Influence of a-posteriori subcell limiting on fault frequency in higher-order DG schemes
AU - Reinarz, Anne
AU - Gallard, Jean-Matthieu
AU - Bader, Michael
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/12/5
Y1 - 2018/12/5
AB - Soft error rates are increasing as modern architectures require increasingly small features at low voltages. Due to the large number of components used in HPC architectures, these systems are particularly vulnerable to soft errors. Hence, when designing applications that run for long time periods on large machines, algorithmic resilience must be taken into account. In this paper we analyse the inherent resiliency of a-posteriori limiting procedures in the context of the explicit ADER DG hyperbolic PDE solver ExaHyPE. The a-posteriori limiter checks element-local high-order DG solutions for physical admissibility, and can thus be expected to also detect hardware-induced errors. Algorithmically, it can be interpreted as element-local checkpointing and restarting of the solver with a more robust finite volume scheme on a fine subgrid. We show that the limiter indeed increases the resilience of the DG algorithm, in particular detecting and correcting those faults that would otherwise lead to a fatal failure.
KW - numerical-methods
KW - reliability
KW - soft-errors
UR - http://www.scopus.com/inward/record.url?scp=85060604563&partnerID=8YFLogxK
U2 - 10.1109/FTXS.2018.00012
DO - 10.1109/FTXS.2018.00012
M3 - Conference contribution
AN - SCOPUS:85060604563
T3 - Proceedings of FTXS 2018: 8th Workshop on Fault Tolerance for HPC at eXtreme Scale, Held in conjunction with SC18: The International Conference for High Performance Computing, Networking, Storage and Analysis
SP - 79
EP - 86
BT - Proceedings of FTXS 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, FTXS 2018
Y2 - 11 November 2018 through 16 November 2018
ER -