REFINE: Realistic fault injection via compiler-based instrumentation for accuracy, portability and speed

Giorgis Georgakoudis, Ignacio Laguna, Dimitrios S. Nikolopoulos, Martin Schulz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

28 Scopus citations

Abstract

Compiler-based fault injection (FI) has become a popular technique for resilience studies to understand the impact of soft errors in supercomputing systems. Compiler-based FI frameworks inject faults at a high intermediate-representation level. However, they are less accurate than machine-code, binary-level FI because they lack access to all dynamic instructions, and thus fail to mimic certain fault manifestations. In this paper, we study the limitations of current practices in compiler-based FI and how they impact the interpretation of results in resilience studies. We propose REFINE, a novel framework that addresses these limitations by performing FI in a compiler backend. Our approach provides the portability and efficiency of compiler-based FI, while keeping accuracy comparable to binary-level FI methods. We demonstrate our approach on 14 HPC programs and show that, due to our unique design, its runtime overhead is significantly smaller than that of state-of-the-art compiler-based FI frameworks, reducing the time for large FI experiments.

Original language: English
Title of host publication: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450351140
State: Published - 12 Nov 2017
Event: International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017 - Denver, United States
Duration: 12 Nov 2017 to 17 Nov 2017

Publication series

Name: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017

Conference

Conference: International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017
Country/Territory: United States
City: Denver
Period: 12/11/17 to 17/11/17

Keywords

  • Compiler-based instrumentation
  • Fault injection
  • High-performance computing
  • Resilience
