REFINE: Realistic Fault Injection via Compiler-based Instrumentation for Accuracy, Portability and Speed

Giorgis Georgakoudis, Ignacio Laguna, Dimitrios S. Nikolopoulos, Martin Schulz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Compiler-based fault injection (FI) has become a popular technique for resilience studies to understand the impact of soft errors in supercomputing systems. Compiler-based FI frameworks inject faults at a high intermediate-representation level. However, they are less accurate than machine-code, binary-level FI because they lack access to all dynamic instructions and thus fail to mimic certain fault manifestations. In this paper, we study the limitations of current practices in compiler-based FI and how they impact the interpretation of results in resilience studies. We propose REFINE, a novel framework that addresses these limitations by performing FI in a compiler backend. Our approach provides the portability and efficiency of compiler-based FI, while keeping accuracy comparable to binary-level FI methods. We demonstrate our approach on 14 HPC programs and show that, due to our unique design, its runtime overhead is significantly smaller than that of state-of-the-art compiler-based FI frameworks, reducing the time for large FI experiments.

Original language: English
Title of host publication: SC 2017 - International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Computer Society
ISBN (Electronic): 9781450351140
State: Published - 2017
Event: 2017 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017 - Denver, United States
Duration: 12 Nov 2017 → 17 Nov 2017

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
Volume: 2017-November
ISSN (Print): 2167-4329
ISSN (Electronic): 2167-4337

Conference

Conference: 2017 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017
Country/Territory: United States
City: Denver
Period: 12/11/17 → 17/11/17

Keywords

  • Compiler-based Instrumentation
  • Fault Injection
  • High-Performance Computing
  • Resilience
