TY - GEN
T1 - REFINE
T2 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017
AU - Georgakoudis, Giorgis
AU - Laguna, Ignacio
AU - Nikolopoulos, Dimitrios S.
AU - Schulz, Martin
N1 - Publisher Copyright: © 2017 Association for Computing Machinery.
PY - 2017/11/12
Y1 - 2017/11/12
N2 - Compiler-based fault injection (FI) has become a popular technique for resilience studies to understand the impact of soft errors in supercomputing systems. Compiler-based FI frameworks inject faults at a high intermediate-representation level. However, they are less accurate than machine-code, binary-level FI because they lack access to all dynamic instructions and therefore fail to mimic certain fault manifestations. In this paper, we study the limitations of current practices in compiler-based FI and how they impact the interpretation of results in resilience studies. We propose REFINE, a novel framework that addresses these limitations by performing FI in a compiler backend. Our approach provides the portability and efficiency of compiler-based FI while keeping accuracy comparable to binary-level FI methods. We demonstrate our approach on 14 HPC programs and show that, due to our unique design, its runtime overhead is significantly smaller than that of state-of-the-art compiler-based FI frameworks, reducing the time for large FI experiments.
AB - Compiler-based fault injection (FI) has become a popular technique for resilience studies to understand the impact of soft errors in supercomputing systems. Compiler-based FI frameworks inject faults at a high intermediate-representation level. However, they are less accurate than machine-code, binary-level FI because they lack access to all dynamic instructions and therefore fail to mimic certain fault manifestations. In this paper, we study the limitations of current practices in compiler-based FI and how they impact the interpretation of results in resilience studies. We propose REFINE, a novel framework that addresses these limitations by performing FI in a compiler backend. Our approach provides the portability and efficiency of compiler-based FI while keeping accuracy comparable to binary-level FI methods. We demonstrate our approach on 14 HPC programs and show that, due to our unique design, its runtime overhead is significantly smaller than that of state-of-the-art compiler-based FI frameworks, reducing the time for large FI experiments.
KW - Compiler-based instrumentation
KW - Fault injection
KW - High-performance computing
KW - Resilience
UR - http://www.scopus.com/inward/record.url?scp=85040179834&partnerID=8YFLogxK
U2 - 10.1145/3126908.3126972
DO - 10.1145/3126908.3126972
M3 - Conference contribution
AN - SCOPUS:85040179834
T3 - Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017
BT - Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017
PB - Association for Computing Machinery, Inc
Y2 - 12 November 2017 through 17 November 2017
ER -