TY - JOUR
T1 - REPAIR
T2 - Control Flow Protection based on Register Pairing Updates for SW-Implemented HW Fault Tolerance
AU - Sharif, Uzair
AU - Mueller-Gritschneder, Daniel
AU - Schlichtmann, Ulf
N1 - Publisher Copyright: © 2021 Association for Computing Machinery.
PY - 2021/10
Y1 - 2021/10
N2 - Safety-critical embedded systems may either use specialized hardware or rely on Software-Implemented Hardware Fault Tolerance (SIHFT) to meet soft error resilience requirements. SIHFT has the advantage that it can be used with low-cost, off-the-shelf components such as standard Micro-Controller Units. To this end, SIHFT methods apply redundancy in software computation and special checker codes to detect transient errors, so-called soft errors, that corrupt either the data flow or the control flow of the software and may lead to Silent Data Corruption (SDC). So far, this has been done by applying separate SIHFT methods for data flow and control flow protection, which leads to large overheads in computation time. This work, in contrast, presents REPAIR, a method that exploits the checks of the SIHFT data flow protection to also detect control flow errors, thereby yielding higher SDC resilience with less computational overhead. The data flow protection methods duplicate the computation and place subsequent checks strategically throughout the program. These checks ensure that the two redundant computation paths, which work on two different parts of the register file, yield the same result. By updating the pairing between the registers used in the primary computation path and the registers in the duplicated computation path using the REPAIR method, these checks also fail with high coverage when a control flow error that leads to an illegal jump occurs. Extensive RTL fault injection simulations are carried out to accurately quantify soft error resilience while evaluating MiBench programs along with an embedded case study running on an OpenRISC processor. On average, our method performs slightly better in terms of soft error resilience than the best state-of-the-art method while requiring significantly lower overheads. These results show that REPAIR is a valuable addition to the set of known SIHFT methods.
AB - Safety-critical embedded systems may either use specialized hardware or rely on Software-Implemented Hardware Fault Tolerance (SIHFT) to meet soft error resilience requirements. SIHFT has the advantage that it can be used with low-cost, off-the-shelf components such as standard Micro-Controller Units. To this end, SIHFT methods apply redundancy in software computation and special checker codes to detect transient errors, so-called soft errors, that corrupt either the data flow or the control flow of the software and may lead to Silent Data Corruption (SDC). So far, this has been done by applying separate SIHFT methods for data flow and control flow protection, which leads to large overheads in computation time. This work, in contrast, presents REPAIR, a method that exploits the checks of the SIHFT data flow protection to also detect control flow errors, thereby yielding higher SDC resilience with less computational overhead. The data flow protection methods duplicate the computation and place subsequent checks strategically throughout the program. These checks ensure that the two redundant computation paths, which work on two different parts of the register file, yield the same result. By updating the pairing between the registers used in the primary computation path and the registers in the duplicated computation path using the REPAIR method, these checks also fail with high coverage when a control flow error that leads to an illegal jump occurs. Extensive RTL fault injection simulations are carried out to accurately quantify soft error resilience while evaluating MiBench programs along with an embedded case study running on an OpenRISC processor. On average, our method performs slightly better in terms of soft error resilience than the best state-of-the-art method while requiring significantly lower overheads. These results show that REPAIR is a valuable addition to the set of known SIHFT methods.
KW - Soft errors
KW - code generation
KW - embedded resilience
KW - functional safety
UR - http://www.scopus.com/inward/record.url?scp=85115832286&partnerID=8YFLogxK
U2 - 10.1145/347701
DO - 10.1145/347701
M3 - Article
AN - SCOPUS:85115832286
SN - 1539-9087
VL - 20
JO - ACM Transactions on Embedded Computing Systems
JF - ACM Transactions on Embedded Computing Systems
IS - 5s
M1 - 69
ER -