TY - GEN
T1 - Nequivack: Assessing mutation score confidence
T2 - 9th IEEE International Conference on Software Testing, Verification and Validation Workshops, ICSTW 2016
AU - Holling, Dominik
AU - Banescu, Sebastian
AU - Probst, Marco
AU - Petrovska, Ana
AU - Pretschner, Alexander
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/8/1
Y1 - 2016/8/1
N2 - The mutation score is defined as the number of killed mutants divided by the number of non-equivalent mutants. However, whether a mutant is equivalent to the original program is undecidable in general. Thus, even when a test suite is improved, the mutation score assessing it may worsen during the development of a system because of equivalent mutants introduced during mutant creation. This is a fundamental problem. Using static analysis and symbolic execution, we show how to establish non-equivalence or "don't know" among mutants. If the number of don't-know verdicts is small, this is a good indicator that a computed mutation score actually reflects the above definition. We can therefore have increased confidence that mutation score trends correspond to actual improvements of a test suite's quality and are not overly polluted by equivalent mutants. Using a set of 14 representative unit-size programs, we show that for some, but not all, of these programs, this confidence can indeed be established. We also evaluate the reproducibility, efficiency, and effectiveness of our Nequivack tool. We find that results are fully reproducible and that a single mutant analysis can be performed within 3 seconds on average, which is efficient enough for practical and industrial applications.
AB - The mutation score is defined as the number of killed mutants divided by the number of non-equivalent mutants. However, whether a mutant is equivalent to the original program is undecidable in general. Thus, even when a test suite is improved, the mutation score assessing it may worsen during the development of a system because of equivalent mutants introduced during mutant creation. This is a fundamental problem. Using static analysis and symbolic execution, we show how to establish non-equivalence or "don't know" among mutants. If the number of don't-know verdicts is small, this is a good indicator that a computed mutation score actually reflects the above definition. We can therefore have increased confidence that mutation score trends correspond to actual improvements of a test suite's quality and are not overly polluted by equivalent mutants. Using a set of 14 representative unit-size programs, we show that for some, but not all, of these programs, this confidence can indeed be established. We also evaluate the reproducibility, efficiency, and effectiveness of our Nequivack tool. We find that results are fully reproducible and that a single mutant analysis can be performed within 3 seconds on average, which is efficient enough for practical and industrial applications.
KW - equivalent mutant
KW - mutant score confidence
KW - mutation score
KW - non-equivalence checking
UR - http://www.scopus.com/inward/record.url?scp=84992188057&partnerID=8YFLogxK
U2 - 10.1109/ICSTW.2016.29
DO - 10.1109/ICSTW.2016.29
M3 - Conference contribution
AN - SCOPUS:84992188057
T3 - Proceedings - 2016 IEEE International Conference on Software Testing, Verification and Validation Workshops, ICSTW 2016
SP - 152
EP - 161
BT - Proceedings - 2016 IEEE International Conference on Software Testing, Verification and Validation Workshops, ICSTW 2016
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 10 April 2016 through 15 April 2016
ER -