TY - GEN
T1 - Cost of Flaky Tests in Continuous Integration
T2 - 17th IEEE Conference on Software Testing, Verification and Validation, ICST 2024
AU - Leinen, Fabian
AU - Elsner, Daniel
AU - Pretschner, Alexander
AU - Stahlbauer, Andreas
AU - Sailer, Michael
AU - Jurgens, Elmar
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Researchers and practitioners alike increasingly often perceive flaky tests as a major challenge in software engineering. They spend a lot of effort trying to detect, repair, and mitigate the negative effects of flaky tests. However, it is yet unclear where and to what extent the costs of flaky tests manifest in industrial Continuous Integration (CI) development processes. In this study, we compile cost factors introduced by flaky tests in CI development from research and practice and derive a cost model that allows gaining insight into the costs incurred. We then instantiate this model in a case study of a large, commercial software project with 30 developers and 1M SLoC. We analyze five years of development history, including CI test logs, commits from the Version Control System (VCS), issue tickets, and tracked work time to quantify the cost factors implied by flaky tests. We find that the time spent dealing with flaky tests in the studied project represents at least 2.5% of the productive developer time. This effort is divided into investigating potentially flaky test failures, which accounts for 1.1% of the total time spent, repairing flaky tests adds another 1.3 %, and developing tools to monitor flaky tests adds 0.1 %. Contrary to most other studies, we find the cost for rerunning tests to be negligible and inexpensive. Automatically rerunning a test costs 0.02 cents, while not rerunning and thus letting the pipeline fail results in a manual investigation costing 5.67 in our context. The insights gained from our case study have led to the decision to shift effort from investigation and repair to automatically rerunning tests. Our cost model can help practitioners analyze the cost of flaky tests in their context and make informed decisions. Furthermore, our case study provides a first step to better understand the costs of flaky tests, which can lead researchers to industry-relevant problems.
AB - Researchers and practitioners alike increasingly often perceive flaky tests as a major challenge in software engineering. They spend a lot of effort trying to detect, repair, and mitigate the negative effects of flaky tests. However, it is yet unclear where and to what extent the costs of flaky tests manifest in industrial Continuous Integration (CI) development processes. In this study, we compile cost factors introduced by flaky tests in CI development from research and practice and derive a cost model that allows gaining insight into the costs incurred. We then instantiate this model in a case study of a large, commercial software project with 30 developers and 1M SLoC. We analyze five years of development history, including CI test logs, commits from the Version Control System (VCS), issue tickets, and tracked work time to quantify the cost factors implied by flaky tests. We find that the time spent dealing with flaky tests in the studied project represents at least 2.5% of the productive developer time. This effort is divided into investigating potentially flaky test failures, which accounts for 1.1% of the total time spent, repairing flaky tests adds another 1.3 %, and developing tools to monitor flaky tests adds 0.1 %. Contrary to most other studies, we find the cost for rerunning tests to be negligible and inexpensive. Automatically rerunning a test costs 0.02 cents, while not rerunning and thus letting the pipeline fail results in a manual investigation costing 5.67 in our context. The insights gained from our case study have led to the decision to shift effort from investigation and repair to automatically rerunning tests. Our cost model can help practitioners analyze the cost of flaky tests in their context and make informed decisions. Furthermore, our case study provides a first step to better understand the costs of flaky tests, which can lead researchers to industry-relevant problems.
KW - continuous integration
KW - cost modeling
KW - flaky tests
KW - industrial case study
KW - regression testing
UR - http://www.scopus.com/inward/record.url?scp=85202184751&partnerID=8YFLogxK
U2 - 10.1109/ICST60714.2024.00037
DO - 10.1109/ICST60714.2024.00037
M3 - Conference contribution
AN - SCOPUS:85202184751
T3 - Proceedings - 2024 IEEE Conference on Software Testing, Verification and Validation, ICST 2024
SP - 329
EP - 340
BT - Proceedings - 2024 IEEE Conference on Software Testing, Verification and Validation, ICST 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 May 2024 through 31 May 2024
ER -