Empirically evaluating readily available information for regression test optimization in continuous integration

Daniel Elsner, Florian Hauer, Alexander Pretschner, Silke Reimer

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

28 Scopus citations

Abstract

Regression test selection (RTS) and prioritization (RTP) techniques aim to reduce testing efforts and developer feedback time after a change to the code base. Using various information sources, including test traces, build dependencies, version control data, and test histories, they have been shown to be effective. However, not all of these sources are guaranteed to be available and accessible for arbitrary continuous integration (CI) environments. In contrast, metadata from version control systems (VCSs) and CI systems are readily available and inexpensive. Yet, corresponding RTP and RTS techniques are scattered across research and often only evaluated on synthetic faults or in a specific industrial context. It is cumbersome for practitioners to identify insights that apply to their context, let alone to calibrate associated parameters for maximum cost-effectiveness. This paper consolidates existing work on RTP and unsafe RTS into an actionable methodology to build and evaluate such approaches that exclusively rely on CI and VCS metadata. To investigate how these approaches from prior research compare in heterogeneous settings, we apply the methodology in a large-scale empirical study on a set of 23 projects covering 37,000 CI logs and 76,000 VCS commits. We find that these approaches significantly outperform established RTP baselines and, while still triggering 90% of the failures, we show that practitioners can expect to save on average 84% of test execution time for unsafe RTS. We also find that it can be beneficial to limit training data, features from test history work better than change-based features, and, somewhat surprisingly, simple and well-known heuristics often outperform complex machine-learned models.
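
The abstract reports that simple test-history heuristics often beat learned models for RTP and unsafe RTS. The sketch below shows what such a heuristic can look like. It ranks tests by how often they failed in recent CI runs, then keeps the top of the ranking until a time budget is used up, and finally reports the share of current failures caught and the share of test time saved. This is an illustration under stated assumptions, not the paper's implementation. The `CiTestRecord` format, the 10-run window, the 50% budget, and all test names and timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CiTestRecord:
    """Hypothetical per-test metadata assumed to be mined from CI logs."""
    name: str
    duration_s: float    # assumed average execution time from past CI runs
    history: list[bool]  # past outcomes, oldest first; True means the test failed


def prioritize(tests: list[CiTestRecord], window: int = 10) -> list[CiTestRecord]:
    """RTP: rank tests by failures in their last `window` runs (most first),
    breaking ties in favour of shorter tests."""
    def key(t: CiTestRecord) -> tuple[int, float]:
        recent_failures = sum(t.history[-window:])
        return (-recent_failures, t.duration_s)
    return sorted(tests, key=key)


def select(ranked: list[CiTestRecord], budget_fraction: float) -> list[CiTestRecord]:
    """Unsafe RTS: take tests in rank order and stop at the first one that
    would push the cumulative duration past the time budget."""
    budget = budget_fraction * sum(t.duration_s for t in ranked)
    chosen: list[CiTestRecord] = []
    used = 0.0
    for t in ranked:
        if used + t.duration_s > budget:
            break
        chosen.append(t)
        used += t.duration_s
    return chosen


def evaluate(selected: list[CiTestRecord], all_tests: list[CiTestRecord],
             failing_now: set[str]) -> tuple[float, float]:
    """Return (share of current failures caught, share of test time saved)."""
    caught = sum(1 for t in selected if t.name in failing_now)
    recall = caught / len(failing_now) if failing_now else 1.0
    total = sum(t.duration_s for t in all_tests)
    saved = 1.0 - sum(t.duration_s for t in selected) / total
    return recall, saved


# Toy data: four tests with four past runs each (illustrative, not from the study).
tests = [
    CiTestRecord("test_parser", 30.0, [False, True, False, True]),
    CiTestRecord("test_api", 120.0, [False, False, False, False]),
    CiTestRecord("test_db", 60.0, [True, False, False, False]),
    CiTestRecord("test_ui", 90.0, [False, False, False, False]),
]
ranked = prioritize(tests)
chosen = select(ranked, budget_fraction=0.5)
recall, saved = evaluate(chosen, tests, failing_now={"test_parser"})
print([t.name for t in ranked])                   # ['test_parser', 'test_db', 'test_ui', 'test_api']
print([t.name for t in chosen], recall, round(saved, 2))  # ['test_parser', 'test_db'] 1.0 0.7
```

In this toy run, half of the suite's time budget is enough to catch the single current failure while saving 70% of execution time. The `window` parameter caps how much history feeds the ranking, which loosely echoes the abstract's finding that limiting training data can be beneficial.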

Original language: English
Title of host publication: ISSTA 2021 - Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis
Editors: Cristian Cadar, Xiangyu Zhang
Publisher: Association for Computing Machinery, Inc
Pages: 491-504
Number of pages: 14
ISBN (Electronic): 9781450384599
State: Published - 11 Jul 2021
Event: 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021 - Virtual, Online, Denmark
Duration: 11 Jul 2021 to 17 Jul 2021

Publication series

Name: ISSTA 2021 - Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis

Conference

Conference: 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021
Country/Territory: Denmark
City: Virtual, Online
Period: 11/07/21 to 17/07/21

Keywords

  • Machine learning
  • Regression test optimization
  • Software testing
