TY - JOUR
T1 - Model vs system level testing of autonomous driving systems
T2 - a replication and extension study
AU - Stocco, Andrea
AU - Pulfer, Brian
AU - Tonella, Paolo
N1 - Publisher Copyright:
© 2023, The Author(s).
PY - 2023/5
Y1 - 2023/5
N2 - Offline model-level testing of autonomous driving software is much cheaper, faster, and more diversified than in-field, online system-level testing. Hence, researchers have empirically compared model-level and system-level testing using driving simulators. They reported that simulators are generally useful at reproducing the conditions experienced in-field, but also that model-level testing is partly inadequate at exposing failures observable only in online mode. In this work, we replicate the reference study on model- vs system-level testing of autonomous vehicles while reconsidering several of its assumptions. These assumptions relate to threats to validity affecting the original study, which motivated additional analyses and the development of techniques to mitigate them. Moreover, we extend the replicated study by evaluating the original findings on a physical, radio-controlled autonomous vehicle. Our results show that simulator-based testing of autonomous driving systems yields predictions close to those obtained on real-world datasets when neural-based translation is used to mitigate the reality gap induced by the simulation platform. On the other hand, model-level testing failures are in line with those experienced at the system level, both in simulated and physical environments, when considering the pre-failure site, similar-looking images, and accurate labels.
KW - Autonomous driving
KW - DNN testing
KW - Deep neural networks
KW - Model testing
KW - System testing
UR - http://www.scopus.com/inward/record.url?scp=85159854748&partnerID=8YFLogxK
DO - 10.1007/s10664-023-10306-x
M3 - Article
AN - SCOPUS:85159854748
SN - 1382-3256
VL - 28
JO - Empirical Software Engineering
JF - Empirical Software Engineering
IS - 3
M1 - 73
ER -