TY - JOUR
T1 - Software Testing With Large Language Models
T2 - Survey, Landscape, and Vision
AU - Wang, Junjie
AU - Huang, Yuchao
AU - Chen, Chunyang
AU - Liu, Zhe
AU - Wang, Song
AU - Wang, Qing
N1 - Publisher Copyright: © 1976-2012 IEEE.
PY - 2024/4/1
Y1 - 2024/4/1
AB - Pre-trained large language models (LLMs) have recently emerged as a breakthrough technology in natural language processing and artificial intelligence, with the ability to handle large-scale datasets and exhibit remarkable performance across a wide range of tasks. Meanwhile, software testing is a crucial undertaking that serves as a cornerstone for ensuring the quality and reliability of software products. As the scope and complexity of software systems continue to grow, the need for more effective software testing techniques becomes increasingly urgent, making it an area ripe for innovative approaches such as the use of LLMs. This paper provides a comprehensive review of the utilization of LLMs in software testing. It analyzes 102 relevant studies that have used LLMs for software testing, from both the software testing and LLM perspectives. The paper presents a detailed discussion of the software testing tasks for which LLMs are commonly used, among which test case preparation and program repair are the most representative. It also analyzes the commonly used LLMs, the types of prompt engineering that are employed, and the techniques that accompany these LLMs. It also summarizes the key challenges and potential opportunities in this direction. This work can serve as a roadmap for future research in this area, highlighting potential avenues for exploration and identifying gaps in our current understanding of the use of LLMs in software testing.
KW - GPT
KW - LLM
KW - Pre-trained large language model
KW - software testing
UR - http://www.scopus.com/inward/record.url?scp=85187981851&partnerID=8YFLogxK
U2 - 10.1109/TSE.2024.3368208
DO - 10.1109/TSE.2024.3368208
M3 - Article
AN - SCOPUS:85187981851
SN - 0098-5589
VL - 50
SP - 911
EP - 936
JO - IEEE Transactions on Software Engineering
JF - IEEE Transactions on Software Engineering
IS - 4
ER -