TY - GEN
T1 - Automated query reformulation for efficient search based on query logs from stack overflow
AU - Cao, Kaibo
AU - Chen, Chunyang
AU - Baltes, Sebastian
AU - Treude, Christoph
AU - Chen, Xiang
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/5
Y1 - 2021/5
N2 - As a popular Q&A site for programming, Stack Overflow is a treasure for developers. However, the amount of questions and answers on Stack Overflow make it difficult for developers to efficiently locate the information they are looking for. There are two gaps leading to poor search results: the gap between the user's intention and the textual query, and the semantic gap between the query and the post content. Therefore, developers have to constantly reformulate their queries by correcting misspelled words, adding limitations to certain programming languages or platforms, etc. As query reformulation is tedious for developers, especially for novices, we propose an automated software-specific query reformulation approach based on deep learning. With query logs provided by Stack Overflow, we construct a large-scale query reformulation corpus, including the original queries and corresponding reformulated ones. Our approach trains a Transformer model that can automatically generate candidate reformulated queries when given the user's original query. The evaluation results show that our approach outperforms five state-of-the-art baselines, and achieves a 5.6% to 33.5% boost in terms of ExactMatch and a 4.8% to 14.4% boost in terms of GLEU.
AB - As a popular Q&A site for programming, Stack Overflow is a treasure for developers. However, the amount of questions and answers on Stack Overflow make it difficult for developers to efficiently locate the information they are looking for. There are two gaps leading to poor search results: the gap between the user's intention and the textual query, and the semantic gap between the query and the post content. Therefore, developers have to constantly reformulate their queries by correcting misspelled words, adding limitations to certain programming languages or platforms, etc. As query reformulation is tedious for developers, especially for novices, we propose an automated software-specific query reformulation approach based on deep learning. With query logs provided by Stack Overflow, we construct a large-scale query reformulation corpus, including the original queries and corresponding reformulated ones. Our approach trains a Transformer model that can automatically generate candidate reformulated queries when given the user's original query. The evaluation results show that our approach outperforms five state-of-the-art baselines, and achieves a 5.6% to 33.5% boost in terms of ExactMatch and a 4.8% to 14.4% boost in terms of GLEU.
KW - Data Mining
KW - Deep Learning
KW - Query Logs
KW - Query Reformulation
KW - Stack Overflow
UR - http://www.scopus.com/inward/record.url?scp=85111200797&partnerID=8YFLogxK
U2 - 10.1109/ICSE43902.2021.00116
DO - 10.1109/ICSE43902.2021.00116
M3 - Conference contribution
AN - SCOPUS:85111200797
T3 - Proceedings - International Conference on Software Engineering
SP - 1273
EP - 1285
BT - Proceedings - 2021 IEEE/ACM 43rd International Conference on Software Engineering, ICSE 2021
PB - IEEE Computer Society
T2 - 43rd IEEE/ACM International Conference on Software Engineering, ICSE 2021
Y2 - 22 May 2021 through 30 May 2021
ER -