TY - GEN
T1 - A Preliminary Study on Using Text- and Image-Based Machine Learning to Predict Software Maintainability
AU - Schnappinger, Markus
AU - Zachau, Simon
AU - Fietzke, Arnaud
AU - Pretschner, Alexander
N1 - Publisher Copyright: © 2022, Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Machine learning has emerged as a useful tool to aid software quality control. It can support identifying problematic code snippets or predicting maintenance efforts. The majority of these frameworks rely on code metrics as input. However, evidence suggests great potential for text- and image-based approaches to predict code quality as well. Using a manually labeled dataset, this preliminary study examines the use of five text- and two image-based algorithms to predict the readability, understandability, and complexity of source code. While the overall performance can still be improved, we find Support Vector Machines (SVM) outperform sophisticated text transformer models and image-based neural networks. Furthermore, text-based SVMs tend to perform well on predicting readability and understandability of code, while image-based SVMs can predict code complexity more accurately. Our study both shows the potential of text- and image-based algorithms for software quality prediction and outlines their weaknesses as a starting point for further research.
KW - Expert judgment
KW - Image classification
KW - Machine learning
KW - Maintainability prediction
KW - Software maintainability
KW - Text classification
UR - http://www.scopus.com/inward/record.url?scp=85128969993&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-04115-0_4
DO - 10.1007/978-3-031-04115-0_4
M3 - Conference contribution
AN - SCOPUS:85128969993
SN - 9783031041143
T3 - Lecture Notes in Business Information Processing
SP - 41
EP - 60
BT - Software Quality
A2 - Mendez, Daniel
A2 - Wimmer, Manuel
A2 - Winkler, Dietmar
A2 - Biffl, Stefan
A2 - Bergsmann, Johannes
PB - Springer Science and Business Media Deutschland GmbH
T2 - 14th International Conference on Software Quality, SWQD 2022
Y2 - 17 May 2022 through 19 May 2022
ER -