TY - GEN
T1 - Defining a Software Maintainability Dataset
T2 - 36th IEEE International Conference on Software Maintenance and Evolution, ICSME 2020
AU - Schnappinger, Markus
AU - Fietzke, Arnaud
AU - Pretschner, Alexander
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/9
Y1 - 2020/9
AB - Before controlling the quality of software systems, we need to assess it. In the case of maintainability, this often happens with manual expert reviews. Current automatic approaches have received criticism because their results often do not reflect the opinion of experts or are biased towards a small group of experts. We use the judgments of a significantly larger expert group to create a robust maintainability dataset. In a large-scale survey, 70 professionals assessed code from 9 open- and closed-source Java projects with a combined size of 1.4 million source lines of code. The assessment covers an overall judgment as well as an assessment of several subdimensions of maintainability. Among these subdimensions, we present evidence that understandability is valued the most by the experts. Our analysis also reveals that disagreement between evaluators occurs frequently: significant dissent was detected in 17% of the cases. To overcome these differences, we present a method to determine a consensus, i.e., the most probable true label. The resulting dataset contains the consensus of the experts for more than 500 Java classes. This corpus can be used to learn precise and practical classifiers for software maintainability.
KW - Machine Learning
KW - Software Maintenance
KW - Software Measurement
KW - Software Quality
UR - http://www.scopus.com/inward/record.url?scp=85096681920&partnerID=8YFLogxK
U2 - 10.1109/ICSME46990.2020.00035
DO - 10.1109/ICSME46990.2020.00035
M3 - Conference contribution
AN - SCOPUS:85096681920
T3 - Proceedings - 2020 IEEE International Conference on Software Maintenance and Evolution, ICSME 2020
SP - 278
EP - 289
BT - Proceedings - 2020 IEEE International Conference on Software Maintenance and Evolution, ICSME 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 September 2020 through 3 October 2020
ER -