TY - GEN
T1 - Seeking the SuperStar
T2 - 2017 International Joint Conference on Neural Networks, IJCNN 2017
AU - Bohm, Johanna
AU - Eyben, Florian
AU - Schmitt, Maximilian
AU - Kosch, Harald
AU - Schuller, Bjorn
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/6/30
Y1 - 2017/6/30
N2 - The quality of the singing voice is an important aspect of subjective, aesthetic perception of music. In this contribution, we propose a method to automatically assess perceived singing quality. We classify monophonic vocal recordings without accompaniment into one of three classes of singing quality. Unprocessed private and non-commercial recordings from a social media website are utilised. In addition to the user ratings given on the website, we let subjects both with and without a musical background annotate the samples. Building on musicological foundations, we define and extract acoustic parameters describing the quality of the sound, musical expression and intonation of the singing. Besides features which are already established in the field of Music Information Retrieval, such as loudness and mel-frequency cepstral coefficients, we propose and employ new types of features which are specific to intonation. For automatic classification by supervised machine learning methods, models predicting the subjective ratings and the user ratings on the social media website are learnt. We perform an exhaustive evaluation of both different classifiers and combinations of features. We show that the performance of automatic classification is close to that of human evaluators. Utilising support vector machines, classification accuracies of 55.4 %, based on the subjective ratings, and 84.7 %, based on the user ratings of the social media website, are achieved.
UR - http://www.scopus.com/inward/record.url?scp=85031044424&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2017.7966037
DO - 10.1109/IJCNN.2017.7966037
M3 - Conference contribution
AN - SCOPUS:85031044424
T3 - Proceedings of the International Joint Conference on Neural Networks
SP - 1560
EP - 1569
BT - 2017 International Joint Conference on Neural Networks, IJCNN 2017 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 14 May 2017 through 19 May 2017
ER -