TY - JOUR
T1 - Recognizing Emotions from Whispered Speech Based on Acoustic Feature Transfer Learning
AU - Deng, Jun
AU - Frühholz, Sascha
AU - Zhang, Zixing
AU - Schuller, Björn
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017
Y1 - 2017
AB - Whispered speech, an alternative speaking style to normal phonated (non-whispered) speech, has received little attention in speech emotion recognition. Current speech emotion recognition systems are designed exclusively to process normal phonated speech, and their performance can degrade significantly on whispered speech because of the fundamental differences between the two in vocal excitation and vocal tract function. Motivated by the recent successes of feature transfer learning, this study addresses the problem by proposing three feature transfer learning methods based on denoising autoencoders, shared-hidden-layer autoencoders, and extreme learning machine autoencoders. Without requiring labeled whispered speech data in the training phase, the three proposed methods help modern emotion recognition models trained on normal phonated speech to also handle whispered speech reliably. Through extensive experiments on the Geneva Whispered Emotion Corpus and the Berlin Emotional Speech Database, we compare our methods with alternative methods reported to perform well on a wide range of speech emotion recognition tasks and find that the proposed methods provide significantly superior performance on both normal phonated and whispered speech.
KW - Speech emotion recognition
KW - autoencoders
KW - extreme learning machines
KW - feature transfer learning
KW - whispered speech
UR - http://www.scopus.com/inward/record.url?scp=85028028347&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2017.2672722
DO - 10.1109/ACCESS.2017.2672722
M3 - Article
AN - SCOPUS:85028028347
SN - 2169-3536
VL - 5
SP - 5235
EP - 5246
JO - IEEE Access
JF - IEEE Access
M1 - 7879177
ER -