TY - GEN
T1 - Reconstruction-error-based learning for continuous emotion recognition in speech
AU - Han, Jing
AU - Zhang, Zixing
AU - Ringeval, Fabien
AU - Schuller, Bjorn
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/6/16
Y1 - 2017/6/16
N2 - To advance the performance of continuous emotion recognition from speech, we introduce a reconstruction-error-based (RE-based) learning framework with memory-enhanced Recurrent Neural Networks (RNNs). In this framework, two successive RNN models are adopted: the first serves as an autoencoder that reconstructs the original features, and the second performs emotion prediction. The RE of the original features is used as a complementary descriptor, which is merged with the original features and fed to the second model. The underlying assumption is that the system can learn its own 'drawback', as expressed by the RE. Experimental results on the RECOLA database show that the proposed framework significantly outperforms baseline systems without any RE information in terms of the Concordance Correlation Coefficient (.729 vs. .710 for arousal, .360 vs. .237 for valence), and also significantly surpasses other state-of-the-art methods.
AB - To advance the performance of continuous emotion recognition from speech, we introduce a reconstruction-error-based (RE-based) learning framework with memory-enhanced Recurrent Neural Networks (RNNs). In this framework, two successive RNN models are adopted: the first serves as an autoencoder that reconstructs the original features, and the second performs emotion prediction. The RE of the original features is used as a complementary descriptor, which is merged with the original features and fed to the second model. The underlying assumption is that the system can learn its own 'drawback', as expressed by the RE. Experimental results on the RECOLA database show that the proposed framework significantly outperforms baseline systems without any RE information in terms of the Concordance Correlation Coefficient (.729 vs. .710 for arousal, .360 vs. .237 for valence), and also significantly surpasses other state-of-the-art methods.
KW - Continuous emotion recognition
KW - bidirectional long short-term memory
KW - reconstruction error
UR - http://www.scopus.com/inward/record.url?scp=85023738689&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2017.7952580
DO - 10.1109/ICASSP.2017.7952580
M3 - Conference contribution
AN - SCOPUS:85023738689
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 2367
EP - 2371
BT - 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017
Y2 - 5 March 2017 through 9 March 2017
ER -