TY - GEN
T1 - Towards conditional adversarial training for predicting emotions from speech
AU - Han, Jing
AU - Zhang, Zixing
AU - Ren, Zhao
AU - Ringeval, Fabien
AU - Schuller, Bjorn
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/9/10
Y1 - 2018/9/10
N2 - Motivated by the encouraging results recently obtained by generative adversarial networks in various image processing tasks, we propose a conditional adversarial training framework to predict dimensional representations of emotion, i. e., arousal and valence, from speech signals. The framework consists of two networks, trained in an adversarial manner: The first network tries to predict emotion from acoustic features, while the second network aims at distinguishing between the predictions provided by the first network and the emotion labels from the database using the acoustic features as conditional information. We evaluate the performance of the proposed conditional adversarial training framework on the widely used emotion database RECOLA. Experimental results show that the proposed training strategy outperforms the conventional training method, and is comparable with, or even superior to other recently reported approaches, including deep and end-to-end learning.
AB - Motivated by the encouraging results recently obtained by generative adversarial networks in various image processing tasks, we propose a conditional adversarial training framework to predict dimensional representations of emotion, i. e., arousal and valence, from speech signals. The framework consists of two networks, trained in an adversarial manner: The first network tries to predict emotion from acoustic features, while the second network aims at distinguishing between the predictions provided by the first network and the emotion labels from the database using the acoustic features as conditional information. We evaluate the performance of the proposed conditional adversarial training framework on the widely used emotion database RECOLA. Experimental results show that the proposed training strategy outperforms the conventional training method, and is comparable with, or even superior to other recently reported approaches, including deep and end-to-end learning.
KW - Conditional adversarial training
KW - Emotion recognition
KW - Generative adversarial network
UR - http://www.scopus.com/inward/record.url?scp=85054244540&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2018.8462579
DO - 10.1109/ICASSP.2018.8462579
M3 - Conference contribution
AN - SCOPUS:85054244540
SN - 9781538646588
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6822
EP - 6826
BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Y2 - 15 April 2018 through 20 April 2018
ER -