TY - JOUR
T1 - EmoBed: Strengthening Monomodal Emotion Recognition via Training with Crossmodal Emotion Embeddings
T2 - IEEE Transactions on Affective Computing
AU - Han, Jing
AU - Zhang, Zixing
AU - Ren, Zhao
AU - Schuller, Björn
N1 - Publisher Copyright:
© 2010-2012 IEEE.
PY - 2021/7/1
Y1 - 2021/7/1
N2 - Despite remarkable advances in emotion recognition, current systems are severely constrained either by the inherently limited information of a single modality or by the requirement that all involved modalities be present simultaneously. Motivated by this, we propose a novel crossmodal emotion embedding framework, EmoBed, which leverages knowledge from auxiliary modalities to improve the performance of the emotion recognition system at hand. The framework comprises two main learning components, joint multimodal training and crossmodal training, which explore the underlying semantic emotion information through a shared recognition network and a shared emotion embedding space, respectively. A system trained in this way can efficiently exploit complementary information from other modalities, yet the auxiliary modalities are not required during inference. To empirically investigate the effectiveness and robustness of the proposed framework, we perform extensive experiments on two benchmark databases, RECOLA and OMG-Emotion, for the tasks of dimensional emotion regression and categorical emotion classification, respectively. The obtained results show that the proposed framework significantly outperforms related baselines under monomodal inference and is competitive with or superior to recently reported systems, which emphasises the importance of the proposed crossmodal learning for emotion recognition.
AB - Despite remarkable advances in emotion recognition, current systems are severely constrained either by the inherently limited information of a single modality or by the requirement that all involved modalities be present simultaneously. Motivated by this, we propose a novel crossmodal emotion embedding framework, EmoBed, which leverages knowledge from auxiliary modalities to improve the performance of the emotion recognition system at hand. The framework comprises two main learning components, joint multimodal training and crossmodal training, which explore the underlying semantic emotion information through a shared recognition network and a shared emotion embedding space, respectively. A system trained in this way can efficiently exploit complementary information from other modalities, yet the auxiliary modalities are not required during inference. To empirically investigate the effectiveness and robustness of the proposed framework, we perform extensive experiments on two benchmark databases, RECOLA and OMG-Emotion, for the tasks of dimensional emotion regression and categorical emotion classification, respectively. The obtained results show that the proposed framework significantly outperforms related baselines under monomodal inference and is competitive with or superior to recently reported systems, which emphasises the importance of the proposed crossmodal learning for emotion recognition.
KW - Crossmodal learning
KW - emotion embedding
KW - emotion recognition
KW - joint training
KW - triplet loss
UR - http://www.scopus.com/inward/record.url?scp=85069931651&partnerID=8YFLogxK
U2 - 10.1109/TAFFC.2019.2928297
DO - 10.1109/TAFFC.2019.2928297
M3 - Article
AN - SCOPUS:85069931651
SN - 1949-3045
VL - 12
SP - 553
EP - 564
JO - IEEE Transactions on Affective Computing
JF - IEEE Transactions on Affective Computing
IS - 3
M1 - 8762142
ER -