EmoBed: Strengthening Monomodal Emotion Recognition via Training with Crossmodal Emotion Embeddings

Jing Han, Zixing Zhang, Zhao Ren, Björn Schuller

Research output: Contribution to journal › Article › peer-review

40 Scopus citations

Abstract

Despite remarkable advances in emotion recognition, existing systems remain severely constrained, either by the inherently limited information in the single modality they employ or by the requirement that all involved modalities be present simultaneously. Motivated by this, we propose a novel crossmodal emotion embedding framework called EmoBed, which aims to leverage knowledge from other, auxiliary modalities to improve the performance of the emotion recognition system at hand. The framework comprises two main learning components: joint multimodal training and crossmodal training. Both explore the underlying semantic emotion information, the former with a shared recognition network and the latter with a shared emotion embedding space. A system trained with this approach can thus efficiently make use of complementary information from other modalities, yet these auxiliary modalities are not required during inference. To empirically investigate the effectiveness and robustness of the proposed framework, we perform extensive experiments on two benchmark databases, RECOLA and OMG-Emotion, for the tasks of dimensional emotion regression and categorical emotion classification, respectively. The results show that the proposed framework significantly outperforms related baselines in monomodal inference and is competitive with or superior to recently reported systems, which emphasises the importance of the proposed crossmodal learning for emotion recognition.

Original language: English
Article number: 8762142
Pages (from-to): 553-564
Number of pages: 12
Journal: IEEE Transactions on Affective Computing
Volume: 12
Issue number: 3
State: Published - 1 Jul 2021
Externally published: Yes

Keywords

  • Crossmodal learning
  • emotion embedding
  • emotion recognition
  • joint training
  • triplet loss
