TY - JOUR
T1 - Contrastive Learning Based Modality-Invariant Feature Acquisition for Robust Multimodal Emotion Recognition With Missing Modalities
AU - Liu, Rui
AU - Zuo, Haolin
AU - Lian, Zheng
AU - Schuller, Björn W.
AU - Li, Haizhou
N1 - Publisher Copyright:
© 2010-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Multimodal emotion recognition (MER) aims to understand the way that humans express their emotions by exploring complementary information across modalities. However, it is hard to guarantee that full-modality data is always available in real-world scenarios. To deal with missing modalities, researchers have focused on learning meaningful joint multimodal representations during cross-modal missing-modality imagination. However, the cross-modal imagination mechanism is highly susceptible to errors due to the "modality gap" issue, which degrades the imagination accuracy and, in turn, the final recognition performance. To this end, we introduce the concept of a modality-invariant feature into the missing-modality imagination network, which contains two key modules: 1) a novel contrastive learning-based module to extract modality-invariant features under full modalities and 2) a robust imagination module based on imagined invariant features to reconstruct missing information under missing-modality conditions. Finally, we incorporate the imagined and available modalities for emotion recognition. Experimental results on benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art strategies. Compared with our previous work, our extended version is more effective at multimodal emotion recognition with missing modalities.
AB - Multimodal emotion recognition (MER) aims to understand the way that humans express their emotions by exploring complementary information across modalities. However, it is hard to guarantee that full-modality data is always available in real-world scenarios. To deal with missing modalities, researchers have focused on learning meaningful joint multimodal representations during cross-modal missing-modality imagination. However, the cross-modal imagination mechanism is highly susceptible to errors due to the "modality gap" issue, which degrades the imagination accuracy and, in turn, the final recognition performance. To this end, we introduce the concept of a modality-invariant feature into the missing-modality imagination network, which contains two key modules: 1) a novel contrastive learning-based module to extract modality-invariant features under full modalities and 2) a robust imagination module based on imagined invariant features to reconstruct missing information under missing-modality conditions. Finally, we incorporate the imagined and available modalities for emotion recognition. Experimental results on benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art strategies. Compared with our previous work, our extended version is more effective at multimodal emotion recognition with missing modalities.
KW - Contrastive learning
KW - invariant feature
KW - missing modality imagination
KW - modality gap
KW - multimodal emotion recognition
UR - http://www.scopus.com/inward/record.url?scp=85188432592&partnerID=8YFLogxK
U2 - 10.1109/TAFFC.2024.3378570
DO - 10.1109/TAFFC.2024.3378570
M3 - Article
AN - SCOPUS:85188432592
SN - 1949-3045
VL - 15
SP - 1856
EP - 1873
JO - IEEE Transactions on Affective Computing
JF - IEEE Transactions on Affective Computing
IS - 4
ER -