Emotion recognition in live broadcasting: a multimodal deep learning framework

Rizwan Abbas, Björn W. Schuller, Xuewei Li, Chi Lin, Xi Li

Research output: Contribution to journal › Article › peer-review

Abstract

Multimodal emotion recognition is a rapidly developing area with applications in diverse fields such as entertainment, healthcare, marketing, and education. The emergence of live broadcasting demands real-time emotion recognition, which involves analyzing emotions via body language, voice, facial expressions, and context. Previous studies have faced challenges associated with multimodal emotion recognition in live broadcasting, such as computational efficiency, noisy and incomplete data, and difficult camera angles. This research presents a Multimodal Emotion Recognition in Live Broadcasting (MERLB) system that collects speech, facial expressions, and context displayed in live broadcasting for emotion recognition. We utilize a deep convolutional neural network architecture for facial emotion recognition, incorporating inception modules and dense blocks. We aim to enhance computational efficiency by focusing on key segments rather than analyzing the entire utterance. MERLB employs tensor train layers to combine multimodal representations at higher orders. Experiments were conducted on the FIFA, League of Legends, IEMOCAP, and CMU-MOSEI datasets. MERLB achieves an F1 score improvement of 6.44% on the FIFA dataset and 4.71% on League of Legends, and outperforms other multimodal emotion recognition methods on the IEMOCAP and CMU-MOSEI datasets. Our code is available at https://github.com/swerizwan/merlb.
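The fusion step described in the abstract, combining modality representations at higher orders through tensor train layers, can be illustrated with a short sketch. The PyTorch snippet below is a minimal illustration under assumed embedding sizes, rank choices, and names (TTFusion, modality_dims, rank are hypothetical), not the authors' released implementation; see the linked repository for the actual code.

```python
# Minimal sketch of tensor-train (TT) multimodal fusion, assuming per-modality
# embeddings (e.g., face, speech, context) are already extracted.
# Core shapes, ranks, and the appended-bias trick are illustrative assumptions.
import torch
import torch.nn as nn


class TTFusion(nn.Module):
    """Fuse modality embeddings via a TT-factorized multilinear map.

    The full fusion weight would be a tensor of shape
    (d1+1, d2+1, ..., dM+1, out_dim); the TT cores below parameterize it
    with far fewer weights, so the higher-order interaction tensor is
    never materialized explicitly.
    """

    def __init__(self, modality_dims, out_dim, rank=8):
        super().__init__()
        dims = [d + 1 for d in modality_dims]  # +1: constant-1 bias mode per modality
        ranks = [1] + [rank] * (len(dims) - 1) + [out_dim]
        self.cores = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(ranks[i], dims[i], ranks[i + 1]))
             for i in range(len(dims))]
        )

    def forward(self, modality_feats):
        # modality_feats: list of (batch, d_k) tensors, one per modality.
        batch = modality_feats[0].shape[0]
        state = modality_feats[0].new_ones(batch, 1)  # TT chain starts at rank 1
        for x, core in zip(modality_feats, self.cores):
            x = torch.cat([x, x.new_ones(batch, 1)], dim=-1)  # (batch, d_k + 1)
            # Contract the modality into its core, then into the running chain.
            proj = torch.einsum('bd,pdr->bpr', x, core)       # (batch, r_prev, r_next)
            state = torch.einsum('bp,bpr->br', state, proj)   # (batch, r_next)
        return state  # (batch, out_dim)


if __name__ == "__main__":
    # Hypothetical embedding sizes for facial, speech, and context streams.
    fusion = TTFusion(modality_dims=[128, 64, 32], out_dim=7, rank=8)
    face, speech, context = torch.randn(4, 128), torch.randn(4, 64), torch.randn(4, 32)
    print(fusion([face, speech, context]).shape)  # torch.Size([4, 7])
```

Because the chain of contractions never forms the full outer product of the modality embeddings, the parameter count grows roughly linearly in the number of modalities rather than exponentially, which is the usual motivation for TT-style fusion in real-time settings.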

Original language: English
Article number: 253
Journal: Multimedia Systems
Volume: 31
Issue number: 3
DOIs
State: Published - Jun 2025
Externally published: Yes

Keywords

  • Facial expressions
  • Multimodal emotion recognition
  • Speech emotion
  • Tensor train layers
