A survey of multimodal sentiment analysis

Mohammad Soleymani, David Garcia, Brendan Jou, Björn Schuller, Shih Fu Chang, Maja Pantic

Research output: Contribution to journal › Article › peer-review

379 Scopus citations


Sentiment analysis aims to automatically uncover the underlying attitude that we hold towards an entity. The aggregation of these sentiments over a population represents opinion polling and has numerous applications. Current text-based sentiment analysis relies on the construction of dictionaries and machine learning models that learn sentiment from large text corpora. Sentiment analysis from text is currently widely used for customer satisfaction assessment and brand perception analysis, among others. With the proliferation of social media, multimodal sentiment analysis is set to bring new opportunities with the arrival of complementary data streams for improving and going beyond text-based sentiment analysis. Since sentiment can be detected through the affective traces it leaves, such as facial and vocal displays, multimodal sentiment analysis offers promising avenues for analyzing facial and vocal expressions in addition to the transcript or textual content. These approaches leverage emotion recognition and context inference to determine the underlying polarity and scope of an individual's sentiment. In this survey, we define sentiment and the problem of multimodal sentiment analysis and review recent developments in multimodal sentiment analysis in different domains, including spoken reviews, images, video blogs, human–machine and human–human interactions. Challenges and opportunities of this emerging field are also discussed, leading to our thesis that multimodal sentiment analysis holds significant untapped potential.

Original language: English
Pages (from-to): 3-14
Number of pages: 12
Journal: Image and Vision Computing
State: Published - Sep 2017
Externally published: Yes


  • Affect
  • Affective computing
  • Computer vision
  • Human behavior analysis
  • Sentiment
  • Sentiment analysis

