Autonomous emotion learning in speech: A view of zero-shot speech emotion recognition

Xinzhou Xu, Jun Deng, Nicholas Cummins, Zixing Zhang, Li Zhao, Björn Schuller

Research output: Contribution to journal › Conference article › peer-review

15 Scopus citations

Abstract

Conventionally, speech emotion recognition is achieved using passive learning approaches. Differing from such approaches, we herein propose and develop a dynamic method of autonomous emotion learning based on zero-shot learning. The proposed methodology employs emotional dimensions as the attributes in the zero-shot learning paradigm, resulting in two phases of learning, namely attribute learning and label learning. Attribute learning connects the paralinguistic features and attributes utilising speech with known emotional labels, while label learning aims at defining unseen emotions through the attributes. The experimental results achieved on the CINEMO corpus indicate that zero-shot learning is a useful technique for autonomous speech-based emotion learning, achieving accuracies considerably better than chance level and an attribute-based gold-standard setup. Furthermore, different emotion recognition tasks, emotional attributes, and employed approaches strongly influence system performance.
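
The two-phase pipeline described in the abstract can be illustrated with a minimal sketch: an attribute model is trained to map paralinguistic features to emotional dimensions on utterances with seen labels, and unseen emotions are then recognised by matching predicted attributes to per-emotion prototypes. Everything below (the feature dimensionality, the ridge regressor, and the arousal/valence prototype values) is an illustrative assumption, not the configuration used in the paper.

# A hedged sketch of attribute-based zero-shot emotion classification,
# loosely following the two-phase structure in the abstract. Feature
# sizes, models, and prototype values are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Phase 1 (attribute learning): map paralinguistic features to emotional
# dimensions (here, arousal and valence) using utterances with SEEN labels.
X_seen = rng.normal(size=(200, 40))   # stand-in acoustic feature vectors
A_seen = rng.normal(size=(200, 2))    # stand-in arousal/valence annotations
attr_model = Ridge(alpha=1.0).fit(X_seen, A_seen)

# Phase 2 (label learning): define UNSEEN emotions purely through their
# attribute signatures (hypothetical arousal/valence prototypes).
unseen_prototypes = {
    "anger":   np.array([0.9, -0.8]),
    "sadness": np.array([-0.7, -0.6]),
}

def classify(x):
    """Predict attributes for one utterance, then pick the unseen
    emotion whose prototype is nearest in attribute space."""
    a_hat = attr_model.predict(x.reshape(1, -1))[0]
    return min(unseen_prototypes,
               key=lambda e: np.linalg.norm(a_hat - unseen_prototypes[e]))

print(classify(rng.normal(size=40)))

The nearest-prototype rule in attribute space is only one possible label-learning choice; as the abstract notes, the employed approaches themselves strongly influence performance.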

Original language: English
Pages (from-to): 949-953
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2019-September
DOIs
State: Published - 2019
Externally published: Yes
Event: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria
Duration: 15 Sep 2019 – 19 Sep 2019

Keywords

  • Autonomous emotion learning
  • Emotional attributes
  • Speech emotion recognition
  • Zero-shot learning
