Comparing one and two-stage acoustic modeling in the recognition of emotion in speech

Björn Schuller, Bogdan Vlasenko, Ricardo Minguez, Gerhard Rigoll, Andreas Wendemuth

Research output: Contribution to conference › Paper › peer-review

33 Scopus citations

Abstract

In the search for a standard unit for the recognition of emotion in speech, the whole turn, i.e., the full section of speech produced by one person in a conversation, is commonly used. Within applications, such turns often seem favorable, yet sub-turn entities are known to be highly effective. In this respect, a two-stage approach is investigated that provides higher temporal resolution by chunking speech turns according to acoustic properties, followed by multi-instance learning for turn mapping after individual chunk analysis. For chunking, fast pre-segmentation into emotionally quasi-stationary segments is performed by a one-pass Viterbi beam search with token passing based on MFCC features. Chunk analysis is realized by brute-force construction of a large feature space with subsequent subset selection, SVM classification, and speaker normalization. Extensive tests reveal differences compared to one-stage processing. Alternatively, syllables are used as chunking units.
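The second stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes chunk-level feature vectors already produced by the Viterbi pre-segmentation, uses scikit-learn's univariate feature selection and RBF-kernel SVM as stand-ins for the paper's subset selection and classifier, and substitutes a simple majority vote over chunk predictions for the multi-instance learning step. All function and variable names (speaker_znorm, train_chunk_classifier, map_chunks_to_turn, X_chunks, turn_of_chunk) are hypothetical.

```python
# Hedged sketch of the two-stage scheme: chunk-level SVM classification
# with per-speaker normalization, then a majority vote over chunks as one
# simple multi-instance turn mapping. Feature extraction and Viterbi
# chunking are assumed to have happened upstream; names are illustrative.
import numpy as np
from collections import Counter
from sklearn.svm import SVC
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline

def speaker_znorm(X, speaker_ids):
    """Z-normalize each feature per speaker (speaker normalization)."""
    X = X.astype(float).copy()
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        mu = X[mask].mean(axis=0)
        sigma = X[mask].std(axis=0) + 1e-9  # avoid division by zero
        X[mask] = (X[mask] - mu) / sigma
    return X

def train_chunk_classifier(X_chunks, y_chunks, k_features=200):
    """Feature-subset selection followed by an SVM on chunk features."""
    clf = make_pipeline(
        SelectKBest(f_classif, k=min(k_features, X_chunks.shape[1])),
        SVC(kernel="rbf"),
    )
    clf.fit(X_chunks, y_chunks)
    return clf

def map_chunks_to_turn(clf, X_chunks, turn_of_chunk):
    """Majority vote over chunk predictions as one possible turn mapping."""
    pred = clf.predict(X_chunks)
    turn_labels = {}
    for turn in np.unique(turn_of_chunk):
        votes = Counter(pred[turn_of_chunk == turn])
        turn_labels[turn] = votes.most_common(1)[0][0]
    return turn_labels

# Usage (illustrative): X_chunks is (n_chunks, n_features), y_chunks holds
# one emotion label per chunk, turn_of_chunk assigns each chunk to a turn.
# Xn = speaker_znorm(X_chunks, speaker_ids)
# clf = train_chunk_classifier(Xn, y_chunks)
# turn_labels = map_chunks_to_turn(clf, Xn, turn_of_chunk)
```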

Original language: English
Pages: 596-600
Number of pages: 5
DOIs
State: Published - 2007
Event: 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007 - Kyoto, Japan
Duration: 9 Dec 2007 - 13 Dec 2007

Conference

Conference: 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007
Country/Territory: Japan
City: Kyoto
Period: 9/12/07 - 13/12/07

Keywords

  • Affective computing
  • Emotion recognition
  • Segmentation
