Combining frame and turn-level information for robust recognition of emotions within speech

Bogdan Vlasenko, Björn Schuller, Andreas Wendemuth, Gerhard Rigoll

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

27 Scopus citations

Abstract

Current approaches to the recognition of emotion within speech usually use statistical feature information obtained by applying functionals at the turn or chunk level. Yet it is well known that important information on temporal sub-layers, such as the frame level, is thereby lost. We therefore investigate the benefits of integrating such information into the turn-level feature space. For frame-level analysis we use GMMs for classification and 39 MFCC and energy features with CMS. In a subsequent step, the output scores are fed forward into a turn-level SVM emotion recognition engine with a large feature space of 1.4k features. In doing so, we use a variety of low-level descriptors and functionals to cover prosodic, speech quality, and articulatory aspects. Extensive test runs are carried out on the public databases EMO-DB and SUSAS. Speaker-independent analysis is addressed by speaker normalization. Overall, the results strongly emphasize the benefits of feature integration on diverse time scales.
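
The fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on loud assumptions: a single diagonal Gaussian per emotion class stands in for a trained GMM, four simple functionals (mean, standard deviation, minimum, maximum) stand in for the 1.4k-dimensional turn-level feature set, the SVM stage is omitted, and all function names and the toy data are hypothetical. It only shows how per-class frame-level scores can be appended to a turn-level feature vector.

```python
import math
from statistics import mean, pstdev


def turn_functionals(frames):
    """Apply mean, std, min and max to each frame-level descriptor track."""
    feats = []
    for track in zip(*frames):  # one track per low-level descriptor
        feats += [mean(track), pstdev(track), min(track), max(track)]
    return feats


def fit_diag_gaussian(frames):
    """Estimate per-dimension mean and variance (assumed stand-in for a GMM)."""
    tracks = list(zip(*frames))
    mu = [mean(t) for t in tracks]
    var = [max(pstdev(t) ** 2, 1e-6) for t in tracks]  # floor avoids zero variance
    return mu, var


def frame_loglik(frame, model):
    """Log-likelihood of one frame under a diagonal Gaussian."""
    mu, var = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mu, var)
    )


def turn_scores(frames, models):
    """Mean frame log-likelihood per class, in sorted label order."""
    return [
        mean(frame_loglik(f, models[label]) for f in frames)
        for label in sorted(models)
    ]


def fused_vector(frames, models):
    """Turn-level functionals followed by the frame-level class scores."""
    return turn_functionals(frames) + turn_scores(frames, models)


# Toy two-dimensional "frames" per class (made-up numbers, not real MFCCs).
train = {
    "anger": [[1.0, 2.0], [1.2, 2.2], [0.8, 1.8]],
    "neutral": [[-1.0, 0.0], [-1.2, 0.2], [-0.8, -0.2]],
}
models = {label: fit_diag_gaussian(fr) for label, fr in train.items()}

turn = [[0.9, 2.1], [1.1, 1.9]]   # one turn consisting of two frames
vec = fused_vector(turn, models)  # 2 descriptors * 4 functionals + 2 scores
```

In a fuller pipeline, vectors such as `vec` would be computed for every turn and passed to a turn-level classifier such as an SVM; that step is left out here to keep the sketch free of third-party dependencies.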

Original language: English
Title of host publication: International Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Pages: 2712-2715
Number of pages: 4
State: Published - 2007
Event: 8th Annual Conference of the International Speech Communication Association, Interspeech 2007 - Antwerp, Belgium
Duration: 27 Aug 2007 → 31 Aug 2007

Publication series

Name: International Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Volume: 4

Conference

Conference: 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Country/Territory: Belgium
City: Antwerp
Period: 27/08/07 → 31/08/07

Keywords

  • Emotion recognition
  • Frame-level analysis
  • Model fusion
  • Turn-level analysis
