Frame vs. turn-level: Emotion recognition from speech considering static and dynamic processing

Bogdan Vlasenko, Björn Schuller, Andreas Wendemuth, Gerhard Rigoll

Publication: Contribution to book/report/conference proceedings, conference contribution, peer-reviewed

70 citations (Scopus)

Abstract

In contrast to the predominant approach of computing turn-wise statistics of acoustic Low-Level-Descriptors followed by static classification, we re-investigate dynamic modeling directly on the frame level in speech-based emotion recognition. This seems beneficial, as it is well known that important information exists on temporal layers below the turn level. Most promisingly, we integrate this frame-level information within a state-of-the-art large-feature-space emotion recognition engine. To investigate frame-level processing, we employ a typical speaker-recognition setup tailored to emotion classification: a GMM for classification, with MFCCs plus speed (delta) and acceleration (delta-delta) coefficients as features. We also consider the use of multiple states, that is, an HMM. To fuse this information with turn-based modeling, output scores are added to a super-vector combined with static acoustic features. A variety of Low-Level-Descriptors and functionals covering prosodic, speech quality, and articulatory aspects are considered. Starting from 1.4k features, we select optimal configurations including and excluding GMM information. The final decision is made by an SVM. Extensive test runs are carried out on two popular public databases, EMO-DB and SUSAS, to investigate acted and spontaneous data. As we face the current challenge of speaker-independent analysis, we also discuss benefits arising from speaker normalization. The results obtained clearly emphasize the superior power of integrating diverse time levels.
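The fusion step described in the abstract (frame-level GMM scores appended to a super-vector of static turn-level features) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy two-dimensional frames standing in for MFCC plus delta vectors, the hand-set GMM parameters, and the choice of mean and standard deviation as the only functionals are all assumptions made for illustration. GMM training, feature selection, and the final SVM are omitted.

```python
import math
from statistics import mean, pstdev

def diag_gauss_logpdf(x, mu, var):
    """Log density of a diagonal-covariance Gaussian at one frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mu, var)
    )

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) + diag_gauss_logpdf(x, mu, var) for w, mu, var in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def turn_scores(frames, class_gmms):
    """Average frame log-likelihood of a turn under each class GMM (classes sorted by name)."""
    return [
        mean([gmm_frame_loglik(x, class_gmms[c]) for x in frames])
        for c in sorted(class_gmms)
    ]

def turn_functionals(frames):
    """Static turn-level features: per-dimension mean and standard deviation."""
    dims = list(zip(*frames))
    return [mean(d) for d in dims] + [pstdev(d) for d in dims]

def super_vector(frames, class_gmms):
    """Static functionals followed by the per-class GMM scores."""
    return turn_functionals(frames) + turn_scores(frames, class_gmms)

# Hypothetical, hand-set class models (a real system would train these).
class_gmms = {
    "anger":   [(0.5, [2.0, 2.0], [1.0, 1.0]), (0.5, [3.0, 1.0], [1.0, 1.0])],
    "neutral": [(1.0, [0.0, 0.0], [1.0, 1.0])],
}

# One toy turn of three 2-dimensional frames.
turn = [[2.1, 1.9], [2.8, 1.2], [2.4, 1.6]]
vec = super_vector(turn, class_gmms)
```

Here `vec` holds four static functionals followed by one score per emotion class. In a setup like the one the abstract outlines, such vectors would be passed to a turn-level classifier such as an SVM.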

Original language: English
Title: Affective Computing and Intelligent Interaction - 2nd International Conference, ACII 2007, Proceedings
Publisher: Springer Verlag
Pages: 139-147
Number of pages: 9
ISBN (Print): 9783540748885
DOIs
Publication status: Published - 2007
Event: 2nd International Conference on Affective Computing and Intelligent Interaction, ACII 2007 - Lisbon, Portugal
Duration: 12 Sept. 2007 to 14 Sept. 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4738 LNCS
ISSN (Print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 2nd International Conference on Affective Computing and Intelligent Interaction, ACII 2007
Country/Territory: Portugal
City: Lisbon
Period: 12/09/07 to 14/09/07
