Comparing one and two-stage acoustic modeling in the recognition of emotion in speech

Björn Schuller, Bogdan Vlasenko, Ricardo Minguez, Gerhard Rigoll, Andreas Wendemuth

Publication: Conference contribution › Paper › Peer-reviewed

33 citations (Scopus)

Abstract

In the search for a standard unit for the recognition of emotion in speech, the whole turn, i.e. the full section of speech by one person in a conversation, is commonly used, and within applications such turns often seem favorable. Yet sub-turn entities are known to be highly effective. We therefore investigate a two-stage approach that provides higher temporal resolution by chunking speech turns according to acoustic properties and by multi-instance learning for turn mapping after individual chunk analysis. For chunking, a fast pre-segmentation into emotionally quasi-stationary segments is performed by a one-pass Viterbi beam search with token passing based on MFCCs; alternatively, syllables are used as chunks. Chunk analysis is realized by brute-force construction of a large feature space with subsequent subset selection, SVM classification, and speaker normalization. Extensive tests reveal differences compared to one-stage processing.
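To make the two-stage idea concrete, the following is a minimal, hypothetical Python sketch, not the authors' implementation: random vectors stand in for the selected MFCC-derived chunk features, features are z-normalized per speaker, an SVM classifies each chunk, and a simple majority vote maps the chunk decisions back to a turn label as one basic multi-instance strategy.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy stand-ins for chunk-level acoustic feature vectors (the paper uses a
# large brute-force feature space reduced by subset selection).
X_train = rng.normal(size=(200, 16))
y_train = rng.integers(0, 2, size=200)  # binary emotion labels

def speaker_normalize(X):
    # Z-normalization of features within one speaker's data.
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

# Stage 2, chunk level: SVM on speaker-normalized features.
clf = SVC(kernel="rbf").fit(speaker_normalize(X_train), y_train)

def classify_turn(chunk_features):
    # Map chunk decisions back to one turn label by majority vote,
    # a simple multi-instance strategy.
    votes = clf.predict(speaker_normalize(np.asarray(chunk_features)))
    return np.bincount(votes).argmax()

# Chunks as produced by stage 1 (Viterbi pre-segmentation, omitted here).
turn_chunks = rng.normal(size=(5, 16))
print("turn label:", classify_turn(turn_chunks))
```

The Viterbi-based pre-segmentation and the feature subset selection described in the abstract are omitted from this sketch; only the chunk classification and turn-mapping steps are illustrated.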

Original language: English
Pages: 596-600
Number of pages: 5
DOIs
Publication status: Published - 2007
Event: 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007 - Kyoto, Japan
Duration: 9 Dec 2007 - 13 Dec 2007

Conference

Conference: 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007
Country/Territory: Japan
City: Kyoto
Period: 9/12/07 - 13/12/07
