Emotion recognition in the noise applying large acoustic feature sets

Björn Schuller, Dejan Arsić, Frank Wallhoff, Gerhard Rigoll

Research output: Contribution to journal › Conference article › peer-review

87 Scopus citations

Abstract

Speech emotion recognition is mostly considered under ideal acoustic conditions: acted and elicited samples in studio quality are used, besides sparse works on spontaneous field data. However, the influence of noise plays an important role in speech processing and has so far received practically no specific analysis in this context. We therefore discuss affect estimation under noise conditions herein. On three well-known public databases - DES, EMO-DB, and SUSAS - we show the effects of post-recording noise addition at diverse dB levels, as well as performance under noise conditions present during signal capture. To cope with this new challenge, we extend the generation of functionals by extracting a large 4k high-level feature set from more than 60 partially novel base contours. These comprise, among others, intonation, intensity, formants, HNR, MFCC, and VOC19. Fast Information-Gain-Ratio filter selection picks attributes according to the noise conditions. Results are presented using Support Vector Machines as the classifier.

Original language: English
Journal: Proceedings of the International Conference on Speech Prosody
State: Published - 2006
Event: 3rd International Conference on Speech Prosody, SP 2006 - Dresden, Germany
Duration: 2 May 2006 - 5 May 2006
