Affect-robust speech recognition by dynamic emotional adaptation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

27 Scopus citations

Abstract

Automatic Speech Recognition (ASR) fails to some extent when confronted with highly affective speech. To cope with this problem, we suggest dynamic adaptation to the user's actual emotion. The ASR framework is a hybrid ANN/HMM mono-phone recognizer with a 5k bi-gram LM. On this basis we show adaptation to the affective speaking style. Speech emotion recognition takes place prior to the actual recognition task in order to choose appropriate models. We therefore focus on fast emotion recognition with little additional feature extraction effort. As databases for proof of concept we use a single-digit task and sentences from the well-known WSJ corpus. These have been re-recorded in acted neutral and angry speaking styles under ideal acoustic conditions to exclude other influences. The effectiveness of acoustic emotion recognition is also demonstrated on the SUSAS corpus. Finally, we evaluate the need for adaptation and demonstrate significant superiority of our dynamic approach over static adaptation.

Original language: English
Title of host publication: 3rd International Conference on Speech Prosody 2006
Editors: R. Hoffmann, H. Mixdorff
Publisher: International Speech Communications Association
ISBN (Electronic): 9780000000002
State: Published - 2006
Event: 3rd International Conference on Speech Prosody, SP 2006 - Dresden, Germany
Duration: 2 May 2006 to 5 May 2006

Publication series

Name: Proceedings of the International Conference on Speech Prosody
ISSN (Print): 2333-2042

Conference

Conference: 3rd International Conference on Speech Prosody, SP 2006
Country/Territory: Germany
City: Dresden
Period: 2/05/06 to 5/05/06
