Affect-robust speech recognition by dynamic emotional adaptation

Publication: Contribution to book/report/conference proceedings › Conference contribution › Peer-reviewed

27 citations (Scopus)

Abstract

Automatic speech recognition (ASR) partly fails when confronted with highly affective speech. To cope with this problem, we suggest dynamic adaptation to the user's current emotion. The ASR framework is a hybrid ANN/HMM monophone recognizer with a 5k-word bigram language model. On this basis, we show adaptation to the affective speaking style. Speech emotion recognition takes place before the actual recognition task in order to select appropriate models. We therefore focus on fast emotion recognition that requires little additional feature extraction effort. As databases for a proof of concept, we use a single-digit task and sentences from the well-known WSJ corpus. These were re-recorded in acted neutral and angry speaking styles under ideal acoustic conditions to exclude other influences. The effectiveness of acoustic emotion recognition is also demonstrated on the SUSAS corpus. Finally, we evaluate the need for adaptation and demonstrate that our dynamic approach significantly outperforms static adaptation.
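The abstract describes a two-stage pipeline: a fast emotion classifier runs first, and its decision selects which emotion-specific recognizer decodes the utterance. The Python sketch below illustrates that control flow only; it is not the authors' implementation. Everything in it is a hypothetical assumption: the per-utterance mean/standard-deviation functionals computed over frames the ASR front end already produces (one way to keep extra feature extraction low), the nearest-centroid classifier swapped in for whatever classifier the paper uses, the function names, the toy two-dimensional "frames", and the placeholder lambdas standing in for the neutral and angry ANN/HMM models.

```python
import math
from statistics import mean, pstdev


def utterance_functionals(frames):
    """Collapse frame-level feature vectors into one fixed-length vector:
    per-dimension mean followed by per-dimension population std.
    Assumption: frames come from the existing ASR front end, so no
    separate emotion-specific extraction pass is needed."""
    dims = list(zip(*frames))
    return [mean(d) for d in dims] + [pstdev(d) for d in dims]


class NearestCentroidEmotion:
    """Hypothetical stand-in classifier: predicts the emotion whose
    training centroid is closest in Euclidean distance."""

    def __init__(self):
        self.centroids = {}

    def fit(self, labelled_utterances):
        grouped = {}
        for frames, label in labelled_utterances:
            grouped.setdefault(label, []).append(utterance_functionals(frames))
        for label, vectors in grouped.items():
            # Centroid = element-wise mean of the functional vectors.
            self.centroids[label] = [mean(col) for col in zip(*vectors)]
        return self

    def predict(self, frames):
        x = utterance_functionals(frames)
        return min(self.centroids, key=lambda lab: math.dist(x, self.centroids[lab]))


def recognise(frames, classifier, models):
    """Dynamic adaptation: classify the emotion first, then decode with
    the model adapted to that emotion."""
    emotion = classifier.predict(frames)
    return emotion, models[emotion](frames)


# Toy usage with invented data (not from the paper).
clf = NearestCentroidEmotion().fit([
    ([[0.1, 1.0], [0.2, 1.1], [0.1, 0.9]], "neutral"),
    ([[0.9, 3.0], [1.2, 3.5], [0.8, 2.8]], "angry"),
])
models = {
    "neutral": lambda frames: "neutral-model hypothesis",  # placeholder decoder
    "angry": lambda frames: "angry-model hypothesis",      # placeholder decoder
}
print(recognise([[1.0, 3.2], [1.1, 3.1]], clf, models))
# -> ('angry', 'angry-model hypothesis')
```

In contrast, static adaptation would train or adapt a single model once and use it for every utterance; the dynamic variant pays for one extra, cheap classification step per utterance in exchange for a model matched to the detected speaking style.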

Original language: English
Title of host publication: 3rd International Conference on Speech Prosody 2006
Editors: R. Hoffmann, H. Mixdorff
Publisher: International Speech Communication Association
ISBN (electronic): 9780000000002
Publication status: Published - 2006
Event: 3rd International Conference on Speech Prosody, SP 2006 - Dresden, Germany
Duration: 2 May 2006 to 5 May 2006

Publication series

Name: Proceedings of the International Conference on Speech Prosody
ISSN (Print): 2333-2042

Conference

Conference: 3rd International Conference on Speech Prosody, SP 2006
Country/Territory: Germany
City: Dresden
Period: 2/05/06 to 5/05/06
