Abstract
In the search for a standard unit for the recognition of emotion in speech, the whole turn, i.e., the complete stretch of speech by one person in a conversation, is commonly used. In applications, such turns often seem favorable, yet sub-turn units are known to be highly effective. In this respect, a two-stage approach is investigated that provides higher temporal resolution: speech turns are first chunked according to acoustic properties, and the individual chunk analyses are then mapped back to the turn level by multi-instance learning. For chunking, a fast pre-segmentation into emotionally quasi-stationary segments is obtained by a one-pass Viterbi beam search with token passing based on MFCC features. Chunk analysis is realized by brute-force construction of a large feature space with subsequent subset selection, SVM classification, and speaker normalization. Extensive tests reveal differences compared to one-stage processing. As an alternative chunking unit, syllables are used.
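To make the two-stage idea concrete, the following Python sketch shows per-chunk SVM classification followed by a simple aggregation of chunk decisions to the turn level. This is a minimal illustration, assuming scikit-learn: the per-speaker z-normalization is one common normalization scheme (the abstract does not specify which is used), and the mean-posterior soft vote is an illustrative stand-in for the paper's multi-instance learning formulation; all function names here are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def speaker_znorm(X, speaker_ids):
    """Z-normalize each feature per speaker (illustrative normalization scheme)."""
    X = X.astype(float).copy()
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        mu = X[mask].mean(axis=0)
        sigma = X[mask].std(axis=0) + 1e-9  # avoid division by zero
        X[mask] = (X[mask] - mu) / sigma
    return X

def train_chunk_classifier(X_chunks, y_chunks):
    """Stage 1: train an SVM on acoustic feature vectors of individual chunks.

    X_chunks: (n_chunks, n_features); y_chunks: emotion label inherited
    from each chunk's parent turn.
    """
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(X_chunks, y_chunks)
    return clf

def classify_turn(clf, turn_chunk_features):
    """Stage 2: map chunk-level decisions back to the whole turn.

    Here: average the per-chunk class posteriors and take the argmax,
    i.e. a soft-voting stand-in for multi-instance learning.
    """
    probs = clf.predict_proba(turn_chunk_features)  # (n_chunks, n_classes)
    return clf.classes_[np.argmax(probs.mean(axis=0))]
```

A turn is then classified by passing the normalized feature vectors of all its chunks to `classify_turn`; the averaging step is where a proper multi-instance learner would replace the simple vote.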
Original language | English |
---|---|
Pages | 596-600 |
Number of pages | 5 |
State | Published - 2007 |
Event | 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007 - Kyoto, Japan |
Duration | 9 Dec 2007 → 13 Dec 2007 |
Conference
Conference | 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007 |
---|---|
Country/Territory | Japan |
City | Kyoto |
Period | 9/12/07 → 13/12/07 |
Keywords
- Affective computing
- Emotion recognition
- Segmentation