Abstract
In the search for a standard unit of analysis for recognizing emotion in speech, the whole turn, that is, the full stretch of speech by one person in a conversation, is a common choice. In applications, such turns often seem favorable. Yet sub-turn entities are known to be highly effective. A two-stage approach is therefore investigated that provides higher temporal resolution: speech turns are chunked according to acoustic properties, each chunk is analyzed individually, and multi-instance learning maps the chunk-level results back to the turn. Chunking uses a fast pre-segmentation into emotionally quasi-stationary segments by one-pass Viterbi beam search with token passing, based on MFCCs. Chunk analysis is realized by brute-force construction of a large feature space with subsequent subset selection, SVM classification, and speaker normalization. Extensive tests reveal differences compared to one-stage processing. Alternatively, syllables are used for chunking.
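The second stage described above, mapping chunk-level results back to a turn, can be illustrated with a minimal sketch. The abstract does not state which mapping rule is used, so the duration-weighted averaging of per-chunk class scores below is an assumption chosen only for illustration, and the function name `map_chunks_to_turn` and its parameters are hypothetical.

```python
from collections import defaultdict


def map_chunks_to_turn(chunk_scores, chunk_durations=None):
    """Combine per-chunk class scores into one turn-level decision.

    chunk_scores: list of dicts, one per chunk, mapping label -> score
                  (for example, classifier posteriors for that chunk).
    chunk_durations: optional list of chunk lengths used as weights;
                     if omitted, every chunk counts equally.
    Assumption: weighted averaging stands in for the paper's
    multi-instance learning step, which the abstract does not detail.
    """
    if not chunk_scores:
        raise ValueError("a turn needs at least one chunk")
    if chunk_durations is None:
        chunk_durations = [1.0] * len(chunk_scores)
    if len(chunk_durations) != len(chunk_scores):
        raise ValueError("one duration per chunk is required")

    # Accumulate the weighted score of every label across all chunks.
    totals = defaultdict(float)
    for scores, weight in zip(chunk_scores, chunk_durations):
        for label, score in scores.items():
            totals[label] += weight * score

    total_weight = sum(chunk_durations)
    turn_scores = {label: value / total_weight for label, value in totals.items()}

    # Iterate labels in sorted order so ties resolve deterministically.
    best_label = max(sorted(turn_scores), key=turn_scores.get)
    return best_label, turn_scores


if __name__ == "__main__":
    chunks = [
        {"neutral": 0.7, "angry": 0.3},
        {"neutral": 0.2, "angry": 0.8},
        {"neutral": 0.4, "angry": 0.6},
    ]
    print(map_chunks_to_turn(chunks))
    print(map_chunks_to_turn(chunks, chunk_durations=[3.0, 0.5, 0.5]))
```

Weighting by chunk duration lets a long, confidently scored chunk outweigh several short ones; with equal weights the same turn may receive a different label, which is one reason the choice of mapping rule matters in a two-stage setup.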
| Original language | English |
|---|---|
| Pages | 596-600 |
| Number of pages | 5 |
| State | Published - 2007 |
| Event | 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007 - Kyoto, Japan. Duration: 9 Dec 2007 → 13 Dec 2007 |
Conference
| Conference | 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007 |
|---|---|
| Country/Territory | Japan |
| City | Kyoto |
| Period | 9/12/07 → 13/12/07 |
Keywords
- Affective computing
- Emotion recognition
- Segmentation
Title
Comparing one and two-stage acoustic modeling in the recognition of emotion in speech