Abstract
Automatic emotion recognition and computational paralinguistics have matured to some robustness under controlled laboratory settings; however, accuracy degrades in real-life conditions such as the presence of noise and reverberation. In this paper we examine the relevance of acoustic features for the expression of valence, arousal, and interest conveyed by a speaker's voice. Experiments are conducted on the GEMEP and TUM AVIC databases. To simulate realistically degraded conditions, the audio is corrupted with real room impulse responses and real-life noise recordings. Features that correlate well with the target (emotion) over a wide range of acoustic conditions are analysed and an interpretation is given. Classification results in matched and mismatched settings with multi-condition training are provided to validate the benefit of the feature selection method. Our proposed way of selecting features over a range of noise types considerably boosts the generalisation ability of the classifiers.
Original language | English
---|---
Pages (from-to) | 2044-2048
Number of pages | 5
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State | Published - 2013
Externally published | Yes
Event | 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013, Lyon, France; 25-29 Aug 2013
Keywords
- Acoustic features
- Affect
- Emotion
- Noise robustness
- Paralinguistics