TY - GEN
T1 - Improving generalisation and robustness of acoustic affect recognition
AU - Eyben, Florian
AU - Schuller, Björn
AU - Rigoll, Gerhard
PY - 2012
Y1 - 2012
N2 - Emotion recognition in real-life conditions faces several challenging factors that most studies on emotion recognition do not consider, such as background noise, varying recording levels, and the acoustic properties of the environment. This paper presents a systematic evaluation of the influence of background noise of various types and SNRs, as well as of recording level variations, on the performance of automatic emotion recognition from speech. Both natural, spontaneous emotions and acted/prototypical emotions are considered. Besides the well-known influence of additive noise, a significant influence of the recording level on recognition performance is observed. Multi-condition learning with various noise types and recording levels is proposed as a way to increase the robustness of methods based on standard acoustic feature sets and commonly used classifiers. It is compared to matched-condition learning and is found to be almost on par in many settings.
AB - Emotion recognition in real-life conditions faces several challenging factors that most studies on emotion recognition do not consider, such as background noise, varying recording levels, and the acoustic properties of the environment. This paper presents a systematic evaluation of the influence of background noise of various types and SNRs, as well as of recording level variations, on the performance of automatic emotion recognition from speech. Both natural, spontaneous emotions and acted/prototypical emotions are considered. Besides the well-known influence of additive noise, a significant influence of the recording level on recognition performance is observed. Multi-condition learning with various noise types and recording levels is proposed as a way to increase the robustness of methods based on standard acoustic feature sets and commonly used classifiers. It is compared to matched-condition learning and is found to be almost on par in many settings.
KW - Emotion recognition
KW - Multicondition training
KW - Noise robustness
KW - Recording level
UR - http://www.scopus.com/inward/record.url?scp=84870199443&partnerID=8YFLogxK
U2 - 10.1145/2388676.2388785
DO - 10.1145/2388676.2388785
M3 - Conference contribution
AN - SCOPUS:84870199443
SN - 9781450314671
T3 - ICMI'12 - Proceedings of the ACM International Conference on Multimodal Interaction
SP - 517
EP - 521
BT - ICMI'12 - Proceedings of the ACM International Conference on Multimodal Interaction
T2 - 14th ACM International Conference on Multimodal Interaction, ICMI 2012
Y2 - 22 October 2012 through 26 October 2012
ER -