TY - JOUR
T1 - Cross-Corpus acoustic emotion recognition
T2 - Variances and strategies
AU - Schuller, Björn
AU - Vlasenko, Bogdan
AU - Eyben, Florian
AU - Wöllmer, Martin
AU - Stuhlsatz, André
AU - Wendemuth, Andreas
AU - Rigoll, Gerhard
PY - 2010/7
Y1 - 2010/7
N2 - As the recognition of emotion from speech has matured to a degree where it becomes applicable in real-life settings, it is time for a realistic view of obtainable performance. Most studies tend to overestimate in this respect: acted data is often used rather than spontaneous data, results are reported on preselected prototypical data, and true speaker-disjunctive partitioning is still less common than simple cross-validation. Even speaker-disjunctive evaluation gives only limited insight into the generalization ability of today's emotion recognition engines, since the training and test data used for system development tend to be similar in recording conditions, noise overlay, language, and types of emotions. A considerably more realistic impression can be gathered by inter-set evaluation. We therefore show results employing six standard databases in a cross-corpus evaluation experiment, which could also help in learning about opportunities to add resources for training and to overcome the typical data sparseness in the field. To better cope with the observed high variances, different types of normalization are investigated. In total, 1.8 k individual evaluations indicate that inter-corpus testing performs markedly worse than intra-corpus testing.
AB - As the recognition of emotion from speech has matured to a degree where it becomes applicable in real-life settings, it is time for a realistic view of obtainable performance. Most studies tend to overestimate in this respect: acted data is often used rather than spontaneous data, results are reported on preselected prototypical data, and true speaker-disjunctive partitioning is still less common than simple cross-validation. Even speaker-disjunctive evaluation gives only limited insight into the generalization ability of today's emotion recognition engines, since the training and test data used for system development tend to be similar in recording conditions, noise overlay, language, and types of emotions. A considerably more realistic impression can be gathered by inter-set evaluation. We therefore show results employing six standard databases in a cross-corpus evaluation experiment, which could also help in learning about opportunities to add resources for training and to overcome the typical data sparseness in the field. To better cope with the observed high variances, different types of normalization are investigated. In total, 1.8 k individual evaluations indicate that inter-corpus testing performs markedly worse than intra-corpus testing.
KW - Affective computing
KW - cross-corpus evaluation
KW - normalization
KW - speech emotion recognition
UR - http://www.scopus.com/inward/record.url?scp=80053925819&partnerID=8YFLogxK
U2 - 10.1109/T-AFFC.2010.8
DO - 10.1109/T-AFFC.2010.8
M3 - Article
AN - SCOPUS:80053925819
SN - 1949-3045
VL - 1
SP - 119
EP - 131
JO - IEEE Transactions on Affective Computing
JF - IEEE Transactions on Affective Computing
IS - 2
M1 - 5557843
ER -