TY - GEN
T1 - Meta-classifiers in acoustic and linguistic feature fusion-based affect recognition
AU - Schuller, Björn
AU - Villar, Raquel Jiménez
AU - Rigoll, Gerhard
AU - Lang, Manfred
PY - 2005
Y1 - 2005
N2 - In this work we propose a novel approach to affect recognition based on acoustic and linguistic analysis of spoken utterances. To achieve maximum discrimination power with robust integration of these information sources, a fusion on the feature level is introduced. For classification we use meta-classifiers such as StackingC and Boosting to stabilize performance and combine classifiers within ensembles. An extensive comparison of diverse base classifiers, comprising among others Support Vector Machines, Neural Networks, stochastic models, and Decision Trees, is carried out. A total of 381 acoustic features are extracted, and their relevance is determined by a Sequential Forward Floating Search in comparison to reduction by a Principal Component Analysis. Several variants for linguistic feature calculation are described and ranked, including bunch-of-words, n-grams, salience, and mutual information. Furthermore, reduction by stopping and stemming or by filter-based selection methods is evaluated on 2,334 linguistic features. Seven discrete emotions described in the MPEG-4 standard are recognized within an existing recognition engine. The presented results are based on two large databases of 4,336 acted and real emotion samples from movies, chat, and car interaction dialogues. A significant gain and outstanding overall performance are observed with this novel fusion and use of ensembles.
AB - In this work we propose a novel approach to affect recognition based on acoustic and linguistic analysis of spoken utterances. To achieve maximum discrimination power with robust integration of these information sources, a fusion on the feature level is introduced. For classification we use meta-classifiers such as StackingC and Boosting to stabilize performance and combine classifiers within ensembles. An extensive comparison of diverse base classifiers, comprising among others Support Vector Machines, Neural Networks, stochastic models, and Decision Trees, is carried out. A total of 381 acoustic features are extracted, and their relevance is determined by a Sequential Forward Floating Search in comparison to reduction by a Principal Component Analysis. Several variants for linguistic feature calculation are described and ranked, including bunch-of-words, n-grams, salience, and mutual information. Furthermore, reduction by stopping and stemming or by filter-based selection methods is evaluated on 2,334 linguistic features. Seven discrete emotions described in the MPEG-4 standard are recognized within an existing recognition engine. The presented results are based on two large databases of 4,336 acted and real emotion samples from movies, chat, and car interaction dialogues. A significant gain and outstanding overall performance are observed with this novel fusion and use of ensembles.
UR - http://www.scopus.com/inward/record.url?scp=33646758175&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2005.1415116
DO - 10.1109/ICASSP.2005.1415116
M3 - Conference contribution
AN - SCOPUS:33646758175
SN - 0780388747
SN - 9780780388741
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 325
EP - 328
BT - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing/Multimedia Signal Processing
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05
Y2 - 18 March 2005 through 23 March 2005
ER -