TY - JOUR
T1 - Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine - belief network architecture
AU - Schuller, Björn
AU - Rigoll, Gerhard
AU - Lang, Manfred
PY - 2004
Y1 - 2004
AB - In this contribution we introduce a novel approach to the combination of acoustic features and language information for robust automatic recognition of a speaker's emotion. Seven discrete emotional states are classified throughout the work. Firstly, a model for the recognition of emotion by acoustic features is presented. The derived features of the signal, pitch, energy, and spectral contours are ranked by their quantitative contribution to the estimation of an emotion. Several different classification methods, including linear classifiers, Gaussian Mixture Models, Neural Nets, and Support Vector Machines, are compared by their performance within this task. Secondly, an approach to emotion recognition by the spoken content is introduced, applying Belief Network-based spotting for emotional key-phrases. Finally, the two information sources are integrated in a soft decision fusion using a Neural Net. The gain is evaluated and compared to other approaches. Two emotional speech corpora used for training and evaluation are described in detail, and the results achieved by applying the proposed novel approach to speaker emotion recognition are presented and discussed.
UR - http://www.scopus.com/inward/record.url?scp=4544316885&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:4544316885
SN - 1520-6149
VL - 1
SP - I577
EP - I580
JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
T2 - Proceedings - IEEE International Conference on Acoustics, Speech, and Signal Processing
Y2 - 17 May 2004 through 21 May 2004
ER -