TY - GEN
T1 - Hidden Markov model-based speech emotion recognition
AU - Schuller, Björn
AU - Rigoll, Gerhard
AU - Lang, Manfred
N1 - Publisher Copyright: © 2003 IEEE.
PY - 2003
Y1 - 2003
AB - In this contribution we introduce speech emotion recognition by use of continuous hidden Markov models. Two methods are propagated and compared throughout the paper. Within the first method a global statistics framework of an utterance is classified by Gaussian mixture models using derived features of the raw pitch and energy contour of the speech signal. A second method introduces increased temporal complexity applying continuous hidden Markov models considering several states using low-level instantaneous features instead of global statistics. The paper addresses the design of working recognition engines and results achieved with respect to the alluded alternatives. A speech corpus consisting of acted and spontaneous emotion samples in German and English language is described in detail. Both engines have been tested and trained using this equivalent speech corpus. Results in recognition of seven discrete emotions exceeded 86% recognition rate. As a basis of comparison the similar judgment of human deciders classifying the same corpus at 79.8% recognition rate was analyzed.
UR - http://www.scopus.com/inward/record.url?scp=84908477401&partnerID=8YFLogxK
U2 - 10.1109/ICME.2003.1220939
DO - 10.1109/ICME.2003.1220939
M3 - Conference contribution
AN - SCOPUS:84908477401
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
SP - I401-I404
BT - Proceedings - 2003 International Conference on Multimedia and Expo, ICME
PB - IEEE Computer Society
T2 - 2003 International Conference on Multimedia and Expo, ICME 2003
Y2 - 6 July 2003 through 9 July 2003
ER -