TY - GEN
T1 - Fine-tuning HMMs for nonverbal vocalizations in spontaneous speech
T2 - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
AU - Prylipko, Dmytro
AU - Schuller, Björn
AU - Wendemuth, Andreas
PY - 2012
Y1 - 2012
N2 - Phenomena like filled pauses, laughter, breathing, hesitation, etc. play a significant role in everyday human-to-human conversation and have a significant influence on speech recognition accuracy [1]. Because of their nature (e.g., long duration), they should be modeled with a different number of emitting states and Gaussian mixtures. In this paper we address this question and aim to determine the most suitable method for finding these parameters: we examine two methods for optimizing hidden Markov model (HMM) configurations for better classification and recognition of nonverbal vocalizations within speech. Experiments were conducted on three conversational databases: TUM AVIC, Verbmobil, and SmartKom. These experiments show that with HMM configurations tailored to a particular database we can achieve a 1-3% improvement in speech recognition accuracy compared to a baseline topology. An in-depth analysis of the discussed methods is provided.
AB - Phenomena like filled pauses, laughter, breathing, hesitation, etc. play a significant role in everyday human-to-human conversation and have a significant influence on speech recognition accuracy [1]. Because of their nature (e.g., long duration), they should be modeled with a different number of emitting states and Gaussian mixtures. In this paper we address this question and aim to determine the most suitable method for finding these parameters: we examine two methods for optimizing hidden Markov model (HMM) configurations for better classification and recognition of nonverbal vocalizations within speech. Experiments were conducted on three conversational databases: TUM AVIC, Verbmobil, and SmartKom. These experiments show that with HMM configurations tailored to a particular database we can achieve a 1-3% improvement in speech recognition accuracy compared to a baseline topology. An in-depth analysis of the discussed methods is provided.
KW - Spontaneous speech
KW - laughter recognition
KW - multiple corpora
KW - nonverbals
UR - http://www.scopus.com/inward/record.url?scp=84867614629&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2012.6288949
DO - 10.1109/ICASSP.2012.6288949
M3 - Conference contribution
AN - SCOPUS:84867614629
SN - 9781467300469
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4625
EP - 4628
BT - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Y2 - 25 March 2012 through 30 March 2012
ER -