TY - GEN
T1 - Localization of non-linguistic events in spontaneous speech by non-negative matrix factorization and long short-term memory
AU - Weninger, Felix
AU - Schuller, Björn
AU - Wöllmer, Martin
AU - Rigoll, Gerhard
PY - 2011
Y1 - 2011
N2 - Features generated by Non-Negative Matrix Factorization (NMF) have successfully been introduced into robust speech processing, including noise-robust speech recognition and detection of non-linguistic vocalizations. In this study, we introduce a novel tandem approach by integrating likelihood features derived from NMF into Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs) in order to dynamically localize non-linguistic events, i. e., laughter, vocal, and non-vocal noise, in highly spontaneous speech. We compare our tandem architecture to a baseline conventional phoneme-HMM-based speech recognizer, and achieve a relative reduction of the frame error rate by 37.5% in the discrimination of speech and different non-speech segments.
AB - Features generated by Non-Negative Matrix Factorization (NMF) have successfully been introduced into robust speech processing, including noise-robust speech recognition and detection of non-linguistic vocalizations. In this study, we introduce a novel tandem approach by integrating likelihood features derived from NMF into Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs) in order to dynamically localize non-linguistic events, i. e., laughter, vocal, and non-vocal noise, in highly spontaneous speech. We compare our tandem architecture to a baseline conventional phoneme-HMM-based speech recognizer, and achieve a relative reduction of the frame error rate by 37.5% in the discrimination of speech and different non-speech segments.
KW - Non-Linguistic Vocalizations
KW - Non-Negative Matrix Factorization
KW - Recurrent Neural Networks
UR - http://www.scopus.com/inward/record.url?scp=80051621128&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2011.5947689
DO - 10.1109/ICASSP.2011.5947689
M3 - Conference contribution
AN - SCOPUS:80051621128
SN - 9781457705397
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5840
EP - 5843
BT - 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
T2 - 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Y2 - 22 May 2011 through 27 May 2011
ER -