Localization of non-linguistic events in spontaneous speech by non-negative matrix factorization and long short-term memory

Felix Weninger, Björn Schuller, Martin Wöllmer, Gerhard Rigoll

Publication: Chapter in book/report/conference proceedings › Conference contribution › Peer-reviewed

22 citations (Scopus)

Abstract

Features generated by Non-Negative Matrix Factorization (NMF) have successfully been introduced into robust speech processing, including noise-robust speech recognition and detection of non-linguistic vocalizations. In this study, we introduce a novel tandem approach that integrates likelihood features derived from NMF into Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs) in order to dynamically localize non-linguistic events, i.e., laughter, vocal noise, and non-vocal noise, in highly spontaneous speech. We compare our tandem architecture to a conventional phoneme-HMM-based speech recognizer as the baseline, and achieve a relative reduction of 37.5% in the frame error rate for the discrimination of speech and different non-speech segments.
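To make the reported metric concrete, the following minimal Python sketch shows how a frame error rate and its relative reduction against a baseline can be computed. The function names, the label names, and the 20-frame toy sequences are hypothetical illustrations and are not taken from the paper; the toy numbers are chosen only so that the arithmetic is easy to follow.

```python
from typing import Sequence


def frame_error_rate(reference: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of frames whose predicted label differs from the reference label."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same number of frames")
    if not reference:
        raise ValueError("label sequences must not be empty")
    errors = sum(r != p for r, p in zip(reference, predicted))
    return errors / len(reference)


def relative_reduction(baseline_fer: float, system_fer: float) -> float:
    """Error-rate reduction expressed as a fraction of the baseline error rate."""
    if baseline_fer <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_fer - system_fer) / baseline_fer


# Hypothetical toy data (NOT from the paper): 20 frames, four classes.
reference = ["speech"] * 10 + ["laughter"] * 4 + ["vocal"] * 3 + ["nonvocal"] * 3

# Hypothetical baseline output: the first 8 speech frames are mislabeled.
baseline = list(reference)
for i in range(8):
    baseline[i] = "nonvocal"

# Hypothetical tandem output: only the first 5 speech frames are mislabeled.
tandem = list(reference)
for i in range(5):
    tandem[i] = "laughter"

baseline_fer = frame_error_rate(reference, baseline)  # 8 / 20
tandem_fer = frame_error_rate(reference, tandem)      # 5 / 20
reduction = relative_reduction(baseline_fer, tandem_fer)

print(f"baseline FER: {baseline_fer:.1%}")
print(f"tandem FER:   {tandem_fer:.1%}")
print(f"relative reduction: {reduction:.1%}")
```

A relative reduction is measured against the baseline error rate rather than in absolute percentage points, so in this toy case a drop from 8 to 5 erroneous frames out of 20 counts as a reduction of 3/8 of the baseline errors.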

Original language: English
Title of host publication: 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
Pages: 5840-5843
Number of pages: 4
DOIs
Publication status: Published - 2011
Event: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic
Duration: 22 May 2011 to 27 May 2011

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Country/Territory: Czech Republic
City: Prague
Period: 22/05/11 to 27/05/11
