Enhancing LSTM RNN-based speech overlap detection by artificially mixed data

Gerhard Hagerer, Vedhas Pandit, Florian Eyben, Björn Schuller

Research output: Contribution to conference, Paper, peer-review

14 Scopus citations

Abstract

This paper presents a new method for speech overlap detection based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs). To this end, speech overlap data is created artificially by mixing large amounts of speech utterances. With the proposed training strategies and network structures, the resulting detectors outperform the state-of-the-art overlap detectors considered for comparison. In doing so, we target the full ternary task of detecting non-speech, speech, and overlapping speech. Furthermore, the speakers' gender is recognised within the same model, making this the first successful combination of these tasks in a single model.
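The abstract does not say how the utterances are mixed or how targets are derived. The following is a minimal sketch, not the authors' procedure, assuming each utterance comes with a per-sample 0/1 voice-activity track (a real system would more likely work with frame-level labels). It adds a second utterance into the first at a chosen offset and labels every sample as non-speech (0), speech (1), or overlap (2) by counting active speakers. The function name `mix_utterances` and its parameters `offset` and `gain_b` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Ternary targets: 0 = non-speech, 1 = speech (one speaker), 2 = overlap.
NON_SPEECH, SPEECH, OVERLAP = 0, 1, 2


def mix_utterances(a: Sequence[float], a_vad: Sequence[int],
                   b: Sequence[float], b_vad: Sequence[int],
                   offset: int, gain_b: float = 1.0) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: add utterance b into utterance a starting at sample `offset`.

    a_vad / b_vad are assumed per-sample 0/1 voice-activity labels.
    Returns the mixed signal and per-sample ternary labels.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if len(a) != len(a_vad) or len(b) != len(b_vad):
        raise ValueError("each signal needs a VAD label per sample")

    length = max(len(a), offset + len(b))
    mix = [0.0] * length
    active = [0] * length  # number of active speakers at each sample

    for i, (x, v) in enumerate(zip(a, a_vad)):
        mix[i] += x
        active[i] += v
    for j, (x, v) in enumerate(zip(b, b_vad)):
        mix[offset + j] += gain_b * x  # gain_b is an assumed level control for b
        active[offset + j] += v

    # 0 speakers -> NON_SPEECH, 1 -> SPEECH, 2 or more -> OVERLAP
    labels = [min(n, OVERLAP) for n in active]
    return mix, labels


if __name__ == "__main__":
    mix, labels = mix_utterances([0.0, 0.25, 0.5, 0.25, 0.0], [0, 1, 1, 1, 0],
                                 [0.5, 0.5], [1, 1], offset=3)
    print(mix)     # [0.0, 0.25, 0.5, 0.75, 0.5]
    print(labels)  # [0, 1, 1, 2, 1]
```

Counting active speakers rather than storing a separate overlap flag makes it easy to keep mixing in further utterances. A varying `offset` (and, under this assumption, `gain_b`) controls how much of the mixture ends up labelled as overlap.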

Original language: English
Pages: 45-52
Number of pages: 8
State: Published, 2017
Externally published: Yes
Event: 3rd AES International Conference on Semantic Audio 2017, Erlangen, Germany
Duration: 22 Jun 2017 to 24 Jun 2017

Conference

Conference: 3rd AES International Conference on Semantic Audio 2017
Country/Territory: Germany
City: Erlangen
Period: 22/06/17 to 24/06/17
