Detecting overlapping speech with long short-term memory recurrent neural networks

Jürgen T. Geiger, Florian Eyben, Björn Schuller, Gerhard Rigoll

Research output: Contribution to journal › Conference article › peer-review

38 Scopus citations

Abstract

Detecting segments of overlapping speech (when two or more speakers are active at the same time) is a challenging problem. Previous overlap detection systems have mostly been based on hidden Markov models (HMMs) and have employed a variety of audio features. In this work, we propose a novel overlap detection system using Long Short-Term Memory (LSTM) recurrent neural networks. The LSTMs generate framewise overlap predictions, which are then used for overlap detection. Furthermore, a tandem HMM-LSTM system is obtained by adding the LSTM predictions to the HMM feature set. Experiments with the AMI corpus show that the overlap detection performance of LSTMs is comparable to that of HMMs. Combining HMMs and LSTMs improves overlap detection by achieving higher recall.
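The two ideas in the abstract, framewise overlap predictions and a tandem feature set, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 threshold, and the simple thresholding rule are all assumptions made for illustration, and the framewise scores are taken as given rather than produced by a trained LSTM.

```python
from typing import List, Sequence, Tuple


def tandem_features(
    features: Sequence[Sequence[float]],
    overlap_scores: Sequence[float],
) -> List[List[float]]:
    """Append each frame's overlap score to that frame's feature vector.

    Hypothetical helper: illustrates "adding LSTM predictions to the
    HMM feature set" as a per-frame concatenation.
    """
    if len(features) != len(overlap_scores):
        raise ValueError("one overlap score per frame is required")
    return [list(f) + [s] for f, s in zip(features, overlap_scores)]


def scores_to_segments(
    overlap_scores: Sequence[float],
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[Tuple[int, int]]:
    """Group consecutive frames scoring at or above the threshold.

    Returns (start_frame, end_frame) pairs with an exclusive end index.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(overlap_scores):
        if score >= threshold and start is None:
            start = i  # an overlap segment begins at this frame
        elif score < threshold and start is not None:
            segments.append((start, i))  # segment ended before frame i
            start = None
    if start is not None:
        # the final segment runs to the end of the recording
        segments.append((start, len(overlap_scores)))
    return segments
```

For example, `scores_to_segments([0.1, 0.8, 0.9, 0.2, 0.7])` groups frames 1 to 2 and frame 4 into two segments. A real system would more likely decode segments with the HMM than with a fixed threshold.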

Original language: English
Pages (from-to): 1668-1672
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 2013
Event: 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013 - Lyon, France
Duration: 25 Aug 2013 to 29 Aug 2013

Keywords

  • Long short-term memory
  • Neural networks
  • Speaker diarization
  • Speech overlap detection
