Speech overlap detection and attribution using convolutive non-negative sparse coding

Ravichander Vipperla, Jürgen T. Geiger, Simon Bozonnet, Dong Wang, Nicholas Evans, Björn Schuller, Gerhard Rigoll

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

25 Scopus citations

Abstract

Overlapping speech is known to degrade speaker diarization performance, with impacts on speaker clustering and segmentation. While previous work made important advances in detecting overlapping speech intervals and in attributing them to relevant speakers, the problem remains largely unsolved. This paper reports the first application of convolutive non-negative sparse coding (CNSC) to the overlap problem. CNSC aims to decompose a composite signal into its underlying contributory parts and is thus naturally suited to overlap detection and attribution. Experimental results on NIST RT data show that the CNSC approach gives results comparable to those of a state-of-the-art hidden Markov model-based overlap detector. In a practical diarization system, CNSC-based speaker attribution is shown to reduce the speaker error by over 40% relative in overlapping segments.
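The decomposition idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Euclidean reconstruction cost with an L1 sparsity penalty, per-speaker bases that are already known and held fixed, and a simple energy-share threshold for flagging overlap. All names (`cnsc_activations`, `speaker_energy`, `demo`), the threshold of 0.2, and the toy data (two speakers with deliberately disjoint frequency bins) are hypothetical choices made only so the attribution is easy to check.

```python
import numpy as np

def shift_right(X, p):
    """Delay the columns (frames) of X by p, zero-filling on the left."""
    if p == 0:
        return X.copy()
    out = np.zeros_like(X)
    out[:, p:] = X[:, :-p]
    return out

def shift_left(X, p):
    """Advance the columns (frames) of X by p, zero-filling on the right."""
    if p == 0:
        return X.copy()
    out = np.zeros_like(X)
    out[:, :-p] = X[:, p:]
    return out

def reconstruct(W, H):
    """Convolutive model: V_hat = sum_p W[p] @ shift_right(H, p).
    W has shape (P, F, K): P frames per basis, F bins, K bases.
    H has shape (K, T): one activation track per basis."""
    return sum(W[p] @ shift_right(H, p) for p in range(W.shape[0]))

def cnsc_activations(V, W, sparsity=0.1, n_iter=200, eps=1e-12, seed=0):
    """Estimate non-negative sparse activations H for fixed bases W using
    multiplicative updates (assumed cost: squared error + sparsity * sum(H))."""
    P, _, K = W.shape
    rng = np.random.default_rng(seed)
    H = rng.random((K, V.shape[1])) + 0.1  # strictly positive start
    for _ in range(n_iter):
        V_hat = reconstruct(W, H)
        num = sum(W[p].T @ shift_left(V, p) for p in range(P))
        den = sum(W[p].T @ shift_left(V_hat, p) for p in range(P))
        H *= num / (den + sparsity + eps)
    return H

def speaker_energy(W, H, speaker_of_basis, n_speakers):
    """Summed magnitude per frame of each speaker's reconstructed part."""
    parts = []
    for s in range(n_speakers):
        mask = speaker_of_basis == s
        parts.append(reconstruct(W[:, :, mask], H[mask]).sum(axis=0))
    return np.array(parts)

def demo():
    """Toy mixture: speaker 0 active in frames 0-19, speaker 1 in 10-29."""
    rng = np.random.default_rng(1)
    P, F, T = 2, 6, 30
    speaker_of_basis = np.array([0, 0, 1, 1])
    W = np.zeros((P, F, 4))
    W[:, 0:3, 0:2] = rng.uniform(0.5, 1.5, size=(P, 3, 2))  # speaker 0 bins
    W[:, 3:6, 2:4] = rng.uniform(0.5, 1.5, size=(P, 3, 2))  # speaker 1 bins
    H_true = np.zeros((4, T))
    H_true[0:2, 0:20] = 1.0
    H_true[2:4, 10:30] = 1.0
    V = reconstruct(W, H_true)  # synthetic magnitude "spectrogram"
    H = cnsc_activations(V, W)
    energy = speaker_energy(W, H, speaker_of_basis, n_speakers=2)
    share = energy / (energy.sum(axis=0) + 1e-12)
    # A frame is flagged as overlap when two or more speakers each hold
    # more than 20% of the reconstructed energy (hypothetical threshold).
    overlap = (share > 0.2).sum(axis=0) >= 2
    return share, overlap

if __name__ == "__main__":
    share, overlap = demo()
    print(np.flatnonzero(overlap))
```

In this sketch the same per-speaker energy shares serve both tasks: counting speakers above the threshold gives a detection decision, and the identity of those speakers gives the attribution. Real speech bases overlap heavily in frequency, so the separation would be far less clean than in this toy case.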

Original language: English
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Pages: 4181-4184
Number of pages: 4
State: Published - 2012
Event: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Kyoto, Japan
Duration: 25 Mar 2012 → 30 Mar 2012

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
Country/Territory: Japan
City: Kyoto
Period: 25/03/12 → 30/03/12

Keywords

  • convolutive non-negative sparse coding
  • overlap detection
  • speaker attribution
  • speaker diarization
