A paralinguistic approach to speaker diarisation using age, gender, voice likability and personality traits

Yue Zhang, Felix Weninger, Boqing Liu, Maximilian Schmitt, Florian Eyben, Björn Schuller

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

8 Scopus citations

Abstract

In this work, we present a new view on automatic speaker diarisation, i.e., assessing "who speaks when", based on the recognition of speaker traits such as age, gender, voice likability, and personality. Traditionally, speaker diarisation is accomplished using low-level audio descriptors (e.g., cepstral or spectral features), neglecting the fact that speakers can be well discriminated by humans according to various perceived characteristics. Thus, we advocate a novel paralinguistic approach that combines speaker diarisation with speaker characterisation by automatically identifying the speakers according to their individual traits. In a three-tier processing flow, speaker segmentation by voice activity detection (VAD) is initially performed to detect speaker turns. Next, speaker attributes are predicted using pre-trained paralinguistic models. To tag the speakers, clustering algorithms are applied to the predicted traits. We evaluate our methods against state-of-the-art open-source and commercial systems on a corpus of realistic, spontaneous dyadic conversations recorded in the wild from three different cultures (Chinese, English, German). Our results provide clear evidence that using paralinguistic features for speaker diarisation is a promising avenue of research.
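The three-tier flow described in the abstract (VAD segmentation, trait prediction, clustering) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the energy-threshold VAD, the hand-written trait scores (hypothetical stand-ins for the output of pre-trained paralinguistic models), and the two-cluster k-means are all assumptions chosen for illustration, since the abstract does not name the specific models or clustering algorithms used.

```python
from typing import List, Sequence, Tuple


def energy_vad(frame_energy: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Tier 1 (assumed): return (start, end) frame ranges, end exclusive,
    where the frame energy exceeds a fixed threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, energy in enumerate(frame_energy):
        if energy > threshold and start is None:
            start = i                      # speech begins
        elif energy <= threshold and start is not None:
            segments.append((start, i))    # speech ends
            start = None
    if start is not None:                  # speech runs to the last frame
        segments.append((start, len(frame_energy)))
    return segments


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_two_speakers(points: List[List[float]], iterations: int = 20) -> List[int]:
    """Tier 3 (assumed): two-cluster k-means, matching a dyadic conversation.
    Deterministic initialisation: the first point and the point farthest from it."""
    first = points[0]
    farthest = max(points, key=lambda p: squared_distance(p, first))
    centroids = [list(first), list(farthest)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each segment to its nearest centroid.
        labels = [
            0 if squared_distance(p, centroids[0]) <= squared_distance(p, centroids[1]) else 1
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy per-frame energies (made-up values, not real audio).
frame_energy = [0.0, 0.9, 0.8, 0.0, 0.7, 0.9, 0.0, 0.0, 0.8, 0.0, 0.6, 0.7]
segments = energy_vad(frame_energy, threshold=0.5)

# Tier 2 (placeholder): one hypothetical trait vector per segment, ordered as
# [age, gender, likability, personality], each scaled to 0..1. In a real
# system these would come from pre-trained paralinguistic models.
predicted_traits = [
    [0.30, 0.9, 0.6, 0.7],
    [0.65, 0.1, 0.4, 0.3],
    [0.28, 0.8, 0.7, 0.6],
    [0.70, 0.2, 0.5, 0.2],
]

labels = kmeans_two_speakers(predicted_traits)
for (start, end), label in zip(segments, labels):
    print(f"frames {start}-{end}: speaker {label}")
```

Clustering on a handful of interpretable trait scores, rather than on raw cepstral features, is the point the abstract argues for; the fixed choice of two clusters here simply reflects the dyadic setting of the evaluation corpus.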

Original language: English
Title of host publication: MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
Publisher: Association for Computing Machinery, Inc
Pages: 387-392
Number of pages: 6
ISBN (Electronic): 9781450349062
State: Published - 23 Oct 2017
Externally published: Yes
Event: 25th ACM International Conference on Multimedia, MM 2017 - Mountain View, United States
Duration: 23 Oct 2017 to 27 Oct 2017

Publication series

Name: MM 2017 - Proceedings of the 2017 ACM Multimedia Conference

Conference

Conference: 25th ACM International Conference on Multimedia, MM 2017
Country/Territory: United States
City: Mountain View
Period: 23/10/17 to 27/10/17

Keywords

  • Computational paralinguistics
  • Speaker characteristics
  • Speaker diarisation
  • Speaker identification
