Automatic speaker analysis 2.0: Hearing the bigger picture

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Automatic Speaker Analysis has largely focused on single aspects of a speaker, such as their identity, gender, emotion, personality, or health state. This broadly ignores the interdependency of the many states and traits acting on the one voice production mechanism available to a human speaker. In other words, sometimes we may sound depressed when we simply have the flu and can hardly find the energy to put vocal effort into our articulation and sound production. Recently, this gap gave rise to an increasingly holistic speaker analysis that assesses the 'larger picture' in one pass, such as by multi-target learning. For a robust assessment, however, this requires large amounts of speech and language resources labelled in rich ways so that such interdependencies can be learnt, as well as architectures able to cope with multi-target learning of massive amounts of speech data. In this light, this contribution discusses efficient mechanisms such as large-scale social media pre-scanning with dynamic cooperative crowd-sourcing for rapid data collection, cross-task labelling of these data across a wider range of attributes to reach 'big and rich' speech data, and efficient multi-target end-to-end and end-to-evolution deep learning paradigms to learn an accordingly rich representation of diverse target tasks. The ultimate goal is to enable machines to hear the 'entire' person, their condition, and their whereabouts behind the voice and words - rather than targeting a single aspect blind to the overall individual and their state - thus leading to the next level of Automatic Speaker Analysis.
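The core idea of multi-target learning mentioned in the abstract - one shared representation of the voice feeding several task-specific predictors trained jointly - can be sketched as follows. This is a minimal, illustrative numpy example; the feature dimensions, the two regression targets (stand-ins for, say, arousal and health-state scores), and the linear architecture are assumptions for illustration, not the models from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n utterances, d acoustic features, two illustrative targets.
n, d, h = 32, 10, 5
X = rng.normal(size=(n, d))      # stand-in for acoustic features
y_a = rng.normal(size=(n, 1))    # target for task A (e.g. arousal)
y_b = rng.normal(size=(n, 1))    # target for task B (e.g. health score)

# One shared projection plus one small head per task.
W_shared = rng.normal(size=(d, h)) * 0.1
W_a = rng.normal(size=(h, 1)) * 0.1
W_b = rng.normal(size=(h, 1)) * 0.1

def joint_loss():
    """Sum of per-task mean squared errors over the shared representation."""
    Z = X @ W_shared
    return float(((Z @ W_a - y_a) ** 2).mean() + ((Z @ W_b - y_b) ** 2).mean())

initial_loss = joint_loss()
lr = 0.005
for _ in range(1000):
    Z = X @ W_shared                 # shared representation of the voice
    err_a = Z @ W_a - y_a
    err_b = Z @ W_b - y_b
    # Gradients of the joint loss: both tasks' errors flow back
    # into the shared weights - this coupling is what lets the
    # model exploit interdependencies between targets.
    g_a = Z.T @ err_a / n
    g_b = Z.T @ err_b / n
    g_shared = X.T @ (err_a @ W_a.T + err_b @ W_b.T) / n
    W_a -= lr * g_a
    W_b -= lr * g_b
    W_shared -= lr * g_shared

final_loss = joint_loss()
```

Because the shared weights receive gradients from every task, each target acts as a regulariser and an extra supervision signal for the others - the mechanism by which richly labelled data can improve all assessments at once.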

Original language: English
Title of host publication: 2017 9th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509064977
DOIs
State: Published - 24 Jul 2017
Externally published: Yes
Event: 9th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2017 - Bucharest, Romania
Duration: 6 Jul 2017 - 9 Jul 2017

Publication series

Name: 2017 9th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2017

Conference

Conference: 9th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2017
Country/Territory: Romania
City: Bucharest
Period: 6/07/17 - 9/07/17

