Improving recognition of speaker states and traits by cumulative evidence: Intoxication, sleepiness, age and gender

Felix Weninger, Erik Marchi, Björn Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

We address the fully automatic recognition of intoxication, sleepiness, age and gender from speech in medium-term observation intervals of up to several minutes. Because these speaker states and traits are medium-term or long-term, as opposed to short-term states such as emotion, cumulative evidence can be collected in the form of utterance-level decisions; we show that fusing these decisions along the time axis yields increasingly accurate decisions. In extensive test runs on three official INTERSPEECH Challenge corpora, we show that the average recall can be improved by up to 5%, 6%, 10% and 11% absolute by longer-term observation of speaker sleepiness, gender, intoxication, and age, respectively, compared to the accuracy of a decision from a single utterance.
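The idea of fusing utterance-level decisions along the time axis can be illustrated with a small sketch. The abstract does not state which fusion rule the authors use, so the running majority vote below is an assumption chosen for simplicity, and the function name `cumulative_majority_vote` and the labels `"sober"` and `"intoxicated"` are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def cumulative_majority_vote(decisions: Sequence[Hashable]) -> List[Hashable]:
    """Return the fused decision after each utterance.

    Assumption: fusion is a running majority vote over all utterance-level
    decisions seen so far. On a tie, the previous fused decision is kept.
    """
    counts: Counter = Counter()
    fused: List[Hashable] = []
    current = None
    for label in decisions:
        counts[label] += 1
        # Only the label just observed can overtake the current leader.
        if current is None or counts[label] > counts[current]:
            current = label
        fused.append(current)
    return fused


# Hypothetical per-utterance classifier outputs for one speaker.
utterance_decisions = ["sober", "intoxicated", "intoxicated", "sober", "intoxicated"]
fused_decisions = cumulative_majority_vote(utterance_decisions)
print(fused_decisions)
```

In this toy sequence, the isolated "sober" decision at the fourth utterance no longer changes the fused output, which is the kind of smoothing that accumulating evidence over a longer observation interval is meant to provide.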

Original language: English
Title of host publication: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Pages: 1158-1161
Number of pages: 4
State: Published - 2012
Event: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 - Portland, OR, United States
Duration: 9 Sep 2012 to 13 Sep 2012

Publication series

Name: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Volume: 2

Conference

Conference: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Country/Territory: United States
City: Portland, OR
Period: 9/09/12 to 13/09/12
