TY - JOUR
T1 - Medium-term speaker states - A review on intoxication, sleepiness and the first challenge
AU - Schuller, Björn
AU - Steidl, Stefan
AU - Batliner, Anton
AU - Schiel, Florian
AU - Krajewski, Jarek
AU - Weninger, Felix
AU - Eyben, Florian
PY - 2014/3
Y1 - 2014/3
N2 - In the emerging field of computational paralinguistics, most research efforts are devoted to either short-term speaker states such as emotions, or long-term traits such as personality, gender, or age. To bridge this gap on the time axis, and hence broaden the scope of the field, the INTERSPEECH 2011 Speaker State Challenge addressed the algorithmic analysis of medium-term speaker states: alcohol intoxication and sleepiness, both of which are highly relevant in high risk environments. Preserving the paradigms of the two previous INTERSPEECH Challenges, researchers were invited to participate in a large-scale evaluation providing unified testing conditions. This article reviews previous efforts to automatically recognise intoxication and sleepiness from speech signals, and gives an overview on the Challenge conditions and data sets, the methods used by the participants, and their results. By fusing participants' systems, we show that binary classification of alcoholisation and sleepiness from short-term observations, i.e., single utterances, can both reach over 72% accuracy on unseen test data; furthermore, we demonstrate that these medium-term states can be recognised more robustly by fusing short-term classifiers along the time axis, reaching up to 91% accuracy for intoxication and 75% for sleepiness.
AB - In the emerging field of computational paralinguistics, most research efforts are devoted to either short-term speaker states such as emotions, or long-term traits such as personality, gender, or age. To bridge this gap on the time axis, and hence broaden the scope of the field, the INTERSPEECH 2011 Speaker State Challenge addressed the algorithmic analysis of medium-term speaker states: alcohol intoxication and sleepiness, both of which are highly relevant in high risk environments. Preserving the paradigms of the two previous INTERSPEECH Challenges, researchers were invited to participate in a large-scale evaluation providing unified testing conditions. This article reviews previous efforts to automatically recognise intoxication and sleepiness from speech signals, and gives an overview on the Challenge conditions and data sets, the methods used by the participants, and their results. By fusing participants' systems, we show that binary classification of alcoholisation and sleepiness from short-term observations, i.e., single utterances, can both reach over 72% accuracy on unseen test data; furthermore, we demonstrate that these medium-term states can be recognised more robustly by fusing short-term classifiers along the time axis, reaching up to 91% accuracy for intoxication and 75% for sleepiness.
KW - Challenge
KW - Computational paralinguistics
KW - Intoxication
KW - Sleepiness
KW - Survey
UR - http://www.scopus.com/inward/record.url?scp=84890561011&partnerID=8YFLogxK
U2 - 10.1016/j.csl.2012.12.002
DO - 10.1016/j.csl.2012.12.002
M3 - Article
AN - SCOPUS:84890561011
SN - 0885-2308
VL - 28
SP - 346
EP - 374
JO - Computer Speech and Language
JF - Computer Speech and Language
IS - 2
ER -