Driver frustration detection from audio and video in the wild

Irman Abdíc, Lex Fridman, Daniel McDuff, Erik Marchi, Bryan Reimer, Björn Schuller

Research output: Contribution to journalConference articlepeer-review

20 Scopus citations

Abstract

We present a method for detecting driver frustration from both video and audio streams captured during the driver's interaction with an in-vehicle voice-based navigation system. The video is of the driver's face when the machine is speaking, and the audio is of the driver's voice when he or she is speaking. We analyze a dataset of 20 drivers that contains 596 audio epochs (audio clips, with duration from 1 sec to 15 sec) and 615 video epochs (video clips, with duration from 1 sec to 45 sec). The dataset is balanced across 2 age groups, 2 vehicle systems, and both genders. The model was subject-independently trained and tested using 4- fold cross-validation. We achieve an accuracy of 77.4% for detecting frustration from a single audio epoch and 81.2% for detecting frustration from a single video epoch. We then treat the video and audio epochs as a sequence of interactions and use decision fusion to characterize the trade-off between decision time and classification accuracy, which improved the prediction accuracy to 88.5% after 9 epochs.

Original languageEnglish
Pages (from-to)1354-1360
Number of pages7
JournalIJCAI International Joint Conference on Artificial Intelligence
Volume2016-January
StatePublished - 2016
Externally publishedYes
Event25th International Joint Conference on Artificial Intelligence, IJCAI 2016 - New York, United States
Duration: 9 Jul 201615 Jul 2016

Fingerprint

Dive into the research topics of 'Driver frustration detection from audio and video in the wild'. Together they form a unique fingerprint.

Cite this