TY - GEN
T1 - Quantifying Cognitive Load from Voice using Transformer-Based Models and a Cross-Dataset Evaluation
AU - Hecker, Pascal
AU - Kappattanavar, Arpita M.
AU - Schmitt, Maximilian
AU - Moontaha, Sidratul
AU - Wagner, Johannes
AU - Eyben, Florian
AU - Schuller, Björn W.
AU - Arnrich, Bert
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Cognitive load is frequently induced in laboratory setups to measure responses to stress, and its impact on voice has been studied in the field of computational paralinguistics. One dataset on this topic was provided in the Computational Paralinguistics Challenge (ComParE) 2014, and therefore offers great comparability. Recently, transformer-based deep learning architectures established a new state of the art and are gradually finding their way into the audio domain. In this context, we investigate the performance of popular transformer architectures in the audio domain on the ComParE 2014 dataset, and the impact of different pre-training and fine-tuning setups on these models. Further, we recorded a small custom dataset, designed to be comparable with the ComParE 2014 one, to assess cross-corpus model generalisability. We find that the transformer models outperform the challenge baseline, the challenge winner, and more recent deep learning approaches. Models based on the 'large' architecture perform well on the task at hand, while models based on the 'base' architecture perform at chance level. Fine-tuning on related domains (such as ASR or emotion), before fine-tuning on the targets, yields no higher performance than models pre-trained only in a self-supervised manner. The generalisability of the models between datasets is more intricate than expected, as seen in an unexpectedly low performance on the small custom dataset, and we discuss potential 'hidden' underlying discrepancies between the datasets. In summary, transformer-based architectures outperform previous attempts to quantify cognitive load from voice. This is promising, in particular for healthcare-related problems in computational paralinguistics applications, since datasets are sparse in that realm.
KW - cognitive load
KW - cross-dataset
KW - transformer
KW - voice
KW - wav2vec 2.0
UR - http://www.scopus.com/inward/record.url?scp=85152215106&partnerID=8YFLogxK
U2 - 10.1109/ICMLA55696.2022.00055
DO - 10.1109/ICMLA55696.2022.00055
M3 - Conference contribution
AN - SCOPUS:85152215106
T3 - Proceedings - 21st IEEE International Conference on Machine Learning and Applications, ICMLA 2022
SP - 337
EP - 344
BT - Proceedings - 21st IEEE International Conference on Machine Learning and Applications, ICMLA 2022
A2 - Wani, M. Arif
A2 - Kantardzic, Mehmed
A2 - Palade, Vasile
A2 - Neagu, Daniel
A2 - Yang, Longzhi
A2 - Chan, Kit-Yan
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 21st IEEE International Conference on Machine Learning and Applications, ICMLA 2022
Y2 - 12 December 2022 through 14 December 2022
ER -