Large-Scale Nonverbal Vocalization Detection Using Transformers

Panagiotis Tzirakis, Alice Baird, Jeffrey Brooks, Christopher Gagne, Lauren Kim, Michael Opara, Christopher Gregory, Jacob Metrick, Garrett Boseck, Vineet Tiruvadi, Bjorn Schuller, Dacher Keltner, Alan Cowen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations


Detecting emotionally expressive nonverbal vocalizations is essential to developing technologies that can converse fluently with humans. The affective computing community has largely focused on understanding the intonation of emotional speech and language. However, advances in the study of vocal emotional behavior suggest that emotions may be more readily conveyed not by speech but by nonverbal vocalizations such as laughs, sighs, shrieks, and grunts - vocalizations that often occur in lieu of speech. The task of detecting such emotional vocalizations has been largely overlooked by researchers, likely due to the limited availability of data capturing a sufficiently wide variety of vocalizations. Most studies in the literature focus on detecting laughter or cries. In this paper, we present the first, to the best of our knowledge, nonverbal vocalization detection model trained to detect as many as 67 types of emotional vocalizations. For our purposes, we use the large-scale and in-the-wild HUME-VB dataset that provides more than 156 h of data. We thoroughly investigate the use of pre-trained audio transformer models, such as Wav2Vec2 and Whisper, and provide useful insights for the task at hand using different types of noise signals.

Original language: English
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728163277
State: Published - 2023
Externally published: Yes
Event: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
Duration: 4 Jun 2023 to 10 Jun 2023

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149


Conference: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
City: Rhodes Island


  • Nonverbal vocalization
  • transformers
  • vocal burst detection


