VCMNet: Weakly supervised learning for automatic infant vocalisation maturity analysis

Najla D. Al Futaisi, Zixing Zhang, Alejandrina Cristia, Anne S. Warlaumont, Björn W. Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

Using neural networks to classify infant vocalisations into important subclasses (such as crying versus speech) is an emerging task in speech technology. One of the biggest obstacles to progress lies in the datasets: the performance of a learning model is affected by the labelling quality and size of the dataset used, and infant vocalisation datasets with good-quality labels tend to be small. In this paper, we assess the performance of three models for infant VoCalisation Maturity (VCM) trained with a large dataset annotated automatically using a purpose-built classifier and a small dataset annotated by highly trained human coders. The two datasets are used in three different training strategies, whose performance is compared against a baseline model. The first training strategy investigates adversarial training, while the second exploits multi-task learning as the neural network trains on both datasets simultaneously. In the final strategy, we integrate adversarial training and multi-task learning. All of the training strategies outperform the baseline, with the adversarial training strategy yielding the best results on the development set.
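The abstract does not specify the network architecture, so the following is only a minimal sketch of what a multi-task objective over the two datasets could look like. It assumes a shared encoder feeding two classification heads, one trained on the small human-annotated set and one on the large automatically annotated set, with the total loss L = L_human + lam · L_auto. The layer shapes, the weight `lam`, the number of classes, and every function name here are hypothetical, and the adversarial strategies from the abstract are not covered.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Dense linear layer without bias: returns w @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Sequence[float]) -> Vector:
    """Element-wise ReLU non-linearity."""
    return [max(0.0, a) for a in v]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one example, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def head_loss(encoder: Matrix, head: Matrix,
              batch: Sequence[Tuple[Vector, int]]) -> float:
    """Mean cross-entropy of one task head on top of the shared encoder."""
    total = 0.0
    for features, label in batch:
        shared = relu(matvec(encoder, features))  # representation shared by both tasks
        total += cross_entropy(matvec(head, shared), label)
    return total / len(batch)


def multitask_loss(encoder: Matrix, head_human: Matrix, head_auto: Matrix,
                   batch_human: Sequence[Tuple[Vector, int]],
                   batch_auto: Sequence[Tuple[Vector, int]],
                   lam: float = 0.5) -> float:
    """Hypothetical objective: L_human + lam * L_auto, with both heads sharing the encoder."""
    return (head_loss(encoder, head_human, batch_human)
            + lam * head_loss(encoder, head_auto, batch_auto))


if __name__ == "__main__":
    # Toy weights: 3 input features -> 2 hidden units -> 3 hypothetical classes.
    encoder = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
    head_human = [[1.0, -1.0], [0.2, 0.4], [-0.5, 0.3]]
    head_auto = [[0.7, 0.1], [-0.3, 0.9], [0.0, 0.0]]
    batch_human = [([1.0, 0.0, 2.0], 0), ([0.5, 1.5, -1.0], 2)]
    batch_auto = [([0.2, 0.3, 0.4], 1)]
    print(multitask_loss(encoder, head_human, head_auto, batch_human, batch_auto))
```

Setting `lam` to 0 leaves only the human-annotated term, so the automatic labels then have no influence on the loss. Larger values let the noisier but much larger automatically labelled set shape the shared encoder. In a real system the weights would be learned by gradient descent in a deep-learning framework rather than fixed by hand as in this toy example.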

Original language: English
Title of host publication: ICMI 2019 - Proceedings of the 2019 International Conference on Multimodal Interaction
Editors: Wen Gao, Helen Mei Ling Meng, Matthew Turk, Susan R. Fussell, Bjorn Schuller, Yale Song, Kai Yu
Publisher: Association for Computing Machinery, Inc
Pages: 205-209
Number of pages: 5
ISBN (Electronic): 9781450368605
State: Published - 14 Oct 2019
Externally published: Yes
Event: 21st ACM International Conference on Multimodal Interaction, ICMI 2019 - Suzhou, China
Duration: 14 Oct 2019 → 18 Oct 2019

Publication series

Name: ICMI 2019 - Proceedings of the 2019 International Conference on Multimodal Interaction

Conference

Conference: 21st ACM International Conference on Multimodal Interaction, ICMI 2019
Country/Territory: China
City: Suzhou
Period: 14/10/19 → 18/10/19

Keywords

  • Infant vocalisation
  • Prelinguistic analysis
  • Weakly supervised learning
