Manual versus automated: The challenging routine of infant vocalisation segmentation in home videos to study neuro(mal)development

  • Florian B. Pokorny
  • Robert Peharz
  • Wolfgang Roth
  • Matthias Zöhrer
  • Franz Pernkopf
  • Peter B. Marschik
  • Björn W. Schuller

Research output: Contribution to journal › Conference article › peer-review

12 Scopus citations

Abstract

In recent years, voice activity detection has been a highly researched field, due to its importance as an input stage in many real-world applications. Automated detection of vocalisations in the very first year of life is still a stepchild of this field. In our quest to define acoustic parameters in pre-linguistic vocalisations as markers for neuro(mal)development, we are confronted with the challenge of manually segmenting and annotating hours of variable-quality home video material for sequences of infant voice/vocalisations. While our corpus in total comprises more than a year of running time of video footage of typically developing infants and infants with various neurodevelopmental disorders, only a small proportion has been processed so far. This calls for automated assistance tools for detecting and/or segmenting infant utterances from real-life video recordings. In this paper, we investigated several approaches to infant voice detection and segmentation, including a rule-based voice activity detector, hidden Markov models with Gaussian mixture observation models, support vector machines, and random forests. Results indicate that the applied methods could be well suited to semi-automated retrieval of infant utterances from highly non-standardised footage. At the same time, our results show that a fully automated approach to this problem is yet to come.

Original language: English
Pages (from-to): 2997-3001
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 08-12-September-2016
DOIs
State: Published - 2016
Externally published: Yes
Event: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States
Duration: 8 Sep 2016 → 16 Sep 2016

Keywords

  • Home video database
  • Infant vocalisation
  • Retrospective audio-video analysis
  • Voice activity detection
