TY - JOUR
T1 - Manual versus automated: The challenging routine of infant vocalisation segmentation in home videos to study neuro(mal)development
T2 - 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016
AU - Pokorny, Florian B.
AU - Peharz, Robert
AU - Roth, Wolfgang
AU - Zöhrer, Matthias
AU - Pernkopf, Franz
AU - Marschik, Peter B.
AU - Schuller, Björn W.
N1 - Publisher Copyright:
Copyright © 2016 ISCA.
PY - 2016
Y1 - 2016
AB - In recent years, voice activity detection has been a highly researched field due to its importance as an input stage in many real-world applications. Automated detection of vocalisations in the very first year of life is still a stepchild of this field. On our quest to define acoustic parameters in pre-linguistic vocalisations as markers for neuro(mal)development, we are confronted with the challenge of manually segmenting and annotating hours of variable-quality home video material for sequences of infant voice/vocalisations. While our corpus comprises video footage of typically developing infants and infants with various neurodevelopmental disorders totalling more than a year of running time, only a small proportion has been processed so far. This calls for automated assistance tools for detecting and/or segmenting infant utterances from real-life video recordings. In this paper, we investigated several approaches to infant voice detection and segmentation, including a rule-based voice activity detector, hidden Markov models with Gaussian mixture observation models, support vector machines, and random forests. Results indicate that the applied methods are well suited for semi-automated retrieval of infant utterances from highly non-standardised footage. At the same time, our results show that a fully automated approach to this problem is yet to come.
KW - Home video database
KW - Infant vocalisation
KW - Retrospective audio-video analysis
KW - Voice activity detection
UR - http://www.scopus.com/inward/record.url?scp=84994246442&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2016-1341
DO - 10.21437/Interspeech.2016-1341
M3 - Conference article
AN - SCOPUS:84994246442
SN - 2308-457X
VL - 08-12-September-2016
SP - 2997
EP - 3001
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Y2 - 8 September 2016 through 12 September 2016
ER -