TY - JOUR
T1 - Establishing the reliability of metrics extracted from long-form recordings using LENA and the ACLEW pipeline
AU - Cristia, Alejandrina
AU - Gautheron, Lucas
AU - Zhang, Zixing
AU - Schuller, Björn
AU - Scaff, Camila
AU - Rowland, Caroline
AU - Räsänen, Okko
AU - Peurey, Loann
AU - Lavechin, Marvin
AU - Havard, William
AU - Fausey, Caitlin M.
AU - Cychosz, Margaret
AU - Bergelson, Elika
AU - Anderson, Heather
AU - Al Futaisi, Najla
AU - Soderstrom, Melanie
N1 - Publisher Copyright: © The Psychonomic Society, Inc. 2024.
PY - 2024
Y1 - 2024
N2 - Long-form audio recordings are increasingly used to study individual variation, group differences, and many other topics in theoretical and applied fields of developmental science, particularly for the description of children’s language input (typically speech from adults) and children’s language output (ranging from babble to sentences). The proprietary LENA software has been available for over a decade, and with it, users have come to rely on derived metrics like adult word count (AWC) and child vocalization counts (CVC), which have also more recently been derived using an open-source alternative, the ACLEW pipeline. Yet, there is relatively little work assessing the reliability of long-form metrics in terms of the stability of individual differences across time. Filling this gap, we analyzed eight spoken-language datasets: four from North American English-learning infants, and one each from British English-, French-, American English-/Spanish-, and Quechua-/Spanish-learning infants. The audio data were analyzed using two types of processing software: LENA and the ACLEW open-source pipeline. When all corpora were included, we found relatively low to moderate reliability (across multiple recordings, intraclass correlation coefficient attributed to the child identity [Child ICC], was < 50% for most metrics). There were few differences between the two pipelines. Exploratory analyses suggested some differences as a function of child age and corpora. These findings suggest that, while reliability is likely sufficient for various group-level analyses, caution is needed when using either LENA or ACLEW tools to study individual variation. We also encourage improvement of extant tools, specifically targeting accurate measurement of individual variation.
KW - Accuracy
KW - Big data
KW - Daylong recordings
KW - Speech technology
UR - http://www.scopus.com/inward/record.url?scp=85204575860&partnerID=8YFLogxK
U2 - 10.3758/s13428-024-02493-2
DO - 10.3758/s13428-024-02493-2
M3 - Article
AN - SCOPUS:85204575860
SN - 1554-351X
JO - Behavior Research Methods
JF - Behavior Research Methods
ER -