TY - JOUR
T1 - Introducing the COVID-19 YouTube (COVYT) speech dataset featuring the same speakers with and without infection
AU - Triantafyllopoulos, Andreas
AU - Semertzidou, Anastasia
AU - Song, Meishu
AU - Pokorny, Florian B.
AU - Schuller, Björn W.
N1 - Publisher Copyright: © 2023
PY - 2024/2
Y1 - 2024/2
AB - More than two years after its outbreak, the COVID-19 pandemic continues to plague medical systems around the world, putting a strain on scarce resources and claiming human lives. From the very beginning, various AI-based COVID-19 detection and monitoring tools have been pursued in an attempt to stem the tide of infections through timely diagnosis. In particular, computer audition has been suggested as a non-invasive, cost-efficient, and eco-friendly alternative for detecting COVID-19 infections through vocal sounds. However, like all AI methods, computer audition is heavily dependent on the quantity and quality of available data, and large-scale COVID-19 sound datasets are difficult to acquire, among other reasons, due to the sensitive nature of such data. To that end, we introduce the COVYT dataset, a novel COVID-19 dataset collected from public sources containing more than 8 h of speech from 65 speakers. Compared to other existing COVID-19 sound datasets, the unique feature of the COVYT dataset is that it comprises both COVID-19 positive and negative samples from all 65 speakers. We additionally provide an overview acoustic analysis and modelling baselines using different partitioning strategies. We analyse the acoustic manifestation of COVID-19 on the basis of these ‘in-the-wild’ data, which are perfectly balanced with respect to speaker characteristics, using interpretable audio descriptors, and investigate several classification scenarios that shed light on proper partitioning strategies for fair speech-based COVID-19 detection.
KW - COVID-19
KW - Computer audition
KW - Disease detection
KW - Machine learning
KW - Speech dataset
KW - Speech pathology
UR - http://www.scopus.com/inward/record.url?scp=85175494016&partnerID=8YFLogxK
U2 - 10.1016/j.bspc.2023.105642
DO - 10.1016/j.bspc.2023.105642
M3 - Article
AN - SCOPUS:85175494016
SN - 1746-8094
VL - 88
JO - Biomedical Signal Processing and Control
JF - Biomedical Signal Processing and Control
M1 - 105642
ER -