TY - JOUR
T1 - A large-scale and PCR-referenced vocal audio dataset for COVID-19
AU - Budd, Jobie
AU - Baker, Kieran
AU - Karoune, Emma
AU - Coppock, Harry
AU - Patel, Selina
AU - Payne, Richard
AU - Tendero Cañadas, Ana
AU - Titcomb, Alexander
AU - Hurley, David
AU - Egglestone, Sabrina
AU - Butler, Lorraine
AU - Mellor, Jonathon
AU - Nicholson, George
AU - Kiskin, Ivan
AU - Koutra, Vasiliki
AU - Jersakova, Radka
AU - McKendry, Rachel A.
AU - Diggle, Peter
AU - Richardson, Sylvia
AU - Schuller, Björn W.
AU - Gilmour, Steven
AU - Pigoli, Davide
AU - Roberts, Stephen
AU - Packham, Josef
AU - Thornley, Tracey
AU - Holmes, Chris
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/12
Y1 - 2024/12
N2 - The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the ‘Speak up and help beat coronavirus’ digital survey alongside demographic, symptom and self-reported respiratory condition data. Digital survey submissions were linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,565 of 72,999 participants and 24,105 of 25,706 positive cases. Respiratory symptoms were reported by 45.6% of participants. This dataset has additional potential uses for bioacoustics research, with 11.3% of participants self-reporting asthma, and 27.2% with linked influenza PCR test results.
AB - The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the ‘Speak up and help beat coronavirus’ digital survey alongside demographic, symptom and self-reported respiratory condition data. Digital survey submissions were linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,565 of 72,999 participants and 24,105 of 25,706 positive cases. Respiratory symptoms were reported by 45.6% of participants. This dataset has additional potential uses for bioacoustics research, with 11.3% of participants self-reporting asthma, and 27.2% with linked influenza PCR test results.
UR - http://www.scopus.com/inward/record.url?scp=85197163576&partnerID=8YFLogxK
U2 - 10.1038/s41597-024-03492-w
DO - 10.1038/s41597-024-03492-w
M3 - Article
C2 - 38937483
AN - SCOPUS:85197163576
SN - 2052-4463
VL - 11
JO - Scientific Data
JF - Scientific Data
IS - 1
M1 - 700
ER -