TY - JOUR
T1 - Wastewater-based epidemiology
T2 - deriving a SARS-CoV-2 data validation method to assess data quality and to improve trend recognition
AU - Saravia, Cristina J.
AU - Pütz, Peter
AU - Wurzbacher, Christian
AU - Uchaikina, Anna
AU - Drewes, Jörg E.
AU - Braun, Ulrike
AU - Bannick, Claus Gerhard
AU - Obermaier, Nathan
N1 - Publisher Copyright: © 2024 Saravia, Pütz, Wurzbacher, Uchaikina, Drewes, Braun, Bannick and Obermaier.
PY - 2024
Y1 - 2024
N2 - Introduction: Accurate and consistent data play a critical role in enabling health officials to make informed decisions regarding emerging trends in SARS-CoV-2 infections. Alongside traditional indicators such as the 7-day incidence rate, wastewater-based epidemiology can provide valuable insights into SARS-CoV-2 concentration changes. However, wastewater composition and wastewater systems are rather complex. Multiple effects such as precipitation events or industrial discharges might affect the quantification of SARS-CoV-2 concentrations. Hence, analysing data from more than 150 wastewater treatment plants (WWTPs) in Germany necessitates an automated and reliable method to evaluate data validity, identify potential extreme events, and, if possible, improve overall data quality. Methods: We developed a method that first categorises the data quality of WWTPs and corresponding laboratories based on the number of outliers in the reproduction rate as well as the number of implausible inflection points within the SARS-CoV-2 time series. Subsequently, we scrutinised statistical outliers in several standard quality control parameters (QCPs) that are routinely collected during the analysis process, such as the flow rate, the electrical conductivity, or surrogate viruses like the pepper mild mottle virus. Furthermore, we investigated outliers in the ratio of the analysed gene segments that might indicate laboratory errors. To evaluate the success of our method, we measured the degree of accordance between identified QCP outliers and outliers in the SARS-CoV-2 concentration curves. Results and discussion: Our analysis reveals that the flow and gene segment ratios are typically best at identifying outliers in the SARS-CoV-2 concentration curve, albeit with variations across WWTPs and laboratories. The exclusion of datapoints based on QCP plausibility checks predominantly improves data quality. Our derived data quality categories are in good accordance with visual assessments. Conclusion: Good data quality is crucial for trend recognition, both on the WWTP level and when aggregating data from several WWTPs to regional or national trends. Our model can help to improve data quality in the context of health-related monitoring and can be optimised for each individual WWTP to account for the large diversity among WWTPs.
KW - automated quality control
KW - data plausibility
KW - outlier detection
KW - SARS-CoV-2
KW - wastewater treatment plant classification
KW - wastewater-based epidemiology
UR - http://www.scopus.com/inward/record.url?scp=85213007013&partnerID=8YFLogxK
U2 - 10.3389/fpubh.2024.1497100
DO - 10.3389/fpubh.2024.1497100
M3 - Article
AN - SCOPUS:85213007013
SN - 2296-2565
VL - 12
JO - Frontiers in Public Health
JF - Frontiers in Public Health
M1 - 1497100
ER -