
Statistical biases due to anonymization evaluated in an open clinical dataset from COVID-19 patients

  • NAPKON Study Group
  • NAPKON Use & Access Committee
  • NAPKON Steering Committee
  • NAPKON Study Site Group
  • NAPKON Infrastructure Group
  • University of Cologne
  • Charité Universitätsmedizin Berlin
  • University of Würzburg
  • Universität Bielefeld
  • Charité – Universitätsmedizin Berlin
  • University Hospital Schleswig-Holstein
  • Ludwig-Maximilians-Universität München
  • Technical University of Munich
  • German Center for Lung Research (DZL)
  • Klinikum der J. W. Goethe-Universität
  • University Hospital Würzburg
  • Klinikum der Universität Regensburg und Medizinische Fakultät
  • Justus-Liebig-University Giessen
  • Cardio-Pulmonary Institute (CPI)
  • Partner Site Bonn-Cologne
  • University Medicine Essen
  • University Hospital of Essen
  • Lahn-Dill-Clinics
  • University Heart Center
  • Robert Koch Institut
  • Justus-Liebig-Universität Gießen
  • Giessen University Hospital
  • University Medical Center
  • University of Freiburg
  • Klinikum der Ruhr-Universität Bochum
  • Universitätsklinikum Tübingen
  • Christian-Albrechts-University of Kiel
  • University Hospital
  • Universität Oldenburg
  • Universitätsklinikum Münster
  • University Hospital Leipzig
  • University Medicine Greifswald
  • University Medical Center Hamburg-Eppendorf
  • Universitätsklinikum Erlangen
  • Universitätsklinikum Carl Gustav Carus Dresden
  • University Hospital Augsburg
  • Tropical Hospital Paul-Lechler-Krankenhaus
  • Saarland University Medical Center
  • Practice for general medicine Dr. Allerlei
  • Practice for general medicine Am Ebertplatz
  • Medical Faculty Mannheim
  • Medical Center for Hematology and Oncology Munich MVZ
  • Malteser Hospital St. Franziskus Hospital
  • Klinikum Dortmund
  • Hannover Medical School
  • Cnopf'sche Klinik
  • Partner Site Munich Heart Alliance
  • University Hospital of Cologne
  • Helmholtz Zentrum München German Research Center for Environmental Health

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Anonymization has the potential to foster the sharing of medical data. State-of-the-art methods use mathematical models to modify data to reduce privacy risks. However, the degree of protection must be balanced against the impact on statistical properties. We studied an extreme case of this trade-off: the statistical validity of an open medical dataset based on the German National Pandemic Cohort Network (NAPKON), which was prepared for publication using a strong anonymization procedure. Descriptive statistics and results of regression analyses were compared before and after anonymization of multiple variants of the original dataset. Despite significant differences in value distributions, the statistical bias was found to be small in all cases. In the regression analyses, the median absolute deviations of the estimated adjusted odds ratios for different sample sizes ranged from 0.01 [minimum = 0, maximum = 0.58] to 0.52 [minimum = 0.25, maximum = 0.91]. Disproportionate impact on the statistical properties of data is a common argument against the use of anonymization. Our analysis demonstrates that anonymization can actually preserve validity of statistical results in relatively low-dimensional data.
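The deviation figures quoted in the abstract can be read as a summary of paired differences: for each regression covariate, the absolute difference between the odds ratio estimated on the original data and the one estimated on the anonymized data, reported as a median with minimum and maximum. The following Python sketch illustrates that reading. The function name `deviation_summary` and the input values are hypothetical and chosen for illustration only; they are not taken from the study, and the authors' actual analysis pipeline may differ.

```python
from statistics import median


def deviation_summary(original_ors, anonymized_ors):
    """Summarize absolute deviations between paired odds-ratio estimates.

    Each position in the two sequences is assumed to refer to the same
    covariate, estimated once on the original data and once on the
    anonymized data.
    """
    if len(original_ors) != len(anonymized_ors):
        raise ValueError("both inputs must contain one estimate per covariate")
    if not original_ors:
        raise ValueError("at least one pair of estimates is required")

    # Absolute difference per covariate between the two estimates.
    deviations = [abs(a - o) for o, a in zip(original_ors, anonymized_ors)]

    return {
        "median": median(deviations),
        "minimum": min(deviations),
        "maximum": max(deviations),
    }


# Hypothetical odds ratios for illustration only, not results from the study.
original = [1.0, 2.0, 0.5, 1.5]
anonymized = [1.25, 2.0, 1.0, 1.0]

summary = deviation_summary(original, anonymized)
print(summary)
```

A median close to zero with a small maximum would indicate that anonymization left the estimates nearly unchanged, while a large maximum flags individual covariates whose estimates shifted noticeably.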

Original language: English
Article number: 776
Journal: Scientific Data
Volume: 9
Issue number: 1
State: Published - Dec 2022
