TY - GEN
T1 - Quantifying confounding bias in neuroimaging datasets with causal inference
AU - Wachinger, Christian
AU - Becker, Benjamin Gutierrez
AU - Rieckmann, Anna
AU - Pölsterl, Sebastian
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
AB - Neuroimaging datasets keep growing in size to address increasingly complex medical questions. However, even the largest datasets today alone are too small for training complex machine learning models. A potential solution is to increase sample size by pooling scans from several datasets. In this work, we combine 12,207 MRI scans from 15 studies and show that simple pooling is often ill-advised due to introducing various types of biases in the training data. First, we systematically define these biases. Second, we detect bias by experimentally showing that scans can be correctly assigned to their respective dataset with 73.3% accuracy. Finally, we propose to tell causal from confounding factors by quantifying the extent of confounding and causality in a single dataset using causal inference. We achieve this by finding the simplest graphical model in terms of Kolmogorov complexity. As Kolmogorov complexity is not directly computable, we employ the minimum description length to approximate it. We empirically show that our approach is able to estimate plausible causal relationships from real neuroimaging data.
UR - http://www.scopus.com/inward/record.url?scp=85075645083&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-32251-9_53
DO - 10.1007/978-3-030-32251-9_53
M3 - Conference contribution
AN - SCOPUS:85075645083
SN - 9783030322502
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 484
EP - 492
BT - Medical Image Computing and Computer Assisted Intervention – MICCAI 2019 - 22nd International Conference, Proceedings
A2 - Shen, Dinggang
A2 - Yap, Pew-Thian
A2 - Liu, Tianming
A2 - Peters, Terry M.
A2 - Khan, Ali
A2 - Staib, Lawrence H.
A2 - Essert, Caroline
A2 - Zhou, Sean
PB - Springer Science and Business Media Deutschland GmbH
T2 - 22nd International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2019
Y2 - 13 October 2019 through 17 October 2019
ER -