TY - JOUR
T1 - A framework for modeling epistatic interaction
AU - Blumenthal, David B.
AU - Baumbach, Jan
AU - Hoffmann, Markus
AU - Kacprowski, Tim
AU - List, Markus
N1 - Publisher Copyright: © 2020 The Author(s). Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected].
PY - 2021/6/15
Y1 - 2021/6/15
N2 - Motivation: Recently, various tools for detecting single nucleotide polymorphisms (SNPs) involved in epistasis have been developed. However, no studies evaluate the employed statistical epistasis models, such as the χ²-test or quadratic regression, independently of the tools that use them. Such an independent evaluation is crucial for developing improved epistasis detection tools, because it makes it possible to decide whether a tool's performance should be attributed to the epistasis model or to the optimization strategy run on top of it. Results: We present a protocol for evaluating epistasis models independently of the tools they are used in and generalize existing models designed for dichotomous phenotypes to the categorical and quantitative cases. In addition, we propose a new model that scores candidate SNP sets by computing maximum likelihood distributions for the observed phenotypes in the cells of their penetrance tables. Extensive experiments show that the proposed maximum likelihood model outperforms three widely used epistasis models in most cases. The experiments also provide valuable insights into the properties of existing models, for instance, that quadratic regression performs particularly well on instances with quantitative phenotypes.
AB - Motivation: Recently, various tools for detecting single nucleotide polymorphisms (SNPs) involved in epistasis have been developed. However, no studies evaluate the employed statistical epistasis models, such as the χ²-test or quadratic regression, independently of the tools that use them. Such an independent evaluation is crucial for developing improved epistasis detection tools, because it makes it possible to decide whether a tool's performance should be attributed to the epistasis model or to the optimization strategy run on top of it. Results: We present a protocol for evaluating epistasis models independently of the tools they are used in and generalize existing models designed for dichotomous phenotypes to the categorical and quantitative cases. In addition, we propose a new model that scores candidate SNP sets by computing maximum likelihood distributions for the observed phenotypes in the cells of their penetrance tables. Extensive experiments show that the proposed maximum likelihood model outperforms three widely used epistasis models in most cases. The experiments also provide valuable insights into the properties of existing models, for instance, that quadratic regression performs particularly well on instances with quantitative phenotypes.
UR - http://www.scopus.com/inward/record.url?scp=85112127153&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btaa990
DO - 10.1093/bioinformatics/btaa990
M3 - Article
C2 - 33252645
AN - SCOPUS:85112127153
SN - 1367-4803
VL - 37
SP - 1708
EP - 1716
JO - Bioinformatics
JF - Bioinformatics
IS - 12
ER -