TY - JOUR
T1 - Limits and convergence properties of the sequentially Markovian coalescent
AU - Sellinger, Thibaut Paul Patrick
AU - Abu-Awad, Diala
AU - Tellier, Aurélien
N1 - Publisher Copyright:
© 2021 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd.
PY - 2021/10
Y1 - 2021/10
N2 - Several methods based on the sequentially Markovian coalescent (SMC) make use of full genome sequence data from samples to infer population demographic history including past changes in population size, admixture, migration events and population structure. More recently, the original theoretical framework has been extended to allow the simultaneous estimation of population size changes along with other life history traits such as selfing or seed banking. The latter developments enhance the applicability of SMC methods to nonmodel species. Although convergence proofs have been given using simulated data in a few specific cases, an in-depth investigation of the limitations of SMC methods is lacking. In order to explore such limits, we first develop a tool inferring the best case convergence of SMC methods assuming the true underlying coalescent genealogies are known. This tool can be used to quantify the amount and type of information that can be confidently retrieved from given data sets prior to the analysis of the real data. Second, we assess the inference accuracy when the assumptions of SMC approaches are violated due to departures from the model, namely the presence of transposable elements, variable recombination and mutation rates along the sequence, and SNP calling errors. Third, we deliver a new interpretation of SMC methods by highlighting the importance of the transition matrix, which we argue can be used as a set of summary statistics in other statistical inference methods, uncoupling the SMC from hidden Markov models (HMMs). We finally offer recommendations to better apply SMC methods and build adequate data sets under budget constraints.
AB - Several methods based on the sequentially Markovian coalescent (SMC) make use of full genome sequence data from samples to infer population demographic history including past changes in population size, admixture, migration events and population structure. More recently, the original theoretical framework has been extended to allow the simultaneous estimation of population size changes along with other life history traits such as selfing or seed banking. The latter developments enhance the applicability of SMC methods to nonmodel species. Although convergence proofs have been given using simulated data in a few specific cases, an in-depth investigation of the limitations of SMC methods is lacking. In order to explore such limits, we first develop a tool inferring the best case convergence of SMC methods assuming the true underlying coalescent genealogies are known. This tool can be used to quantify the amount and type of information that can be confidently retrieved from given data sets prior to the analysis of the real data. Second, we assess the inference accuracy when the assumptions of SMC approaches are violated due to departures from the model, namely the presence of transposable elements, variable recombination and mutation rates along the sequence, and SNP calling errors. Third, we deliver a new interpretation of SMC methods by highlighting the importance of the transition matrix, which we argue can be used as a set of summary statistics in other statistical inference methods, uncoupling the SMC from hidden Markov models (HMMs). We finally offer recommendations to better apply SMC methods and build adequate data sets under budget constraints.
KW - ancestral recombination graph
KW - kingman coalescent
KW - population genetics
UR - http://www.scopus.com/inward/record.url?scp=85107006616&partnerID=8YFLogxK
U2 - 10.1111/1755-0998.13416
DO - 10.1111/1755-0998.13416
M3 - Article
C2 - 33978324
AN - SCOPUS:85107006616
SN - 1755-098X
VL - 21
SP - 2231
EP - 2248
JO - Molecular Ecology Resources
JF - Molecular Ecology Resources
IS - 7
ER -