TY - JOUR
T1 - Sensitivity to prior specification in Bayesian genome-based prediction models
AU - Lehermeier, Christina
AU - Wimmer, Valentin
AU - Albrecht, Theresa
AU - Auinger, Hans-Jürgen
AU - Gianola, Daniel
AU - Schmid, Volker J.
AU - Schön, Chris-Carolin
N1 - Funding Information: This research was funded by the German Federal Ministry of Education and Research (BMBF) within the AgroClustEr “Synbreed – Synergistic plant and animal breeding” (FKZ: 0315528A). This research was carried out with the support of the Technische Universität München – Institute for Advanced Study, funded by the German Excellence Initiative. We gratefully acknowledge the KWS SAAT AG for providing the experimental data.
PY - 2013/6
Y1 - 2013/6
N2 - Different statistical models have been proposed for maximizing prediction accuracy in genome-based prediction of breeding values in plant and animal breeding. However, little is known about the sensitivity of these models with respect to prior and hyperparameter specification, because comparisons of prediction performance are mainly based on a single set of hyperparameters. In this study, we focused on Bayesian prediction methods using a standard linear regression model with marker covariates coding additive effects at a large number of marker loci. By comparing different hyperparameter settings, we investigated the sensitivity of four methods frequently used in genome-based prediction (Bayesian Ridge, Bayesian Lasso, BayesA and BayesB) to specification of the prior distribution of marker effects. We used datasets simulated according to a typical maize breeding program, differing in the number of markers and the number of simulated quantitative trait loci affecting the trait. Furthermore, we used an experimental maize dataset comprising 698 doubled haploid lines, each genotyped with 56110 single nucleotide polymorphism markers and phenotyped as testcrosses for two quantitative traits: grain dry matter yield and grain dry matter content. The predictive ability of the different models was assessed by five-fold cross-validation. The extent of Bayesian learning was quantified by calculation of the Hellinger distance between the prior and posterior densities of marker effects. Our results indicate that similar predictive abilities can be achieved with all methods, but with BayesA and BayesB, hyperparameter settings had a stronger effect on prediction performance than with the other two methods. Prediction performance of BayesA and BayesB suffered substantially from a nonoptimal choice of hyperparameters.
AB - Different statistical models have been proposed for maximizing prediction accuracy in genome-based prediction of breeding values in plant and animal breeding. However, little is known about the sensitivity of these models with respect to prior and hyperparameter specification, because comparisons of prediction performance are mainly based on a single set of hyperparameters. In this study, we focused on Bayesian prediction methods using a standard linear regression model with marker covariates coding additive effects at a large number of marker loci. By comparing different hyperparameter settings, we investigated the sensitivity of four methods frequently used in genome-based prediction (Bayesian Ridge, Bayesian Lasso, BayesA and BayesB) to specification of the prior distribution of marker effects. We used datasets simulated according to a typical maize breeding program, differing in the number of markers and the number of simulated quantitative trait loci affecting the trait. Furthermore, we used an experimental maize dataset comprising 698 doubled haploid lines, each genotyped with 56110 single nucleotide polymorphism markers and phenotyped as testcrosses for two quantitative traits: grain dry matter yield and grain dry matter content. The predictive ability of the different models was assessed by five-fold cross-validation. The extent of Bayesian learning was quantified by calculation of the Hellinger distance between the prior and posterior densities of marker effects. Our results indicate that similar predictive abilities can be achieved with all methods, but with BayesA and BayesB, hyperparameter settings had a stronger effect on prediction performance than with the other two methods. Prediction performance of BayesA and BayesB suffered substantially from a nonoptimal choice of hyperparameters.
KW - Bayesian learning
KW - Genome-based prediction
KW - Genomic selection
KW - Plant breeding
KW - Shrinkage prior
UR - http://www.scopus.com/inward/record.url?scp=84879746501&partnerID=8YFLogxK
U2 - 10.1515/sagmb-2012-0042
DO - 10.1515/sagmb-2012-0042
M3 - Article
C2 - 23629460
AN - SCOPUS:84879746501
SN - 1544-6115
VL - 12
SP - 375
EP - 391
JO - Statistical Applications in Genetics and Molecular Biology
JF - Statistical Applications in Genetics and Molecular Biology
IS - 3
ER -