TY - JOUR
T1 - The effect of sample size on polygenic hazard models for prostate cancer
AU - Australian Prostate Cancer BioResource (APCB)
AU - The PRACTICAL Consortium
AU - Karunamuni, Roshan A.
AU - Huynh-Le, Minh Phuong
AU - Fan, Chun C.
AU - Eeles, Rosalind A.
AU - Easton, Douglas F.
AU - Kote-Jarai, Zsofia S.
AU - Amin Al Olama, Ali
AU - Benlloch Garcia, Sara
AU - Muir, Kenneth
AU - Gronberg, Henrik
AU - Wiklund, Fredrik
AU - Aly, Markus
AU - Schleutker, Johanna
AU - Sipeky, Csilla
AU - Tammela, Teuvo L.J.
AU - Nordestgaard, Børge G.
AU - Key, Tim J.
AU - Travis, Ruth C.
AU - Neal, David E.
AU - Donovan, Jenny L.
AU - Hamdy, Freddie C.
AU - Pharoah, Paul
AU - Pashayan, Nora
AU - Khaw, Kay Tee
AU - Thibodeau, Stephen N.
AU - McDonnell, Shannon K.
AU - Schaid, Daniel J.
AU - Maier, Christiane
AU - Vogel, Walther
AU - Luedeke, Manuel
AU - Herkommer, Kathleen
AU - Kibel, Adam S.
AU - Cybulski, Cezary
AU - Wokolorczyk, Dominika
AU - Kluzniak, Wojciech
AU - Cannon-Albright, Lisa
AU - Brenner, Hermann
AU - Schöttker, Ben
AU - Holleczek, Bernd
AU - Park, Jong Y.
AU - Sellers, Thomas A.
AU - Lin, Hui Yi
AU - Slavov, Chavdar
AU - Kaneva, Radka
AU - Mitev, Vanio
AU - Batra, Jyotsna
AU - Clements, Judith A.
AU - Spurdle, Amanda
AU - Teixeira, Manuel R.
AU - Paulo, Paula
N1 - Publisher Copyright: © 2020, The Author(s), under exclusive licence to European Society of Human Genetics.
PY - 2020/10/1
Y1 - 2020/10/1
AB - We determined the effect of sample size on performance of polygenic hazard score (PHS) models in prostate cancer. Age and genotypes were obtained for 40,861 men from the PRACTICAL consortium. The dataset included 201,590 SNPs per subject, and was split into training and testing sets. Established-SNP models considered 65 SNPs that had been previously associated with prostate cancer. Discovery-SNP models used stepwise selection to identify new SNPs. The performance of each PHS model was calculated for random sizes of the training set. The performance of a representative Established-SNP model was estimated for random sizes of the testing set. Mean HR98/50 (hazard ratio of top 2% to average in test set) of the Established-SNP model increased from 1.73 [95% CI: 1.69–1.77] to 2.41 [2.40–2.43] when the number of training samples was increased from 1 thousand to 30 thousand. Corresponding HR98/50 of the Discovery-SNP model increased from 1.05 [0.93–1.18] to 2.19 [2.16–2.23]. HR98/50 of a representative Established-SNP model using testing set sample sizes of 0.6 thousand and 6 thousand observations were 1.78 [1.70–1.85] and 1.73 [1.71–1.76], respectively. We estimate that a study population of 20 thousand men is required to develop Discovery-SNP PHS models while 10 thousand men should be sufficient for Established-SNP models.
UR - http://www.scopus.com/inward/record.url?scp=85086170171&partnerID=8YFLogxK
U2 - 10.1038/s41431-020-0664-2
DO - 10.1038/s41431-020-0664-2
M3 - Article
C2 - 32514134
AN - SCOPUS:85086170171
SN - 1018-4813
VL - 28
SP - 1467
EP - 1475
JO - European Journal of Human Genetics
JF - European Journal of Human Genetics
IS - 10
ER -