TY - JOUR
T1 - High-dimensional sparse vine copula regression with application to genomic prediction
AU - Sahin, Özge
AU - Czado, Claudia
N1 - Publisher Copyright: © The Author(s) 2024. Published by Oxford University Press on behalf of The International Biometric Society. All rights reserved.
PY - 2024/3
Y1 - 2024/3
AB - High-dimensional data sets are often available in genome-enabled predictions. Such data sets include nonlinear relationships with complex dependence structures. For such situations, vine copula-based (quantile) regression is an important tool. However, the current vine copula-based regression approaches do not scale up to high and ultra-high dimensions. To perform high-dimensional sparse vine copula-based regression, we propose 2 methods. First, we show their superiority regarding computational complexity over the existing methods. Second, we define relevant, irrelevant, and redundant explanatory variables for quantile regression. Then, we show our method’s power in selecting relevant variables and prediction accuracy in high-dimensional sparse data sets via simulation studies. Next, we apply the proposed methods to the high-dimensional real data, aiming at the genomic prediction of maize traits. Some data processing and feature extraction steps for the real data are further discussed. Finally, we show the advantage of our methods over linear models and quantile regression forests in simulation studies and real data applications.
KW - genomic prediction
KW - high-dimensional data
KW - quantile regression
KW - variable selection
KW - vine copula
UR - http://www.scopus.com/inward/record.url?scp=85187447984&partnerID=8YFLogxK
U2 - 10.1093/biomtc/ujad042
DO - 10.1093/biomtc/ujad042
M3 - Article
C2 - 38465987
AN - SCOPUS:85187447984
SN - 0006-341X
VL - 80
JO - Biometrics
JF - Biometrics
IS - 1
M1 - ujad042
ER -