TY - GEN

T1 - Statistical Significance in High-dimensional Linear Mixed Models

AU - Lin, Lina

AU - Drton, Mathias

AU - Shojaie, Ali

N1 - Publisher Copyright: © 2020 Owner/Author.

PY - 2020/10/19

Y1 - 2020/10/19

N2 - This paper develops an inferential framework for high-dimensional linear mixed effect models. Such models are suitable, e.g., when collecting n repeated measurements for M subjects. We consider a scenario where the number of fixed effects p is large (and may be larger than M), but the number of random effects q is small. Our framework is inspired by a recent line of work that proposes de-biasing penalized estimators to perform inference for high-dimensional linear models with fixed effects only. In particular, we demonstrate how to correct a 'naive' ridge estimator to build asymptotically valid confidence intervals for mixed effect models. We validate our theoretical results with numerical experiments that show that our method can successfully account for the correlation induced by the random effects. For a practical demonstration we consider a riboflavin production dataset that exhibits group structure, and show that conclusions drawn using our method are consistent with those obtained on a similar dataset without group structure.

KW - confidence intervals

KW - high-dimensional statistics

KW - linear mixed effect model

KW - variance component estimation

UR - http://www.scopus.com/inward/record.url?scp=85097000540&partnerID=8YFLogxK

U2 - 10.1145/3412815.3416883

DO - 10.1145/3412815.3416883

M3 - Conference contribution

AN - SCOPUS:85097000540

T3 - FODS 2020 - Proceedings of the 2020 ACM-IMS Foundations of Data Science Conference

SP - 171

EP - 181

BT - FODS 2020 - Proceedings of the 2020 ACM-IMS Foundations of Data Science Conference

PB - Association for Computing Machinery, Inc

T2 - 2020 ACM-IMS Foundations of Data Science Conference, FODS 2020

Y2 - 19 October 2020 through 20 October 2020

ER -