TY - JOUR
T1 - Two-sample hypothesis testing for inhomogeneous random graphs
AU - Ghoshdastidar, Debarghya
AU - Gutzeit, Maurilio
AU - Carpentier, Alexandra
AU - von Luxburg, Ulrike
N1 - Publisher Copyright:
© Institute of Mathematical Statistics, 2020.
PY - 2020/8
Y1 - 2020/8
N2 - The study of networks leads to a wide range of high-dimensional inference problems. In many practical applications, one needs to draw inference from one or few large sparse networks. The present paper studies hypothesis testing of graphs in this high-dimensional regime, where the goal is to test between two populations of inhomogeneous random graphs defined on the same set of n vertices. The size of each population m is much smaller than n, and can even be a constant as small as 1. The critical question in this context is whether the problem is solvable for small m. We answer this question from a minimax testing perspective. Let P, Q be the population adjacencies of two sparse inhomogeneous random graph models, and d be a suitably defined distance function. Given a population of m graphs from each model, we derive minimax separation rates for the problem of testing P = Q against d(P, Q) > ρ. We observe that if m is small, then the minimax separation is too large for some popular choices of d, including total variation distance between corresponding distributions. This implies that some models that are widely separated in d cannot be distinguished for small m, and hence, the testing problem is generally not solvable in these cases. We also show that if m > 1, then the minimax separation is relatively small if d is the Frobenius norm or operator norm distance between P and Q. For m = 1, only the latter distance provides small minimax separation. Thus, for these distances, the problem is solvable for small m. We also present near-optimal two-sample tests in both cases, where tests are adaptive with respect to sparsity level of the graphs.
AB - The study of networks leads to a wide range of high-dimensional inference problems. In many practical applications, one needs to draw inference from one or few large sparse networks. The present paper studies hypothesis testing of graphs in this high-dimensional regime, where the goal is to test between two populations of inhomogeneous random graphs defined on the same set of n vertices. The size of each population m is much smaller than n, and can even be a constant as small as 1. The critical question in this context is whether the problem is solvable for small m. We answer this question from a minimax testing perspective. Let P, Q be the population adjacencies of two sparse inhomogeneous random graph models, and d be a suitably defined distance function. Given a population of m graphs from each model, we derive minimax separation rates for the problem of testing P = Q against d(P, Q) > ρ. We observe that if m is small, then the minimax separation is too large for some popular choices of d, including total variation distance between corresponding distributions. This implies that some models that are widely separated in d cannot be distinguished for small m, and hence, the testing problem is generally not solvable in these cases. We also show that if m > 1, then the minimax separation is relatively small if d is the Frobenius norm or operator norm distance between P and Q. For m = 1, only the latter distance provides small minimax separation. Thus, for these distances, the problem is solvable for small m. We also present near-optimal two-sample tests in both cases, where tests are adaptive with respect to sparsity level of the graphs.
KW - Inhomogeneous erdos–Rényi model
KW - Minimax testing
KW - Two-sample test
UR - http://www.scopus.com/inward/record.url?scp=85090447184&partnerID=8YFLogxK
U2 - 10.1214/19-AOS1884
DO - 10.1214/19-AOS1884
M3 - Article
AN - SCOPUS:85090447184
SN - 0090-5364
VL - 48
SP - 2208
EP - 2229
JO - Annals of Statistics
JF - Annals of Statistics
IS - 4
ER -