TY - JOUR

T1 - Two-sample hypothesis testing for inhomogeneous random graphs

AU - Ghoshdastidar, Debarghya

AU - Gutzeit, Maurilio

AU - Carpentier, Alexandra

AU - von Luxburg, Ulrike

N1 - Publisher Copyright:
© Institute of Mathematical Statistics, 2020.

PY - 2020/8

Y1 - 2020/8

N2 - The study of networks leads to a wide range of high-dimensional inference problems. In many practical applications, one needs to draw inference from one or few large sparse networks. The present paper studies hypothesis testing of graphs in this high-dimensional regime, where the goal is to test between two populations of inhomogeneous random graphs defined on the same set of n vertices. The size of each population m is much smaller than n, and can even be a constant as small as 1. The critical question in this context is whether the problem is solvable for small m. We answer this question from a minimax testing perspective. Let P, Q be the population adjacencies of two sparse inhomogeneous random graph models, and d be a suitably defined distance function. Given a population of m graphs from each model, we derive minimax separation rates for the problem of testing P = Q against d(P, Q) > ρ. We observe that if m is small, then the minimax separation is too large for some popular choices of d, including total variation distance between corresponding distributions. This implies that some models that are widely separated in d cannot be distinguished for small m, and hence, the testing problem is generally not solvable in these cases. We also show that if m > 1, then the minimax separation is relatively small if d is the Frobenius norm or operator norm distance between P and Q. For m = 1, only the latter distance provides small minimax separation. Thus, for these distances, the problem is solvable for small m. We also present near-optimal two-sample tests in both cases, where tests are adaptive with respect to sparsity level of the graphs.

AB - The study of networks leads to a wide range of high-dimensional inference problems. In many practical applications, one needs to draw inference from one or few large sparse networks. The present paper studies hypothesis testing of graphs in this high-dimensional regime, where the goal is to test between two populations of inhomogeneous random graphs defined on the same set of n vertices. The size of each population m is much smaller than n, and can even be a constant as small as 1. The critical question in this context is whether the problem is solvable for small m. We answer this question from a minimax testing perspective. Let P, Q be the population adjacencies of two sparse inhomogeneous random graph models, and d be a suitably defined distance function. Given a population of m graphs from each model, we derive minimax separation rates for the problem of testing P = Q against d(P, Q) > ρ. We observe that if m is small, then the minimax separation is too large for some popular choices of d, including total variation distance between corresponding distributions. This implies that some models that are widely separated in d cannot be distinguished for small m, and hence, the testing problem is generally not solvable in these cases. We also show that if m > 1, then the minimax separation is relatively small if d is the Frobenius norm or operator norm distance between P and Q. For m = 1, only the latter distance provides small minimax separation. Thus, for these distances, the problem is solvable for small m. We also present near-optimal two-sample tests in both cases, where tests are adaptive with respect to sparsity level of the graphs.

KW - Inhomogeneous erdos–Rényi model

KW - Minimax testing

KW - Two-sample test

UR - http://www.scopus.com/inward/record.url?scp=85090447184&partnerID=8YFLogxK

U2 - 10.1214/19-AOS1884

DO - 10.1214/19-AOS1884

M3 - Article

AN - SCOPUS:85090447184

SN - 0090-5364

VL - 48

SP - 2208

EP - 2229

JO - Annals of Statistics

JF - Annals of Statistics

IS - 4

ER -