TY - GEN
T1 - Estimating centrality statistics for complete and sampled networks
T2 - 48th Annual Hawaii International Conference on System Sciences, HICSS 2015
AU - Lee, Ju Sung
AU - Pfeffer, Juergen
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/3/26
Y1 - 2015/3/26
N2 - The study of large, 'big data' networks is becoming increasingly common and relevant to our understanding of human systems. Many of the studied networks are drawn from social media and other web-based sources. As such, in-depth analysis of these dynamic structures, e.g., in the context of cyber security, remains especially challenging. Due to the time and resources incurred in computing network measures for large networks, it is practical to approximate these whenever possible. We present some approximation techniques exploiting any tractable relationship between the measures and network characteristics such as size and density. We find there exist distinct functional relationships between network statistics of complex 'slow' measures and 'fast' measures, such as the linkage between betweenness centrality and network density. We also track how these relationships scale with network size. Specifically, we explore the efficacy of both linear modeling (i.e., correlations and least-squares regression) and non-linear modeling in estimating the network measures of interest. We find that sparse, but not severely sparse, networks which admit sufficient entropy incur the most variance in the network statistics and, hence, more error in the estimation. We review our approaches with three prominent network topologies: random (a.k.a. Erdős-Rényi), Watts-Strogatz small-world, and scale-free networks. Finally, we assess how well the estimation approaches perform for sub-sampled networks.
KW - Graph typology
KW - Network analysis
KW - Sampling error
UR - http://www.scopus.com/inward/record.url?scp=84944202735&partnerID=8YFLogxK
U2 - 10.1109/HICSS.2015.203
DO - 10.1109/HICSS.2015.203
M3 - Conference contribution
AN - SCOPUS:84944202735
T3 - Proceedings of the Annual Hawaii International Conference on System Sciences
SP - 1686
EP - 1695
BT - Proceedings of the 48th Annual Hawaii International Conference on System Sciences, HICSS 2015
A2 - Bui, Tung X.
A2 - Sprague, Ralph H.
PB - IEEE Computer Society
Y2 - 5 January 2015 through 8 January 2015
ER -