TY - GEN
T1 - Automatic taxonomy extraction from bipartite graphs
AU - Kötter, Tobias
AU - Günnemann, Stephan
AU - Berthold, Michael R.
AU - Faloutsos, Christos
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2016/1/5
Y1 - 2016/1/5
N2 - Given a large bipartite graph that represents objects and their properties, how can we automatically extract semantic information that provides an overview of the data and - at the same time - enables us to drill down to specific parts for an in-depth analysis? In this work, we propose extracting a taxonomy that models the relation between the properties via an is a hierarchy. The extracted taxonomy arranges the properties from general to specific providing different levels of abstraction. Our proposed method has the following desirable properties: (a) it requires no user-defined parameters, by exploiting the principle of minimum description length, (b) it is effective, by utilizing the inheritance of objects when representing the hierarchy, and (c) it is scalable, being linear in the number of edges. We demonstrate the effectiveness and scalability of our method on a broad spectrum of real, publicly available graphs from drug-property graphs to social networks with up to 22 million vertices and 286 million edges.
AB - Given a large bipartite graph that represents objects and their properties, how can we automatically extract semantic information that provides an overview of the data and - at the same time - enables us to drill down to specific parts for an in-depth analysis? In this work, we propose extracting a taxonomy that models the relation between the properties via an is a hierarchy. The extracted taxonomy arranges the properties from general to specific providing different levels of abstraction. Our proposed method has the following desirable properties: (a) it requires no user-defined parameters, by exploiting the principle of minimum description length, (b) it is effective, by utilizing the inheritance of objects when representing the hierarchy, and (c) it is scalable, being linear in the number of edges. We demonstrate the effectiveness and scalability of our method on a broad spectrum of real, publicly available graphs from drug-property graphs to social networks with up to 22 million vertices and 286 million edges.
KW - Graphs
KW - MDL
KW - Taxonomies
UR - http://www.scopus.com/inward/record.url?scp=84963623621&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2015.24
DO - 10.1109/ICDM.2015.24
M3 - Conference contribution
AN - SCOPUS:84963623621
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 221
EP - 230
BT - Proceedings - 15th IEEE International Conference on Data Mining, ICDM 2015
A2 - Aggarwal, Charu
A2 - Zhou, Zhi-Hua
A2 - Tuzhilin, Alexander
A2 - Xiong, Hui
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE International Conference on Data Mining, ICDM 2015
Y2 - 14 November 2015 through 17 November 2015
ER -