TY - GEN
T1 - Bayesian fuzzy clustering of colored graphs
AU - Theis, Fabian J.
PY - 2012
Y1 - 2012
N2 - With the increasing availability of interaction data stemming form fields as diverse as systems biology, telecommunication or social sciences, the task of mining and understanding the underlying graph structures becomes more and more important. Here we focus on data with different types of nodes; we subsume this meta information in the color of a node. An important first step is the unsupervised clustering of nodes into communities, which are of the same color and highly connected within but sparsely connected to the rest of the graph. Recently we have proposed a fuzzy extension of this clustering concept, which allows a node to have membership in multiple clusters. The resulting gradient descent algorithm shared many similarities with the multiplicative update rules from nonnegative matrix factorization. Two issues left open were the determination of the number of clusters of each color, as well as the non-defined integration of additional prior information. In this contribution we resolve these issues by reinterpreting the factorization in a Bayesian framework, which allows the ready inclusion of priors. We integrate automatic relevance determination to automatically estimate group sizes. We derive a maximum-a-posteriori estimator, and illustrate the feasibility of the approach on a toy as well as a protein-complex hypergraph, where the resulting fuzzy clusters show significant enrichment of distinct gene ontology categories.
AB - With the increasing availability of interaction data stemming form fields as diverse as systems biology, telecommunication or social sciences, the task of mining and understanding the underlying graph structures becomes more and more important. Here we focus on data with different types of nodes; we subsume this meta information in the color of a node. An important first step is the unsupervised clustering of nodes into communities, which are of the same color and highly connected within but sparsely connected to the rest of the graph. Recently we have proposed a fuzzy extension of this clustering concept, which allows a node to have membership in multiple clusters. The resulting gradient descent algorithm shared many similarities with the multiplicative update rules from nonnegative matrix factorization. Two issues left open were the determination of the number of clusters of each color, as well as the non-defined integration of additional prior information. In this contribution we resolve these issues by reinterpreting the factorization in a Bayesian framework, which allows the ready inclusion of priors. We integrate automatic relevance determination to automatically estimate group sizes. We derive a maximum-a-posteriori estimator, and illustrate the feasibility of the approach on a toy as well as a protein-complex hypergraph, where the resulting fuzzy clusters show significant enrichment of distinct gene ontology categories.
UR - http://www.scopus.com/inward/record.url?scp=84857302647&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-28551-6_65
DO - 10.1007/978-3-642-28551-6_65
M3 - Conference contribution
AN - SCOPUS:84857302647
SN - 9783642285509
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 528
EP - 535
BT - Latent Variable Analysis and Signal Separation - 10th International Conference, LVA/ICA 2012, Proceedings
T2 - 10th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2012
Y2 - 12 March 2012 through 15 March 2012
ER -