TY - GEN
T1 - Efficient large-scale bicluster editing
AU - Sun, Peng
AU - Baumbach, Jan
AU - Guo, Jiong
PY - 2014
Y1 - 2014
N2 - The explosion of biological data has dramatically reshaped today's biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as simultaneous clustering or co-clustering, has been successfully used to discover local patterns in gene expression data and similar biomedical data types. Here, we contribute a new approach: Bi-Force. It is based on the weighted bicluster editing model and performs biclustering on arbitrary sets of biological entities, given any kind of similarity function. We first evaluated the power of Bi-Force to solve dedicated bicluster editing problems by comparing it with two existing algorithms in the BiCluE software package. We then followed a biclustering evaluation protocol from a recent review paper by Eren et al. and compared Bi-Force against eight existing tools: FABIA, QUBIC, Cheng and Church, Plaid, Bimax, Spectral, xMOTIFS and ISA. To this end, a suite of synthetic data sets as well as nine large gene expression data sets from Gene Expression Omnibus were analyzed. All resulting biclusters were subsequently investigated by Gene Ontology enrichment analysis to evaluate their biological relevance. The distinct theoretical foundation of Bi-Force (bicluster editing) is more powerful than strict biclustering. Bi-Force thus outperformed existing tools, at least when following the evaluation protocols from Eren et al. Bi-Force is implemented in Java and integrated into the open source software package of BiCluE. The software as well as all used data sets are publicly available at http://biclue.mpi-inf.mpg.de.
AB - The explosion of biological data has dramatically reshaped today's biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as simultaneous clustering or co-clustering, has been successfully used to discover local patterns in gene expression data and similar biomedical data types. Here, we contribute a new approach: Bi-Force. It is based on the weighted bicluster editing model and performs biclustering on arbitrary sets of biological entities, given any kind of similarity function. We first evaluated the power of Bi-Force to solve dedicated bicluster editing problems by comparing it with two existing algorithms in the BiCluE software package. We then followed a biclustering evaluation protocol from a recent review paper by Eren et al. and compared Bi-Force against eight existing tools: FABIA, QUBIC, Cheng and Church, Plaid, Bimax, Spectral, xMOTIFS and ISA. To this end, a suite of synthetic data sets as well as nine large gene expression data sets from Gene Expression Omnibus were analyzed. All resulting biclusters were subsequently investigated by Gene Ontology enrichment analysis to evaluate their biological relevance. The distinct theoretical foundation of Bi-Force (bicluster editing) is more powerful than strict biclustering. Bi-Force thus outperformed existing tools, at least when following the evaluation protocols from Eren et al. Bi-Force is implemented in Java and integrated into the open source software package of BiCluE. The software as well as all used data sets are publicly available at http://biclue.mpi-inf.mpg.de.
UR - https://www.scopus.com/pages/publications/84919338891
M3 - Conference contribution
AN - SCOPUS:84919338891
T3 - Lecture Notes in Informatics (LNI), Proceedings - Series of the Gesellschaft für Informatik (GI)
SP - 54
EP - 60
BT - German Conference on Bioinformatics 2014
A2 - Giegerich, Robert
A2 - Hofestädt, Ralf
A2 - Nattkemper, Tim W.
PB - Gesellschaft für Informatik (GI)
T2 - German Conference on Bioinformatics, GCB 2014
Y2 - 28 September 2014 through 1 October 2014
ER -