Minimum sum-squared residue co-clustering of gene expression data

Hyuk Cho, Inderjit S. Dhillon, Yuqiang Guan, Suvrit Sra

Research output: Contribution to conference (Paper, peer-reviewed)

210 Scopus citations

Abstract

Microarray experiments have been used extensively in genome research to measure the expression levels of thousands of genes simultaneously. A key step in analyzing gene expression data is clustering genes into groups that show similar expression values over a range of conditions. Because only a small subset of genes participates in any cellular process of interest, focusing on subsets of genes and conditions lowers the noise induced by the other genes and conditions; a co-cluster characterizes such a subset of interest. Cheng and Church [3] introduced an effective measure of co-cluster quality based on the mean squared residue. In this paper, we use two closely related squared residue measures and propose two fast k-means-like co-clustering algorithms, one for each measure. Our algorithms discover k row clusters and l column clusters simultaneously while monotonically decreasing the respective squared residue. They inherit the simplicity, efficiency, and wide applicability of the k-means algorithm. Minimizing the residues can also be formulated as trace optimization problems, which yield a spectral relaxation that we use as a principled initialization for our iterative algorithms. We further enhance the algorithms with an incremental local search strategy that helps avoid empty clusters and escape poor local minima. We illustrate co-clustering results on a yeast cell cycle dataset and a human B-cell lymphoma dataset. Our experiments show that the algorithms are efficient and can discover coherent co-clusters.
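
For reference, the two squared residue measures can be written as below. This is a hedged sketch in our own notation, not a quotation from the paper: it assumes an m x n data matrix A = [a_ij], k row clusters I_1..I_k and l column clusters J_1..J_l, and for a co-cluster (I, J) the co-cluster mean a_IJ, the row mean a_iJ taken over J, and the column mean a_Ij taken over I. The second residue is the one used by Cheng and Church.

```latex
\begin{aligned}
a_{IJ} &= \frac{1}{|I|\,|J|}\sum_{i\in I}\sum_{j\in J} a_{ij}, \qquad
a_{iJ} = \frac{1}{|J|}\sum_{j\in J} a_{ij}, \qquad
a_{Ij} = \frac{1}{|I|}\sum_{i\in I} a_{ij}\\[4pt]
\text{Residue 1:}\quad h_{ij} &= a_{ij} - a_{IJ}\\
\text{Residue 2:}\quad h_{ij} &= a_{ij} - a_{iJ} - a_{Ij} + a_{IJ}\\[4pt]
\|H\|^2 &= \sum_{g=1}^{k}\sum_{h=1}^{l}\sum_{i\in I_g}\sum_{j\in J_h} h_{ij}^2
\end{aligned}
```

Dividing the inner double sum for a single co-cluster (I_g, J_h) by |I_g||J_h| gives that co-cluster's mean squared residue in the sense of Cheng and Church; minimizing the total sum-squared residue ||H||^2 over all row and column partitions is the co-clustering objective.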

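The alternating, k-means-like scheme in the abstract can be sketched for the simpler measure, where each entry's residue is its deviation from its co-cluster mean (our reading of the paper). This is a minimal sketch under stated assumptions: the function names `cocluster_means`, `residue1`, and `cocluster_residue1` are hypothetical, a random initialization stands in for the paper's spectral initialization, the stopping rule is our own choice, and the incremental local search is omitted.

```python
import numpy as np


def cocluster_means(A, rho, gamma, k, l):
    """Mean of every co-cluster (I_g, J_h); empty co-clusters get 0."""
    R = np.eye(k)[rho]                 # m x k row-cluster indicator matrix
    C = np.eye(l)[gamma]               # n x l column-cluster indicator matrix
    sums = R.T @ A @ C                 # k x l sum of entries in each co-cluster
    counts = np.outer(R.sum(axis=0), C.sum(axis=0))
    return np.divide(sums, counts, out=np.zeros((k, l)), where=counts > 0)


def residue1(A, rho, gamma, M):
    """Sum of squared residues h_ij = a_ij - (mean of i's co-cluster)."""
    H = A - M[rho][:, gamma]           # expand k x l means to an m x n matrix
    return float((H ** 2).sum())


def cocluster_residue1(A, k, l, n_iter=20, seed=0):
    """Hypothetical batch co-clustering for residue measure 1.

    Alternates row and column reassignment; neither half-step can
    increase the sum-squared residue. Random initialization is an
    assumption (the paper uses a spectral initialization instead).
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    rng = np.random.default_rng(seed)
    rho = rng.integers(k, size=m)      # row cluster labels
    gamma = rng.integers(l, size=n)    # column cluster labels
    M = cocluster_means(A, rho, gamma, k, l)
    history = [residue1(A, rho, gamma, M)]
    for _ in range(n_iter):
        # Row step: with column labels and means fixed, move each row to the
        # row cluster whose expanded mean vector is closest in squared error.
        Mcols = M[:, gamma]                                               # k x n
        d_rows = ((A[:, None, :] - Mcols[None, :, :]) ** 2).sum(axis=2)   # m x k
        rho = d_rows.argmin(axis=1)
        M = cocluster_means(A, rho, gamma, k, l)
        # Column step: the same update with the roles of rows and columns swapped.
        Mrows = M[rho, :]                                                 # m x l
        d_cols = ((A.T[:, None, :] - Mrows.T[None, :, :]) ** 2).sum(axis=2)  # n x l
        gamma = d_cols.argmin(axis=1)
        M = cocluster_means(A, rho, gamma, k, l)
        history.append(residue1(A, rho, gamma, M))
        if history[-1] >= history[-2]:
            break                      # no further decrease: local minimum
    return rho, gamma, history
```

Because each row or column move is chosen with the current means held fixed, and recomputing the means is the least-squares update for fixed labels, neither half-step can increase the residue, so `history` should be non-increasing. As with k-means, the final co-clustering depends on the starting labels, which is what the spectral initialization and local search mentioned in the abstract are meant to address.
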
Original language: English
Pages: 114-125
Number of pages: 12
State: Published, 2004
Externally published: Yes
Event: Proceedings of the Fourth SIAM International Conference on Data Mining, Lake Buena Vista, FL, United States
Duration: 22 Apr 2004 to 24 Apr 2004

Conference

Conference: Proceedings of the Fourth SIAM International Conference on Data Mining
Country/Territory: United States
City: Lake Buena Vista, FL
Period: 22/04/04 to 24/04/04

Keywords

  • Biclustering
  • Co-clustering
  • Gene expression
  • Residue
  • Spectral relaxation
