Sleeved CoClustering

Avraham A. Melkman, Eran Shaham

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

13 Scopus citations

Abstract

A coCluster of a m × n matrix X is a submatrix determined by a subset of the rows and a subset of the columns. The problem of finding coClusters with specific properties is of interest, in particular, in the analysis of microarray experiments. In that case the entries of the matrix X are the expression levels of m genes in each of n tissue samples. One goal of the analysis is to extract a subset of the samples and a subset of the genes, such that the expression levels of the chosen genes behave similarly across the subset of the samples, presumably reflecting an underlying regulatory mechanism governing the expression level of the genes. We propose to base the similarity of the genes in a coCluster on a simple biological model, in which the strength of the regulatory mechanism in sample j is Hj, and the response strength of gene i to the regulatory mechanism is Gi. In other words, every two genes participating in a good coCluster should have expression values in each of the participating samples, whose ratio is a constant depending only on the two genes. Noise in the expression levels of genes is taken into account by allowing a deviation from the model, measured by a relative error criterion. The sleeve-width of the coCluster reflects the extent to which entry i, j in the coCluster is allowed to deviate, relatively, from being expressed as the product GiHj. We present a polynomial-time Monte-Carlo algorithm which outputs a list of coClusters whose sleeve-widths do not exceed a prespecified value. Moreover, we prove that the list includes, with fixed probability, a coCluster which is near-optimal in its dimensions. Extensive experimentation with synthetic data shows that the algorithm performs well.

Original languageEnglish
Title of host publicationKDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery (ACM)
Pages635-640
Number of pages6
ISBN (Print)1581138881, 9781581138887
DOIs
StatePublished - 1 Jan 2004
EventKDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Seattle, WA, United States
Duration: 22 Aug 200425 Aug 2004

Publication series

NameKDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

ConferenceKDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country/TerritoryUnited States
CitySeattle, WA
Period22/08/0425/08/04

Keywords

  • Clustering
  • Co-regulation
  • Coclustering
  • Gene expression data

ASJC Scopus subject areas

  • Engineering (all)

Fingerprint

Dive into the research topics of 'Sleeved CoClustering'. Together they form a unique fingerprint.

Cite this