Sleeved CoClustering

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

    13 Scopus citations

    Abstract

    A coCluster of an m × n matrix X is a submatrix determined by a subset of the rows and a subset of the columns. The problem of finding coClusters with specific properties is of interest, in particular, in the analysis of microarray experiments, where the entries of X are the expression levels of m genes in each of n tissue samples. One goal of the analysis is to extract a subset of the samples and a subset of the genes such that the expression levels of the chosen genes behave similarly across the chosen samples, presumably reflecting an underlying regulatory mechanism that governs the expression levels of those genes.

    We propose to base the similarity of the genes in a coCluster on a simple biological model in which the strength of the regulatory mechanism in sample j is H_j, the response strength of gene i to that mechanism is G_i, and, in the noise-free case, the expression level of gene i in sample j is the product G_i H_j. In other words, for every two genes participating in a good coCluster, the ratio of their expression values is the same in each of the participating samples and depends only on the two genes. Noise in the expression levels is taken into account by allowing a deviation from the model, measured by a relative error criterion: the sleeve-width of the coCluster reflects how far, in relative terms, entry (i, j) of the coCluster is allowed to deviate from the product G_i H_j.

    We present a polynomial-time Monte-Carlo algorithm that outputs a list of coClusters whose sleeve-widths do not exceed a prespecified value. Moreover, we prove that, with a fixed probability, the list includes a coCluster that is near-optimal in its dimensions. Extensive experimentation with synthetic data shows that the algorithm performs well.
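    The abstract does not give the exact sleeve-width formula or how G and H are chosen, so the Python sketch below is only an illustration under loudly stated assumptions, not the paper's definition and not its Monte-Carlo algorithm. It assumes strictly positive entries, fits G_i and H_j by least squares on log X over the coCluster, and takes the width to be the largest relative deviation max |X_ij / (G_i H_j) - 1|. The function name sleeve_width_proxy and the example arrays G_true and H_true are hypothetical. The example also shows the ratio property: under the exact product model, the ratio of two genes' rows is constant across samples.

```python
import numpy as np


def sleeve_width_proxy(X, rows, cols):
    """Hypothetical proxy for the sleeve-width of the coCluster (rows, cols).

    Assumptions (not taken from the paper): entries are strictly positive,
    G and H are fitted by least squares on log X, and the width is the
    largest relative deviation max |X_ij / (G_i * H_j) - 1|.
    Returns (width, G, H).
    """
    sub = np.asarray(X, dtype=float)[np.ix_(rows, cols)]
    if np.any(sub <= 0):
        raise ValueError("the log-space fit needs strictly positive entries")
    logs = np.log(sub)
    # Two-way additive fit: log X_ij ~ rowmean_i + colmean_j - grandmean.
    grand = logs.mean()
    G = np.exp(logs.mean(axis=1))          # fitted response strength per gene
    H = np.exp(logs.mean(axis=0) - grand)  # fitted regulator strength per sample
    model = np.outer(G, H)                 # predicted G_i * H_j
    width = np.max(np.abs(sub / model - 1.0))
    return width, G, H


# Exact model: every entry is G_i * H_j.
G_true = np.array([1.0, 2.0, 3.0])
H_true = np.array([4.0, 5.0, 6.0, 7.0])
X = np.outer(G_true, H_true)
ratios = X[1] / X[0]  # gene 1 vs gene 0: the same value in every sample
print(ratios)

w_exact, _, _ = sleeve_width_proxy(X, [0, 1, 2], [0, 1, 2, 3])

# Perturb one entry by 10%: the fitted product no longer matches exactly.
X_noisy = X.copy()
X_noisy[2, 3] *= 1.10
w_noisy, _, _ = sleeve_width_proxy(X_noisy, [0, 1, 2], [0, 1, 2, 3])
print(w_exact, w_noisy)
```

    Because the assumed width is a maximum over entries for one particular choice of G and H, it only upper-bounds the smallest width achievable under this assumed measure; the log-space fit is chosen for simplicity, not because it minimizes the maximum relative error.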

    Original language: English
    Title of host publication: KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    Publisher: Association for Computing Machinery (ACM)
    Pages: 635-640
    Number of pages: 6
    ISBN (Print): 1581138881, 9781581138887
    DOIs
    State: Published - 1 Jan 2004
    Event: KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Seattle, WA, United States
    Duration: 22 Aug 2004 to 25 Aug 2004

    Publication series

    Name: KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

    Conference

    Conference: KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    Country/Territory: United States
    City: Seattle, WA
    Period: 22/08/04 to 25/08/04

    Keywords

    • Clustering
    • Co-regulation
    • Coclustering
    • Gene expression data

    ASJC Scopus subject areas

    • General Engineering
