Co-clustering of lagged data

Eran Shaham, David Sarne, Boaz Ben-Moshe

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The paper focuses on mining clusters that are characterized by a lagged relationship between the data objects. We call such clusters lagged co-clusters. A lagged co-cluster of a matrix is a submatrix determined by a subset of rows and their corresponding lags over a subset of columns. Extracting such subsets (not necessarily successive) may reveal an underlying governing regulatory mechanism. Such regulatory mechanisms are quite common in real-life settings and appear in a variety of fields: meteorology, seismic activity, stock market behavior, neuronal brain activity, and river flow and navigation are but a few examples. Mining such lagged co-clusters not only helps in understanding the relationship between objects in the domain, but also assists in forecasting their future behavior. For most interesting variants of this problem, finding an optimal lagged co-cluster is NP-complete. We present a polynomial-time Monte-Carlo algorithm for finding a set of lagged co-clusters whose error does not exceed a pre-specified value; the algorithm handles noise, anti-correlations, missing values, and overlapping patterns. Moreover, we prove that this set includes, with fixed probability, a lagged co-cluster which is optimal in its dimensions. The algorithm was extensively evaluated in various environments: first, on artificial data, enabling the evaluation of specific, isolated properties of the algorithm; second, on real-world data, using river flow and topographic data, enabling the evaluation of the algorithm's ability to efficiently mine relevant and coherent lagged co-clusters in environments that are temporal (i.e., time reading data) and non-temporal, respectively.
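
To make the definition of a lagged co-cluster concrete, the following is a minimal illustrative sketch in Python with NumPy. It is not the Monte-Carlo algorithm from the paper: it only shows how a subset of rows, one lag per row, and a subset of (not necessarily successive) columns pick out a submatrix. The function names `extract_lagged_submatrix` and `column_spread`, the toy matrix, and the choice of error measure (largest per-column spread after lag alignment) are all assumptions made for illustration.

```python
import numpy as np


def extract_lagged_submatrix(X, rows, lags, cols):
    """Return S with S[a, b] = X[rows[a], cols[b] + lags[a]].

    Hypothetical helper: each selected row is read at the selected
    columns shifted by that row's own lag.
    """
    rows = np.asarray(rows)
    lags = np.asarray(lags)
    cols = np.asarray(cols)
    col_idx = cols[None, :] + lags[:, None]  # shape (len(rows), len(cols))
    if col_idx.min() < 0 or col_idx.max() >= X.shape[1]:
        raise ValueError("lagged column index falls outside the matrix")
    return X[rows[:, None], col_idx]


def column_spread(S):
    """Illustrative error measure (an assumption, not the paper's):
    the largest max-minus-min spread within any column of S."""
    return float(np.max(S.max(axis=0) - S.min(axis=0)))


# Toy data: rows 0, 1 and 3 carry the same pattern, delayed by 0, 2 and 1 columns.
pattern = np.array([5.0, 1.0, 4.0, 2.0, 8.0, 3.0, 7.0, 6.0])
X = np.zeros((4, 12))
X[0, 0:8] = pattern
X[1, 2:10] = pattern
X[3, 1:9] = pattern
X[2] = np.arange(12.0)  # an unrelated row, left out of the co-cluster

rows, lags, cols = [0, 1, 3], [0, 2, 1], [0, 3, 4, 7]

S = extract_lagged_submatrix(X, rows, lags, cols)
error = column_spread(S)

# Same rows and columns with the lags ignored, for comparison.
unaligned_error = column_spread(extract_lagged_submatrix(X, rows, [0, 0, 0], cols))

print(S)
print("error with lags:", error)
print("error without lags:", unaligned_error)
```

In this toy case the rows coincide once each is shifted by its own lag, so the spread is zero, whereas ignoring the lags leaves a positive spread. A noise-tolerant variant would accept any submatrix whose error stays below a pre-specified threshold.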

Original language: English
Title of host publication: Proceedings - 10th IEEE International Conference on Data Mining, ICDM 2010
Pages: 451-460
Number of pages: 10
State: Published - 1 Dec 2010
Externally published: Yes
Event: 10th IEEE International Conference on Data Mining, ICDM 2010 - Sydney, NSW, Australia
Duration: 14 Dec 2010 → 17 Dec 2010

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 10th IEEE International Conference on Data Mining, ICDM 2010
Country/Territory: Australia
City: Sydney, NSW
Period: 14/12/10 → 17/12/10

Keywords

  • Clustering
  • Co-clustering
  • Data mining
  • Lagged clustering
  • Time-lagged

ASJC Scopus subject areas

  • General Engineering
