Classification-driven temporal discretization of multivariate time series

Research output: Contribution to journal › Article › peer-review

91 Scopus citations

Abstract

Biomedical data, in particular electronic medical records data, include a large number of variables sampled in an irregular fashion, often including both time points and time intervals, thus posing several challenges for analysis and data mining. Classification of multivariate time series data is a challenging task, but is often necessary for medical care or research. Increasingly, temporal abstraction, in which a series of raw-data time points is abstracted into a set of symbolic time intervals, is being used for classification of multivariate time series. In this paper, we introduce a novel supervised discretization method, geared towards enhancement of classification accuracy, which determines the cutoffs that will best discriminate among classes through the distribution of their states. We present a framework for the classification of multivariate time series, which implements three phases: (1) application of a temporal-abstraction process that transforms a series of raw time-stamped data points into a series of symbolic time intervals (based on either unsupervised or supervised temporal abstraction); (2) mining these time intervals to discover frequent temporal-interval relation patterns (TIRPs), using versions of Allen’s 13 temporal relations; (3) using the patterns as features to induce a classifier. We evaluated the framework, focusing on the comparison of three versions of the new, supervised, temporal discretization for classification (TD4C) method, each relying on a different symbolic-state distribution-distance measure among outcome classes, to several commonly used unsupervised methods, on real datasets in the domains of diabetes, intensive care, and infectious hepatitis. Using only three abstract temporal relations resulted in a better classification performance than using Allen’s seven relations, especially when using three symbolic states per variable. Classification performance was similarly better when using the horizontal support and mean duration as the TIRP feature representation, rather than a binary (existence) representation. The classification performance when using the three versions of TD4C was superior to the performance when using the unsupervised (EWD, SAX, and KB) discretization methods.
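
To make phase (1) concrete, the following is a minimal Python sketch of an unsupervised temporal abstraction using equal-width discretization (EWD), one of the baseline methods named above. It is not the TD4C method and not the authors' implementation. The function names, the `max_gap` parameter, and the rule that adjacent same-state points are merged only when they lie within `max_gap` time units are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]        # (timestamp, raw value)
Interval = Tuple[float, float, int]  # (start time, end time, symbolic state)


def equal_width_cutoffs(values: List[float], n_states: int) -> List[float]:
    """Return n_states - 1 cutoffs splitting [min, max] into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_states
    return [lo + i * width for i in range(1, n_states)]


def to_state(value: float, cutoffs: List[float]) -> int:
    """Map a raw value to a state index: the number of cutoffs it reaches."""
    return sum(value >= c for c in cutoffs)


def abstract_intervals(points: List[Point], cutoffs: List[float],
                       max_gap: float) -> List[Interval]:
    """Merge consecutive same-state points into symbolic time intervals.

    Assumption (hypothetical rule): two consecutive points join the same
    interval only if they share a state and are at most max_gap apart.
    """
    intervals: List[Interval] = []
    for t, v in sorted(points):
        state = to_state(v, cutoffs)
        if (intervals and intervals[-1][2] == state
                and t - intervals[-1][1] <= max_gap):
            start, _, _ = intervals[-1]
            intervals[-1] = (start, t, state)  # extend the open interval
        else:
            intervals.append((t, t, state))    # start a new interval
    return intervals


# Illustrative, made-up samples of a single variable.
points = [(0, 0.0), (1, 1.0), (2, 8.0), (3, 9.0), (10, 9.0)]
cutoffs = equal_width_cutoffs([v for _, v in points], n_states=3)
print(abstract_intervals(points, cutoffs, max_gap=2))
```

A supervised method such as the one the abstract describes would replace `equal_width_cutoffs` with a search for cutoffs that best separate the outcome classes; the interval-merging step could stay the same under the assumptions above.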

Original language: English
Pages (from-to): 871-913
Journal: Data Mining and Knowledge Discovery
Volume: 29
Issue number: 4
State: Published - 2 Oct 2015

Keywords

  • Classification
  • Discretization
  • Frequent pattern mining
  • Temporal abstraction
  • Temporal data mining
  • Temporal knowledge discovery
  • Time intervals mining

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications
