Online classification of nonstationary data streams

Research output: Contribution to journal › Article › peer-review

122 Scopus citations

Abstract

Most classification methods are based on the assumption that the data conforms to a stationary distribution. However, real-world data is usually collected over periods of time ranging from seconds to years, and ignoring possible changes in the underlying concept, also known as concept drift, may degrade the predictive performance of a classification model. Moreover, the computation time, the amount of required memory, and the model complexity may grow indefinitely with the continuous arrival of new training instances. This paper describes and evaluates OLIN, an online classification system that dynamically adjusts the size of the training window and the number of new examples between model re-constructions to the current rate of concept drift. Using a fixed amount of computer resources, OLIN produces models that have nearly the same accuracy as the ones that would be produced by periodically re-constructing the model from all accumulated instances. We evaluate the system performance on sample segments from two real-world streams of non-stationary data.
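The general idea of adapting both the training window and the rebuild interval to the rate of drift can be illustrated with a minimal Python sketch. This is not OLIN itself: the abstract does not say how drift is measured or how the sizes are updated, so everything here is an assumption made for illustration. The class name `AdaptiveWindowLearner`, the `tolerance` threshold on the gap between new-batch error and training error, the doubling and reset rules, and the per-value majority-vote model (standing in for an info-fuzzy network) are all hypothetical.

```python
from collections import Counter, deque


class AdaptiveWindowLearner:
    """Toy sliding-window learner (illustrative only, not OLIN itself)."""

    def __init__(self, min_window=20, max_window=200,
                 min_batch=5, max_batch=40, tolerance=0.15):
        self.min_window, self.max_window = min_window, max_window
        self.min_batch, self.max_batch = min_batch, max_batch
        self.tolerance = tolerance      # assumed threshold on the error gap
        self.window_size = min_window   # current cap on kept training examples
        self.batch_size = min_batch     # new examples between model rebuilds
        self.window = deque()           # training examples currently kept
        self.pending = []               # examples seen since the last rebuild
        self.model = {}                 # feature value -> majority label
        self.default = None             # fallback label for unseen values
        self.train_error = 0.0
        self.trained = False

    def predict(self, x):
        return self.model.get(x, self.default)

    def _error(self, examples):
        wrong = sum(1 for x, y in examples if self.predict(x) != y)
        return wrong / len(examples)

    def _rebuild(self):
        # Stand-in model: majority label per (hashable) feature value.
        counts = {}
        for x, y in self.window:
            counts.setdefault(x, Counter())[y] += 1
        self.model = {x: c.most_common(1)[0][0] for x, c in counts.items()}
        self.default = Counter(y for _, y in self.window).most_common(1)[0][0]
        self.train_error = self._error(self.window)
        self.trained = True

    def learn(self, x, y):
        self.pending.append((x, y))
        if len(self.pending) < self.batch_size:
            return
        batch, self.pending = self.pending, []
        # Assumed drift signal: the newest batch is much harder to predict
        # than the data the current model was built from.
        drift = self.trained and (
            self._error(batch) - self.train_error > self.tolerance)
        if drift:
            # Forget the old concept and rebuild often from little data.
            self.window.clear()
            self.window_size = self.min_window
            self.batch_size = self.min_batch
        else:
            # Stable concept: keep more data and rebuild less often.
            self.window_size = min(self.max_window, 2 * self.window_size)
            self.batch_size = min(self.max_batch, 2 * self.batch_size)
        self.window.extend(batch)
        while len(self.window) > self.window_size:
            self.window.popleft()
        self._rebuild()
```

Because the window never holds more than `max_window` examples and at most `max_batch` examples wait between rebuilds, memory use stays bounded however long the stream runs, which mirrors the fixed-resource property the abstract claims. Calling `learn(x, y)` for each arriving labeled example and `predict(x)` for each query is the whole interface.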

Original language: English
Pages (from-to): 129-147
Number of pages: 19
Journal: Intelligent Data Analysis
Volume: 6
Issue number: 2
State: Published - 1 Jan 2002

Keywords

  • classification
  • concept drift
  • incremental learning
  • info-fuzzy networks
  • online learning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
