Online Cluster Drift Detection for Novelty Detection in Data Streams

Shon Mendelson, Boaz Lerner

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

A major challenge in data stream applications is that the target variable can change over time in unexpected ways, a phenomenon called concept drift (CD). Another challenge is the emergence of novel classes, which calls for novelty detection (ND) by, e.g., one-class or semi-supervised classification. In online ND, however, these two challenges interfere with each other, even though they should be dealt with jointly. We present the cluster drift detection (CDD) algorithm, which uses a single hyper-parameter. It performs offline clustering to learn the diverse normal profile, and then detects online whether a never-seen-before example is novel or normal using a multivariate statistical test. If the example is normal, the CDD uses it to update the normal-profile cluster, enabling continuous CD monitoring. Experimental results on popular real-world and synthetic data sets, as well as on a precision-agriculture data set of banana plants under water stress and a COVID-19 data set, demonstrate that the CDD algorithm: 1) distinguishes between normal and novel concepts more accurately than state-of-the-art algorithms, 2) provides information about why specific novel concepts are misdetected, and 3) is more robust than other algorithms to the complexity, drift, and noise in the problem.
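The online loop described in the abstract (match an incoming example to the closest normal-profile cluster, test it, and fold it into that cluster only if it passes) can be sketched as follows. This is a minimal illustration under our own assumptions, not the CDD implementation: plain k-means stands in for the offline clustering, each cluster is summarised by a running per-feature mean and variance (a diagonal covariance, i.e. features treated as independent), and the multivariate test is a chi-square test on the squared standardised distance at significance level `alpha`, with the critical value taken from the Wilson-Hilferty approximation. All names (`ClusterProfileDetector`, `kmeans`, `chi2_quantile`, `k`, `alpha`) are hypothetical, and the abstract does not say which single hyper-parameter the CDD actually exposes.

```python
import math
from statistics import NormalDist


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chi2_quantile(p, dof):
    """Approximate chi-square quantile via the Wilson-Hilferty transform."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points are the initial centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return groups


class Cluster:
    """Running per-feature mean/variance (Welford): a diagonal-covariance cluster profile."""

    def __init__(self, points):
        d = len(points[0])
        self.n, self.mean, self.m2 = 0, [0.0] * d, [0.0] * d
        for p in points:
            self.update(p)

    def update(self, x):
        # Incremental update, so the profile can follow gradual drift
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def statistic(self, x, eps=1e-9):
        # Squared standardised distance; ~ chi-square with d dof under a Gaussian cluster
        var = [m / max(self.n - 1, 1) + eps for m in self.m2]
        return sum((xi - mu) ** 2 / v for xi, mu, v in zip(x, self.mean, var))


class ClusterProfileDetector:
    """Hypothetical cluster-based novelty detector (not the published CDD)."""

    def __init__(self, normal_data, k, alpha=0.01):
        # Offline: cluster normal data; keep clusters with enough points for a variance
        self.clusters = [Cluster(g) for g in kmeans(normal_data, k) if len(g) > 1]
        self.threshold = chi2_quantile(1.0 - alpha, len(normal_data[0]))

    def process(self, x):
        # Online: test x against its best-matching normal cluster
        best = min(self.clusters, key=lambda c: c.statistic(x))
        if best.statistic(x) <= self.threshold:
            best.update(x)  # normal example refreshes the profile
            return "normal"
        return "novel"  # novel example never touches the normal profile


if __name__ == "__main__":
    # Two well-separated "normal" blobs on a small grid, interleaved so k-means starts apart
    normal = [(0.5 * i + c, 0.5 * j + c)
              for i in range(-2, 3) for j in range(-2, 3) for c in (0.0, 10.0)]
    det = ClusterProfileDetector(normal, k=2, alpha=0.01)
    print(det.process((5.0, 5.0)))   # novel
    print(det.process((0.2, -0.3)))  # normal
```

Updating only the best-matching cluster, and only with examples judged normal, is what lets the normal profile track drift without being contaminated by novel examples; a full-covariance (e.g. Hotelling-style) test would drop the independence assumption made here.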

Original language: English
Title of host publication: Proceedings - 19th IEEE International Conference on Machine Learning and Applications, ICMLA 2020
Editors: M. Arif Wani, Feng Luo, Xiaolin Li, Dejing Dou, Francesco Bonchi
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 171-178
Number of pages: 8
ISBN (Electronic): 9781728184708
State: Published - 1 Dec 2020
Event: 19th IEEE International Conference on Machine Learning and Applications, ICMLA 2020 - Virtual, Miami, United States
Duration: 14 Dec 2020 to 17 Dec 2020

Conference

Conference: 19th IEEE International Conference on Machine Learning and Applications, ICMLA 2020
Country/Territory: United States
City: Virtual, Miami
Period: 14/12/20 to 17/12/20

Keywords

  • Concept drift
  • Novelty detection
  • Streaming data

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
