Abstract
A major challenge in data stream applications is that the target variable changes over time in unexpected ways, a phenomenon called concept drift (CD). Another challenge is the emergence of novel classes, which calls for novelty detection (ND) by, e.g., one-class or semi-supervised classification. In online ND, however, these two challenges interfere with each other and should be dealt with jointly. We present the cluster drift detection (CDD) algorithm, which, using a single hyper-parameter, performs offline clustering to learn the diverse normal profile and then detects online, using a multivariate statistical test, whether a never-seen-before example is novel or normal. If the example is normal, CDD uses it to update the corresponding normal-profile cluster, enabling continuous CD monitoring. Experimental results on popular real-world and synthetic data sets, as well as on a precision agriculture data set of banana plants under water stress and a COVID-19 data set, demonstrate that the CDD algorithm: 1) distinguishes between normal and novel concepts more accurately than state-of-the-art algorithms, 2) provides information about why specific novel concepts are misdetected, and 3) is more robust to the complexity, drift, and noise in the problem than other algorithms.
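
The online loop described in the abstract (test a new example against the learned normal profile, then update a cluster if the example is judged normal) can be illustrated with a minimal sketch. This is not the authors' CDD implementation: the abstract does not name the clustering method or the statistical test, so the sketch assumes the clusters are already given, substitutes a squared Mahalanobis distance with a fixed cutoff for the multivariate test, and uses a Welford-style running update. The names `ClusterProfile`, `process_example`, `threshold`, and `ridge` are hypothetical.

```python
import numpy as np


class ClusterProfile:
    """Running mean and covariance of one normal-profile cluster (hypothetical helper)."""

    def __init__(self, points):
        points = np.asarray(points, dtype=float)
        self.n = points.shape[0]
        self.mean = points.mean(axis=0)
        centered = points - self.mean
        # Sum of outer products of deviations; covariance is scatter / (n - 1).
        self.scatter = centered.T @ centered

    def cov(self, ridge=1e-6):
        # A small ridge keeps the matrix invertible (an assumption of this sketch).
        d = self.mean.size
        return self.scatter / (self.n - 1) + ridge * np.eye(d)

    def sq_mahalanobis(self, x):
        diff = np.asarray(x, dtype=float) - self.mean
        return float(diff @ np.linalg.solve(self.cov(), diff))

    def update(self, x):
        # Welford-style incremental update of mean and scatter with one example.
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean
        self.mean = self.mean + delta / self.n
        self.scatter = self.scatter + np.outer(delta, x - self.mean)


def process_example(profiles, x, threshold):
    """Label x as 'novel' or 'normal'; a normal example updates its nearest cluster.

    `threshold` is a cutoff on the squared Mahalanobis distance, standing in
    for the multivariate test mentioned in the abstract (an assumption).
    """
    dists = [p.sq_mahalanobis(x) for p in profiles]
    nearest = int(np.argmin(dists))
    if dists[nearest] > threshold:
        return "novel", None
    profiles[nearest].update(x)
    return "normal", nearest
```

For two-dimensional data, a cutoff of about 9.21 corresponds to the 0.99 quantile of a chi-square distribution with two degrees of freedom, which is one common way to pick such a threshold under a Gaussian assumption. Because every accepted example shifts the mean and covariance of its cluster, gradual drift in the normal profile is tracked, while examples far from every cluster are flagged as novel and leave the profile unchanged.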
Original language | English |
---|---|
Title of host publication | Proceedings - 19th IEEE International Conference on Machine Learning and Applications, ICMLA 2020 |
Editors | M. Arif Wani, Feng Luo, Xiaolin Li, Dejing Dou, Francesco Bonchi |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 171-178 |
Number of pages | 8 |
ISBN (Electronic) | 9781728184708 |
State | Published - 1 Dec 2020 |
Event | 19th IEEE International Conference on Machine Learning and Applications, ICMLA 2020 - Virtual, Miami, United States; Duration: 14 Dec 2020 → 17 Dec 2020 |
Conference
Conference | 19th IEEE International Conference on Machine Learning and Applications, ICMLA 2020 |
---|---|
Country/Territory | United States |
City | Virtual, Miami |
Period | 14/12/20 → 17/12/20 |
Keywords
- Concept drift
- Novelty detection
- Streaming data
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Science Applications
- Computer Vision and Pattern Recognition
- Hardware and Architecture