Knowledge discovery in data streams with regression tree methods

Dima Alberg, Mark Last, Abraham Kandel

Research output: Contribution to journal, Review article, peer-review

33 Scopus citations

Abstract

This paper presents an advanced review of regression tree methods for mining data streams. Batch regression tree methods are known for their simplicity, interpretability, accuracy, and efficiency. They use fast, greedy divide-and-conquer algorithms that recursively partition the given training data into smaller subsets. The result is a tree-shaped model with splitting rules in the internal nodes and predictions in the leaves. Most batch regression tree methods take a complete dataset and build a model from that data; generally, the resulting tree cannot be modified when new data are acquired later. Their successors, incremental model tree and interval tree algorithms, can build and retrain a model step by step, incorporating new numerical training instances into the model as they become available. Moreover, these algorithms produce more compact and accurate models than batch regression tree algorithms because they use intervals or functional models together with a change detection mechanism, which makes them a more suitable choice for regression analysis of data streams. Finally, this review summarizes the performance results of the reviewed methods and distills 10 requirements for the successful implementation of a regression tree algorithm in data stream mining.
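The greedy divide-and-conquer procedure described in the abstract can be sketched as follows. This is a minimal illustration, not any of the reviewed algorithms: it assumes numeric attributes and uses sum-of-squared-error (SSE) reduction as the split criterion, a common choice assumed here. The names `Node`, `build`, `predict`, `min_leaf`, and `max_depth` are hypothetical. Because `build` needs the complete dataset up front, incorporating new instances means rebuilding from scratch, which is the batch limitation that the incremental methods address.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """Regression tree node: internal nodes hold a split rule, leaves a prediction."""
    prediction: float                  # mean target of the training rows reaching this node
    feature: Optional[int] = None      # index of the split attribute (None for a leaf)
    threshold: Optional[float] = None  # rows with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sse(ys: Sequence[float]) -> float:
    """Sum of squared errors around the mean (0.0 for an empty sequence)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def build(X: List[List[float]], y: List[float],
          min_leaf: int = 2, max_depth: int = 5) -> Node:
    """Greedy top-down partitioning: choose the split with the lowest SSE, then recurse."""
    node = Node(prediction=sum(y) / len(y))
    if max_depth == 0 or len(y) < 2 * min_leaf:
        return node  # too deep or too few rows to split: make a leaf
    parent_sse = sse(y)
    best = None  # (cost after split, feature index, threshold)
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    if best is None or best[0] >= parent_sse:
        return node  # no split improves the fit: keep this node as a leaf
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    node.feature, node.threshold = f, t
    node.left = build([X[i] for i in li], [y[i] for i in li], min_leaf, max_depth - 1)
    node.right = build([X[i] for i in ri], [y[i] for i in ri], min_leaf, max_depth - 1)
    return node


def predict(node: Node, x: Sequence[float]) -> float:
    """Follow the splitting rules from the root to a leaf and return its prediction."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


if __name__ == "__main__":
    X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
    y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
    tree = build(X, y)
    print(tree.feature, tree.threshold)            # 0 3.0
    print(predict(tree, [2.5]), predict(tree, [11.5]))  # 1.0 5.0
```

On this toy data the root splits attribute 0 at 3.0, the only threshold that separates the two target levels with zero residual error, and each child becomes a leaf predicting its subset mean.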

Original language: English
Pages (from-to): 69-78
Number of pages: 10
Journal: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Volume: 2
Issue number: 1
State: Published - 1 Jan 2012

ASJC Scopus subject areas

  • General Computer Science
