Abstract
This paper presents an advanced review of regression tree methods for mining data streams. Batch regression tree methods are known for their simplicity, interpretability, accuracy, and efficiency. They use fast divide-and-conquer greedy algorithms that recursively partition the given training data into smaller subsets. The result is a tree-shaped model with splitting rules in the internal nodes and predictions in the leaves. Most batch regression tree methods take a complete dataset and build a model using that data. Generally, this tree model cannot be modified if new data is acquired later. Their successors, the incremental model and interval tree algorithms, are able to build and retrain a model on a step-by-step basis by incorporating new numerical training instances into the model as they become available. Moreover, these algorithms produce even more compact and accurate models than batch regression tree algorithms because they use intervals or functional models with a change detection mechanism, which makes them a more suitable choice for regression analysis of data streams. Finally, this review summarizes the performance results of the reviewed methods and crystallizes 10 requirements for the successful implementation of a regression tree algorithm in the data stream mining area.
Original language | English |
---|---|
Pages (from-to) | 69-78 |
Number of pages | 10 |
Journal | Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery |
Volume | 2 |
Issue number | 1 |
DOIs | |
State | Published - 1 Jan 2012 |
ASJC Scopus subject areas
- General Computer Science