Abstract
This paper introduces a new ensemble technique, cluster-based concurrent decomposition (CBCD), which induces an ensemble of classifiers by decomposing the training set into mutually exclusive sub-samples of equal size. The CBCD algorithm first clusters the instance space using the K-means clustering algorithm. It then produces disjoint sub-samples from the clusters in such a way that each sub-sample comprises tuples from all clusters and hence represents the entire dataset. An induction algorithm is applied in turn to each sub-sample, followed by a voting mechanism that combines the classifiers' predictions. The CBCD algorithm has two tuning parameters: the number of clusters and the number of sub-samples to create. Using suitable meta-learning, it is possible to tune these parameters properly. In the experimental study we conducted, the CBCD algorithm, using an embedded C4.5 algorithm, outperformed the bagging algorithm at the same computational complexity.
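The decomposition and voting steps described in the abstract can be illustrated with a minimal sketch. It assumes that cluster labels have already been computed (for example by K-means) and that the base classifiers (for example C4.5 trees) are trained elsewhere. The function names `decompose_by_cluster` and `majority_vote` are hypothetical, and the round-robin allocation is an assumption: the abstract does not state how instances are assigned to sub-samples, only that each sub-sample draws from all clusters and that the sub-samples are disjoint and of equal size.

```python
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence


def decompose_by_cluster(cluster_labels: Sequence[Hashable],
                         n_subsets: int) -> List[List[int]]:
    """Split instance indices into disjoint sub-samples of near-equal size.

    Assumption (not from the paper): instances are grouped by cluster and
    dealt out round-robin, so any cluster with at least n_subsets members
    contributes to every sub-sample.
    """
    if n_subsets < 1:
        raise ValueError("n_subsets must be at least 1")

    # Group instance indices by their cluster label.
    by_cluster = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        by_cluster[label].append(idx)

    subsets: List[List[int]] = [[] for _ in range(n_subsets)]
    position = 0  # carried across clusters so sizes differ by at most one
    for members in by_cluster.values():
        for idx in members:
            subsets[position % n_subsets].append(idx)
            position += 1
    return subsets


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Combine one prediction per classifier by plurality vote."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    return Counter(predictions).most_common(1)[0][0]


# Nine instances in three clusters, split into three sub-samples.
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
subsets = decompose_by_cluster(labels, n_subsets=3)
```

In this toy case each of the three sub-samples receives exactly one instance from each cluster. A base learner would then be trained on each sub-sample, and `majority_vote` would combine the per-classifier predictions for a new instance.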
Original language | English GB |
---|---|
Article number | 1 |
Pages (from-to) | 37-54 |
Number of pages | 18 |
Journal | International Journal of Computational Intelligence and Applications |
Volume | 5 |
Issue number | 1 |
State | Published - 2005 |
Externally published | Yes |