Improving Supervised Learning by Sample Decomposition.

Lior Rokach, Oded Maimon, Omri Arad

Research output: Contribution to journal › Article › peer-review

Abstract

This paper introduces a new ensemble technique, cluster-based concurrent decomposition (CBCD), which induces an ensemble of classifiers by decomposing the training set into mutually exclusive sub-samples of equal size. The CBCD algorithm first clusters the instance space using the K-means clustering algorithm. It then uses the clusters to produce disjoint sub-samples in such a way that each sub-sample comprises tuples from all clusters and hence represents the entire dataset. An induction algorithm is applied in turn to each subset, and a voting mechanism combines the classifiers' predictions. The CBCD algorithm has two tuning parameters: the number of clusters and the number of subsets to create. These parameters can be tuned properly using a suitable meta-learning approach. In the experimental study we conducted, the CBCD algorithm with an embedded C4.5 inducer outperformed the bagging algorithm of the same computational complexity.
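To make the cluster-then-decompose pipeline concrete, here is a minimal sketch in plain Python of the steps the abstract lists: K-means over the instance space, disjoint equal-size sub-samples that each draw from every cluster, one model per sub-sample, and majority voting. Several details are assumptions rather than the authors' method: the round-robin rule for dealing cluster members into sub-samples, the fixed number of K-means iterations, and the base inducer. C4.5 is not implemented here; a simple nearest-centroid classifier stands in for it. All function names are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means over the features only; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def decompose(labels, k, m, seed=0):
    """Deal instance indices into m disjoint sub-samples (assumed round-robin rule).

    Each cluster's members are shuffled and dealt using one counter shared
    across clusters, so sub-sample sizes differ by at most one and every
    cluster with at least m members appears in every sub-sample.
    """
    rng = random.Random(seed)
    subsets = [[] for _ in range(m)]
    slot = 0
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(members)
        for i in members:
            subsets[slot % m].append(i)
            slot += 1
    return subsets


def fit_nearest_centroid(X, y):
    """Stand-in base inducer (NOT C4.5): stores one mean vector per class."""
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    return {label: [sum(col) / len(xs) for col in zip(*xs)]
            for label, xs in by_class.items()}


def predict_nearest_centroid(model, x):
    """Predict the class whose mean vector is closest to x."""
    return min(model, key=lambda label: sq_dist(x, model[label]))


def cbcd_fit(X, y, k, m, seed=0):
    """Cluster, decompose into m sub-samples, and train one model per sub-sample."""
    labels = kmeans(X, k, seed=seed)
    subsets = decompose(labels, k, m, seed=seed)
    return [fit_nearest_centroid([X[i] for i in s], [y[i] for i in s])
            for s in subsets]


def cbcd_predict(models, x):
    """Combine the ensemble members' predictions by simple majority vote."""
    votes = Counter(predict_nearest_centroid(model, x) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: two well-separated Gaussian blobs, one per class.
    rng = random.Random(1)
    X = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
         + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(40)])
    y = [0] * 40 + [1] * 40
    models = cbcd_fit(X, y, k=2, m=4)
    print(cbcd_predict(models, (0.5, 0.2)), cbcd_predict(models, (9.5, 10.3)))
```

The two tuning parameters from the abstract appear here as `k` (number of clusters) and `m` (number of subsets); this sketch leaves their choice to the caller and does not attempt the meta-learning step. Replacing `fit_nearest_centroid` and `predict_nearest_centroid` with a decision-tree inducer would bring it closer to the C4.5 setup the abstract describes.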
Original language: English (GB)
Article number: 1
Pages (from-to): 37-54
Number of pages: 18
Journal: International Journal of Computational Intelligence and Applications
Volume: 5
Issue number: 1
State: Published - 2005
Externally published: Yes
