TY - JOUR
T1 - Improving Supervised Learning by Sample Decomposition.
AU - Rokach, Lior
AU - Maimon, Oded
AU - Arad, Omri
N1 - DBLP License: DBLP's bibliographic metadata records provided through http://dblp.org/ are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.
PY - 2005
Y1 - 2005
N2 - This paper introduces a new ensemble technique, cluster-based concurrent decomposition (CBCD), which induces an ensemble of classifiers by decomposing the training set into mutually exclusive sub-samples of equal size. The CBCD algorithm first clusters the instance space using the K-means clustering algorithm. It then produces disjoint sub-samples from the clusters in such a way that each sub-sample comprises tuples from all clusters and hence represents the entire dataset. An induction algorithm is applied in turn to each subset, and a voting mechanism combines the classifiers' predictions. The CBCD algorithm has two tuning parameters: the number of clusters and the number of subsets to create. Using suitable meta-learning, these parameters can be tuned properly. In the experimental study we conducted, the CBCD algorithm, using an embedded C4.5 algorithm, outperformed a bagging algorithm of the same computational complexity.
DO - 10.1142/S146902680500143X
M3 - Article
VL - 5
SP - 37
EP - 54
JO - International Journal of Computational Intelligence and Applications
JF - International Journal of Computational Intelligence and Applications
SN - 1469-0268
IS - 1
M1 - 1
ER -
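
The abstract above describes the CBCD procedure in enough detail to sketch it. The following is a minimal illustration of that idea, not the authors' implementation: cluster the training set with K-means, deal each cluster's instances round-robin into disjoint equal-size subsets (so every subset draws from all clusters and mirrors the full instance space), train one classifier per subset, and combine by majority vote. The use of scikit-learn's KMeans and DecisionTreeClassifier (as a stand-in for the paper's embedded C4.5) is an assumption for the sketch.

```python
# Sketch of cluster-based concurrent decomposition (CBCD) as described
# in the abstract.  Library choices (scikit-learn's KMeans and
# DecisionTreeClassifier in place of C4.5) are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def cbcd_fit(X, y, n_clusters=3, n_subsets=4, seed=0):
    """Train an ensemble via cluster-based concurrent decomposition."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X)
    subsets = [[] for _ in range(n_subsets)]
    # Deal each cluster's instances round-robin across the subsets, so
    # the subsets are mutually exclusive yet each samples every cluster.
    for c in range(n_clusters):
        for i, j in enumerate(np.flatnonzero(labels == c)):
            subsets[i % n_subsets].append(j)
    # Apply the induction algorithm to each subset in turn.
    return [DecisionTreeClassifier(random_state=seed)
            .fit(X[np.array(idx)], y[np.array(idx)]) for idx in subsets]

def cbcd_predict(models, X):
    """Combine the member classifiers' predictions by majority vote."""
    votes = np.stack([m.predict(X) for m in models])
    return np.apply_along_axis(
        lambda col: np.bincount(col).argmax(), 0, votes)
```

The two tuning parameters mentioned in the abstract appear here as `n_clusters` and `n_subsets`; the paper tunes them via meta-learning, which this sketch omits.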