TY - JOUR
T1 - Vertical ensemble co-training for text classification
AU - Katz, Gilad
AU - Caragea, Cornelia
AU - Shabtai, Asaf
N1 - Publisher Copyright:
© 2017 ACM.
PY - 2017/10/1
Y1 - 2017/10/1
AB - High-quality, labeled data is essential for successfully applying machine learning methods to real-world text classification problems. However, in many cases the amount of labeled data is very small compared to the amount of unlabeled data, and labeling additional samples can be expensive and time-consuming. Co-training algorithms, which make use of unlabeled data to improve classification, have proven to be very effective in such cases. Generally, co-training algorithms work by using two classifiers, trained on two different views of the data, to label large amounts of unlabeled data. Doing so can help minimize the human effort required for labeling new data, as well as improve classification performance. In this article, we propose an ensemble-based co-training approach that uses an ensemble of classifiers from different training iterations to improve labeling accuracy. This approach, which we call the vertical ensemble, incurs almost no additional computational cost. Experiments conducted on six textual datasets show a significant improvement of over 45% in AUC compared with the original co-training algorithm.
KW - Co-training
KW - Ensemble
KW - Text classification
UR - http://www.scopus.com/inward/record.url?scp=85033241806&partnerID=8YFLogxK
DO - 10.1145/3137114
M3 - Article
AN - SCOPUS:85033241806
VL - 9
JO - ACM Transactions on Intelligent Systems and Technology
JF - ACM Transactions on Intelligent Systems and Technology
SN - 2157-6904
IS - 2
M1 - 21
ER -