TY - GEN
T1 - ExploreKit
T2 - 16th IEEE International Conference on Data Mining, ICDM 2016
AU - Katz, Gilad
AU - Shin, Eui Chul Richard
AU - Song, Dawn
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/7/2
Y1 - 2016/7/2
AB - Feature generation is one of the challenging aspects of machine learning. We present ExploreKit, a framework for automated feature generation. ExploreKit generates a large set of candidate features by combining information in the original features, with the aim of maximizing predictive performance according to user-selected criteria. To overcome the exponential growth of the feature space, ExploreKit uses a novel machine learning-based feature selection approach to predict the usefulness of new candidate features. This approach enables efficient identification of the new features and produces superior results compared to existing feature selection solutions. We demonstrate the effectiveness and robustness of our approach by conducting an extensive evaluation on 25 datasets and 3 different classification algorithms. We show that ExploreKit can achieve classification-error reduction of 20% overall. Our code is available at https://github.com/giladkatz/ExploreKit.
UR - http://www.scopus.com/inward/record.url?scp=85014544603&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2016.176
DO - 10.1109/ICDM.2016.176
M3 - Conference contribution
AN - SCOPUS:85014544603
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 979
EP - 984
BT - Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016
A2 - Bonchi, Francesco
A2 - Domingo-Ferrer, Josep
A2 - Baeza-Yates, Ricardo
A2 - Zhou, Zhi-Hua
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers
Y2 - 12 December 2016 through 15 December 2016
ER -