ExploreKit: Automatic feature generation and selection

Gilad Katz, Eui Chul Richard Shin, Dawn Song

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

57 Scopus citations

Abstract

Feature generation is one of the challenging aspects of machine learning. We present ExploreKit, a framework for automated feature generation. ExploreKit generates a large set of candidate features by combining information in the original features, with the aim of maximizing predictive performance according to user-selected criteria. To overcome the exponential growth of the feature space, ExploreKit uses a novel machine learning-based feature selection approach to predict the usefulness of new candidate features. This approach enables efficient identification of the new features and produces superior results compared to existing feature selection solutions. We demonstrate the effectiveness and robustness of our approach by conducting an extensive evaluation on 25 datasets and 3 different classification algorithms. We show that ExploreKit can achieve classification-error reduction of 20% overall. Our code is available at https://github.com/giladkatz/ExploreKit.
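The generate-then-select idea in the abstract can be illustrated with a small sketch. This is not the ExploreKit implementation: the operator set (`add`, `sub`, `mul`), the function names, and the toy data are all assumptions made for illustration. ExploreKit predicts the usefulness of candidates with a learned model; the sketch swaps in a much simpler stand-in, the absolute Pearson correlation between each candidate and the label.

```python
from itertools import combinations
from statistics import mean

# Hypothetical binary operators for combining two original features.
OPERATORS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def generate_candidates(columns):
    """Combine every pair of original columns with every operator."""
    candidates = {}
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        for op_name, op in OPERATORS.items():
            key = f"{op_name}({name_a},{name_b})"
            candidates[key] = [op(x, y) for x, y in zip(a, b)]
    return candidates


def rank_candidates(candidates, labels):
    """Rank candidates by |correlation| with the label (a stand-in scoring rule)."""
    scored = [(abs(pearson(values, labels)), name)
              for name, values in candidates.items()]
    return sorted(scored, reverse=True)


# Toy data (assumed): the label is 1 exactly when x and y share a sign,
# so neither original feature is informative alone but their product is.
columns = {"x": [1, -1, 1, -1], "y": [1, 1, -1, -1]}
labels = [1, 0, 0, 1]

ranked = rank_candidates(generate_candidates(columns), labels)
print(ranked[0][1])
```

Even in this toy setting the number of candidates grows with the number of feature pairs times the number of operators, which is why a cheap way of scoring candidates before evaluating them matters.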

Original language: English
Title of host publication: Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016
Editors: Francesco Bonchi, Xindong Wu, Ricardo Baeza-Yates, Josep Domingo-Ferrer, Zhi-Hua Zhou
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 979-984
Number of pages: 6
ISBN (Electronic): 9781509054725
State: Published - 31 Jan 2017
Externally published: Yes
Event: 16th IEEE International Conference on Data Mining, ICDM 2016 - Barcelona, Catalonia, Spain
Duration: 12 Dec 2016 to 15 Dec 2016

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 16th IEEE International Conference on Data Mining, ICDM 2016
Country/Territory: Spain
City: Barcelona, Catalonia
Period: 12/12/16 to 15/12/16
