Automatic features generation and selection from external sources: A DBpedia use case

Asaf Harari, Gilad Katz

Research output: Contribution to journal › Article › peer-review

Abstract

Feature engineering is one of the major challenges of machine learning. While multiple automation solutions have been proposed in recent years, the vast majority focus on extracting features from the analyzed dataset itself and not from other (external) sources. In this study we present FGSES, a general framework for automatic feature engineering, and its application to DBpedia. Our framework automatically matches the entities in the analyzed dataset to those of the external data source, and then generates a large and diverse set of candidate features from both structured and unstructured content. To efficiently process the large number of generated features, FGSES uses a meta-learning-based ranking approach. Our evaluation, conducted on 18 tabular datasets with diverse characteristics, shows that FGSES achieves an average error reduction of 16.5%, significantly outperforming the evaluated baselines.
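The three stages named in the abstract (entity matching, candidate feature generation, candidate ranking) can be illustrated with a toy sketch. This is not the FGSES implementation: the in-memory `EXTERNAL_KB` dictionary stands in for DBpedia, all function names are hypothetical, the values are made-up placeholders, and a simple absolute-correlation score is swapped in for the paper's meta-learning-based ranker, whose details the abstract does not give.

```python
from statistics import mean

# Hypothetical stand-in for an external source such as DBpedia.
# Property names and values are illustrative placeholders only.
EXTERNAL_KB = {
    "paris": {"population": 2_100_000, "area_km2": 105},
    "berlin": {"population": 3_600_000, "area_km2": 892},
    "madrid": {"population": 3_300_000, "area_km2": 604},
}


def normalize(name):
    """Lowercase and collapse whitespace so lookups tolerate formatting noise."""
    return " ".join(name.lower().split())


def match_entities(names, kb):
    """Stage 1: link each dataset entity to a KB record, or None if unmatched."""
    return [kb.get(normalize(n)) for n in names]


def generate_candidates(matches):
    """Stage 2: turn every property seen in the matches into a candidate column."""
    props = sorted({p for m in matches if m for p in m})
    return {p: [m.get(p) if m else None for m in matches] for p in props}


def abs_correlation(xs, ys):
    """Absolute Pearson correlation over rows where the feature is present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None]
    if len(pairs) < 2:
        return 0.0
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def rank_candidates(candidates, target):
    """Stage 3 (assumed scorer): order candidates by correlation with the target."""
    return sorted(
        candidates,
        key=lambda p: abs_correlation(candidates[p], target),
        reverse=True,
    )


names = [" Paris", "BERLIN", "Madrid", "Atlantis"]  # last one has no KB match
target = [1.0, 9.0, 6.0, 3.0]                       # made-up target values
matches = match_entities(names, EXTERNAL_KB)
candidates = generate_candidates(matches)
ranking = rank_candidates(candidates, target)
print(ranking)
```

Unmatched entities yield missing values rather than errors, so the dataset keeps its original rows; a real pipeline would replace the correlation score with a learned ranker and evaluate the top candidates on held-out data.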

Original language: English
Pages (from-to): 398-414
Number of pages: 17
Journal: Information Sciences
Volume: 582
State: Published - 1 Jan 2022

Keywords

  • Feature generation
  • Information discovery
  • Meta learning
