Abstract
Feature engineering is one of the major challenges of machine learning. While multiple automation solutions have been proposed in recent years, the vast majority focus on extracting features from the analyzed dataset itself and not from other (external) sources. In this study we present FGSES, a general framework for automatic feature engineering, and its application to DBpedia. Our framework automatically matches the entities in the analyzed dataset to those of the external data source, and then generates a large and diverse set of candidate features from both structured and unstructured content. To efficiently process the large number of generated features, FGSES uses a meta-learning-based ranking approach. Our evaluation, conducted on 18 tabular datasets with diverse characteristics, shows that FGSES achieves an average error reduction of 16.5%, significantly outperforming the evaluated baselines.
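The three stages named in the abstract (entity matching, candidate feature generation, feature ranking) can be illustrated with a minimal sketch. This is not the FGSES implementation: the in-memory `EXTERNAL_KB` dictionary is a hypothetical stand-in for an external source such as DBpedia, matching is reduced to a normalized exact lookup, and absolute Pearson correlation is swapped in as a simple placeholder for the meta-learning-based ranker. All function names are assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for an external knowledge source such as DBpedia.
EXTERNAL_KB: Dict[str, Dict[str, float]] = {
    "paris": {"population": 2_100_000.0, "area_km2": 105.0},
    "berlin": {"population": 3_600_000.0, "area_km2": 891.0},
    "rome": {"population": 2_800_000.0, "area_km2": 1285.0},
}


def match_entity(name: str) -> Optional[str]:
    """Toy entity matching: normalized exact lookup in the external source."""
    key = name.strip().lower()
    return key if key in EXTERNAL_KB else None


def generate_candidates(names: List[str]) -> Dict[str, List[Optional[float]]]:
    """Build one candidate feature column per external attribute.

    Rows whose entity could not be matched receive None.
    """
    feature_names = sorted({f for attrs in EXTERNAL_KB.values() for f in attrs})
    candidates: Dict[str, List[Optional[float]]] = {f: [] for f in feature_names}
    for name in names:
        key = match_entity(name)
        for f in feature_names:
            candidates[f].append(EXTERNAL_KB[key][f] if key is not None else None)
    return candidates


def abs_pearson(xs: List[float], ys: List[float]) -> float:
    """Absolute Pearson correlation; returns 0.0 for degenerate inputs."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov / math.sqrt(vx * vy))


def rank_candidates(
    candidates: Dict[str, List[Optional[float]]], target: List[float]
) -> List[Tuple[str, float]]:
    """Rank candidates by a placeholder score (NOT the paper's meta-learner).

    Only rows where the entity was matched contribute to the score.
    """
    scored = []
    for feature, values in candidates.items():
        pairs = [(v, t) for v, t in zip(values, target) if v is not None]
        xs = [p[0] for p in pairs]
        ys = [p[1] for p in pairs]
        scored.append((feature, abs_pearson(xs, ys)))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In this sketch, a downstream learner would keep only the top-ranked candidates, which is the role the abstract assigns to the ranking stage: avoiding a full evaluation of every generated feature.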
Original language | English |
---|---|
Pages (from-to) | 398-414 |
Number of pages | 17 |
Journal | Information Sciences |
Volume | 582 |
State | Published - 1 Jan 2022 |
Keywords
- Feature generation
- Information discovery
- Meta learning
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Theoretical Computer Science
- Computer Science Applications
- Information Systems and Management
- Artificial Intelligence