Learning dataset representation for automatic machine learning algorithm selection

Noy Cohen-Shapira, Lior Rokach

Research output: Contribution to journalArticlepeer-review

2 Scopus citations

Abstract

The algorithm selection problem is defined as identifying the best-performing machine learning (ML) algorithm for a given combination of dataset, task, and evaluation measure. The human expertise required to evaluate the increasing number of ML algorithms available has resulted in the need to automate the algorithm selection task. Various approaches have emerged to handle the automatic algorithm selection challenge, including meta-learning. Meta-learning is a popular approach that leverages accumulated experience for future learning and typically involves dataset characterization. Existing meta-learning methods often represent a dataset using predefined features and thus cannot be generalized across different ML tasks, or alternatively, learn a dataset’s representation in a supervised manner and therefore are unable to deal with unsupervised tasks. In this study, we propose a novel learning-based task-agnostic method for producing dataset representations. Then, we introduce TRIO, a meta-learning approach, that utilizes the proposed dataset representations to accurately recommend top-performing algorithms for previously unseen datasets. TRIO first learns graphical representations for the datasets, using four tools to learn the latent interactions among dataset instances and then utilizes a graph convolutional neural network technique to extract embedding representations from the graphs obtained. We extensively evaluate the effectiveness of our approach on 337 datasets and 195 ML algorithms, demonstrating that TRIO significantly outperforms state-of-the-art methods for algorithm selection for both supervised (classification and regression) and unsupervised (clustering) tasks.

Original languageEnglish
Pages (from-to)2599-2635
Number of pages37
JournalKnowledge and Information Systems
Volume64
Issue number10
DOIs
StatePublished - 4 Aug 2022

Keywords

  • Algorithm selection
  • AutoML
  • Meta-learning
  • Task-agnostic dataset representation

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Hardware and Architecture
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Learning dataset representation for automatic machine learning algorithm selection'. Together they form a unique fingerprint.

Cite this