Automatic Selection of Clustering Algorithms Using Supervised Graph Embedding

Noy Cohen-Shapira, Lior Rokach

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

The widespread adoption of machine learning (ML) techniques and the extensive expertise required to apply them have led to increased interest in automated ML solutions that reduce the need for human intervention. One of the main challenges in applying ML to previously unseen problems is algorithm selection – the identification of high-performing algorithm(s) for a given dataset, task, and evaluation measure. This study addresses the algorithm selection challenge for data clustering, a fundamental task in data mining that is aimed at grouping similar objects. We present MARCO-GE, a novel meta-learning approach for the automated recommendation of clustering algorithms. MARCO-GE first transforms datasets into graphs and then utilizes a graph convolutional neural network technique to extract their latent representation. Using the embedding representations obtained, MARCO-GE trains a ranking meta-model capable of accurately recommending top-performing algorithms for a new dataset and clustering evaluation measure. An extensive evaluation on 210 datasets, 17 clustering algorithms, and 10 clustering measures demonstrates the effectiveness of our approach and its superiority in terms of predictive and generalization performance over state-of-the-art clustering meta-learning approaches.
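The first step the abstract describes, turning a dataset into a graph, can be illustrated with a minimal sketch. The abstract does not say how MARCO-GE builds its graphs, so the k-nearest-neighbour rule, the function name `dataset_to_knn_graph`, the parameter `k`, and the toy data below are all assumptions made for illustration only, not the authors' method.

```python
import math

def dataset_to_knn_graph(rows, k=2):
    """Build an undirected k-nearest-neighbour graph over the instances.

    Hypothetical helper: the k-NN construction is an assumption, not
    necessarily the graph construction used by MARCO-GE.
    Returns a dict mapping each instance index to a set of neighbour indices.
    """
    n = len(rows)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        # Euclidean distance from instance i to every other instance.
        dists = sorted(
            (math.dist(rows[i], rows[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            graph[i].add(j)
            graph[j].add(i)  # add the reverse edge so the graph is undirected
    return graph

# Toy dataset (made up): two well-separated groups of three points each.
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
graph = dataset_to_knn_graph(toy, k=2)
```

On this toy input every edge stays inside one of the two groups, which is the kind of structure a graph embedding could then summarize. A graph convolutional network and a ranking meta-model, as described in the abstract, would operate on such a graph in later steps that this sketch does not cover.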

Original language: English
Pages (from-to): 824-851
Number of pages: 28
Journal: Information Sciences
Volume: 577
State: Published - 1 Oct 2021

Keywords

  • Algorithm ranking
  • Algorithm selection
  • AutoML
  • Clustering
  • Meta-learning

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence
