Isolation forests and landmarking-based representations for clustering algorithm recommendation using meta-learning

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Data clustering is the task of organizing data into groups such that the objects in each group share similar attributes. Most problems that clustering algorithms address have no known solution in advance. This paper addresses the algorithm selection challenge for data clustering while taking into account the difficulty of evaluating clustering solutions. We present a new meta-learning method for recommending the most suitable clustering algorithm for a dataset. Based on concepts from the isolation forest algorithm, we propose a new similarity measure between datasets. Our proposed dataset characterization methods use this similarity measure to generate an embedding for each dataset, and the embedding is then used to improve the quality of the problem's characterization. The method utilizes landmarking concepts to characterize the dataset and then, inspired by the DeepFM algorithm, applies meta-learning to rank the candidate algorithms by their expected performance on the current dataset. This ranking could, among other things, support AutoML systems. Our approach is evaluated on a corpus of 100 publicly available benchmark datasets. We compare our method's ranking performance to that of existing meta-learning methods and show that our method outperforms them in both predictive performance and computational complexity.
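
The abstract does not spell out how the similarity measure is derived from the isolation forest algorithm. As background only, the following minimal Python sketch illustrates the general isolation idea that isolation forests rest on: a point that differs from the rest of the data tends to be separated by fewer random axis-aligned splits. The function names, the parameters, and the toy data are illustrative assumptions, and the sketch omits the subsampling and score normalization of a full isolation forest. It is not the paper's similarity measure.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=12):
    """Count the random axis-aligned splits needed to isolate `point` in `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Only features that still vary within the current partition can be split.
    dims = [d for d in range(len(point))
            if min(row[d] for row in data) < max(row[d] for row in data)]
    if not dims:
        return depth
    dim = rng.choice(dims)
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    split = rng.uniform(lo, hi)
    # Follow the side of the split that contains the query point.
    if point[dim] < split:
        side = [row for row in data if row[dim] < split]
    else:
        side = [row for row in data if row[dim] >= split]
    return isolation_depth(point, side, rng, depth + 1, max_depth)


def mean_isolation_depth(point, data, n_trees=200, seed=0):
    """Average the isolation depth over many random trees (no subsampling here)."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(n_trees)) / n_trees


# Hypothetical toy data: a dense 5x5 grid near the origin plus one distant point.
grid = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
outlier = (10.0, 10.0)
data = grid + [outlier]

inlier_depth = mean_isolation_depth((0.2, 0.2), data)
outlier_depth = mean_isolation_depth(outlier, data)
print(f"inlier: {inlier_depth:.2f}, outlier: {outlier_depth:.2f}")
```

In this toy setting the distant point is isolated after fewer splits on average than a point inside the dense grid, which is the property isolation forests exploit.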

Original language: English
Pages (from-to): 473-489
Number of pages: 17
Journal: Information Sciences
Volume: 574
State: Published - 1 Oct 2021

Keywords

  • Algorithm selection
  • Clustering
  • Dataset embedding
  • Meta-knowledge
  • Meta-learning systems
  • Problem characterization

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence
