Efficient Feature Ranking and Selection Using Statistical Moments

Yael Hochma, Yuval Felendler, Mark Last

Research output: Contribution to journal › Article › peer-review

Abstract

Unsupervised feature selection methods can be more efficient than supervised methods, which rely on the expensive and time-consuming data labeling process. This paper introduces skewness as a novel, unsupervised, and computationally efficient feature ranking metric, suitable for both classification and regression tasks. Its feature selection effectiveness is compared to that of several state-of-the-art supervised and unsupervised feature ranking and selection methods. Both theoretical analysis and empirical evaluation with several popular classification and regression algorithms show that statistical moment-based feature selection algorithms are competitive, in terms of accuracy and mean squared error (MSE), with state-of-the-art supervised approaches for feature ranking and selection, including Fast Correlation Based Filter (FCBF), Minimum Redundancy Maximum Relevance (MRMR), and Mutual Information Maximization (MIM). We also present a mathematical proof, based on some common assumptions, that explains the high effectiveness of statistical moments in the feature ranking procedure. Moreover, statistical moment-based feature selection is shown empirically to run faster, on average, than the supervised approaches and the unsupervised Laplacian Score method. Additionally, skewness-based feature selection, in contrast to variance-based selection, does not depend on data normalization, which requires additional computational time and may affect the feature ranking results.
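
The normalization claim can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the ranking procedure, so ranking by absolute skewness in descending order is an assumption here, and the feature names, synthetic data, and helper functions (`skewness`, `variance`, `rank_features`) are hypothetical. The sketch relies only on the standard fact that skewness is unchanged when a feature is multiplied by a positive constant, whereas variance scales with the square of that constant.

```python
import random


def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def abs_skewness(values):
    """Ranking score assumed for this sketch: magnitude of the skewness."""
    return abs(skewness(values))


def variance(values):
    """Population variance (second central moment)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def rank_features(columns, score):
    """Return feature names sorted by descending score (assumed ordering)."""
    return sorted(columns, key=lambda name: score(columns[name]), reverse=True)


# Synthetic, unlabeled data: three hypothetical features with different shapes.
rng = random.Random(0)
columns = {
    "symmetric": [rng.gauss(0.0, 5.0) for _ in range(2000)],
    "mild_skew": [rng.gammavariate(9.0, 1.0) for _ in range(2000)],
    "strong_skew": [rng.expovariate(1.0) for _ in range(2000)],
}

skew_rank = rank_features(columns, abs_skewness)
var_rank = rank_features(columns, variance)

# Change the unit of one feature (multiply by 100) and rank again.
rescaled = dict(columns)
rescaled["strong_skew"] = [100.0 * v for v in columns["strong_skew"]]

skew_rank_rescaled = rank_features(rescaled, abs_skewness)
var_rank_rescaled = rank_features(rescaled, variance)

print("skewness ranking, original:", skew_rank)
print("skewness ranking, rescaled:", skew_rank_rescaled)
print("variance ranking, original:", var_rank)
print("variance ranking, rescaled:", var_rank_rescaled)
```

In this sketch the skewness-based order is the same before and after rescaling, while the variance-based order changes, which is why variance-based ranking typically needs a normalization pass first. No labels are used at any point, in keeping with the unsupervised setting described above.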

Original language: English
Pages (from-to): 105573-105587
Number of pages: 15
Journal: IEEE Access
Volume: 12
State: Published - 1 Jan 2024

Keywords

  • Feature ranking
  • skewness
  • unsupervised feature selection
  • variance

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
