Gram-Schmidt Methods for Unsupervised Feature Extraction and Selection

  • Bahram Yaghooti
  • Netanel Raviv
  • Bruno Sinopoli

Research output: Contribution to journal › Article › peer-review

Abstract

Feature extraction and selection in the presence of nonlinear dependencies among the data is a fundamental challenge in unsupervised learning. We propose using a Gram-Schmidt (GS) type orthogonalization process over function spaces to detect and map out such dependencies. Specifically, by applying the GS process over some family of functions, we construct a series of covariance matrices that can be used either to identify new large-variance directions or to remove those dependencies from known directions. In the former case, we provide information-theoretic guarantees in terms of entropy reduction. In the latter, we provide precise conditions under which the chosen function family eliminates existing redundancy in the data. Each approach provides both a feature extraction and a feature selection algorithm. Our feature extraction methods are linear, and can be seen as a natural generalization of principal component analysis (PCA). We provide experimental results for synthetic and real-world benchmark datasets which show superior performance over state-of-the-art (linear) feature extraction and selection algorithms. Surprisingly, our linear feature extraction algorithms are comparable to, and often outperform, several important nonlinear feature extraction methods such as autoencoders, kernel PCA, and UMAP. Furthermore, one of our feature selection algorithms strictly generalizes a recent Fourier-based feature selection mechanism (Heidari et al., IEEE Transactions on Information Theory, 2022), yet at significantly reduced complexity.
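
To make the general idea concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a polynomial function family, a univariate basis per extracted feature, and an empirical inner product; the function names `orthonormal_function_basis` and `gs_feature_extraction`, the `degree` parameter, and the toy data are all hypothetical choices made for illustration. Each step picks the top-variance direction of a residual covariance matrix, then uses Gram-Schmidt over functions of the new feature to strip from the residual whatever those functions explain.

```python
import numpy as np


def orthonormal_function_basis(z, degree):
    """Orthonormalize 1, z, ..., z^degree by Gram-Schmidt under the
    empirical inner product <f, g> = mean(f * g). Returns an (n, m) array."""
    basis = []
    for k in range(degree + 1):
        f = z ** k
        for b in basis:
            f = f - np.mean(f * b) * b  # subtract the projection onto b
        norm = np.sqrt(np.mean(f ** 2))
        if norm > 1e-12:  # skip functions that are numerically dependent
            basis.append(f / norm)
    return np.stack(basis, axis=1)


def gs_feature_extraction(X, n_components, degree=3):
    """Illustrative sketch: returns an (n_components, d) array of directions.

    Assumption (not taken from the paper): only univariate polynomials of
    each extracted feature are removed from the residual.
    """
    Xc = X - X.mean(axis=0)
    R = Xc.copy()  # residual data, initially the centered data
    directions = []
    for _ in range(n_components):
        cov = R.T @ R / len(R)  # covariance of the current residual
        _, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        w = eigvecs[:, -1]  # direction of largest residual variance
        directions.append(w)
        z = Xc @ w  # extracted feature, linear in the original data
        B = orthonormal_function_basis(z, degree)
        R = R - B @ (B.T @ R / len(z))  # remove what functions of z explain
    return np.array(directions)


# Toy data: x2 is a deterministic nonlinear function of x1, x3 is independent.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 ** 2
x3 = 0.5 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
W = gs_feature_extraction(X, n_components=2)
```

On this toy data, plain PCA would rank the redundant coordinate `x2` second because its variance exceeds that of `x3`. The sketch instead removes the part of `x2` explained by polynomials of the first feature, so the second direction is expected to align with the independent coordinate `x3`.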

Original language: English
Pages (from-to): 7856-7885
Number of pages: 30
Journal: IEEE Transactions on Information Theory
Volume: 71
Issue number: 10
State: Published - 1 Jan 2025
Externally published: Yes

Keywords

  • Feature extraction
  • Gram-Schmidt orthogonalization
  • Feature selection
  • Principal component analysis

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
