Nearly optimal classification for semimetrics

Lee-Ad Gottlieb, Aryeh Kontorovich, Pinhas Nisnevitch

Research output: Contribution to journal, Article, peer-review

9 Scopus citations

Abstract

We initiate the rigorous study of classification in semimetric spaces, which are point sets with a distance function that is non-negative and symmetric, but need not satisfy the triangle inequality. We define the density dimension dens and discover that it plays a central role in the statistical and algorithmic feasibility of learning in semimetric spaces. We compute this quantity for several widely used semimetrics and present nearly optimal sample compression algorithms, which are then used to obtain generalization guarantees, including fast rates. Our claim of near-optimality holds in both computational and statistical senses. When the sample has radius R and margin γ, we show that it can be compressed down to roughly d = (R/γ)^dens points, and further that finding a significantly better compression is algorithmically intractable unless P=NP. This compression implies generalization via standard Occam-type arguments, to which we provide a nearly matching lower bound.
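As an informal illustration of the ideas in the abstract (not the authors' algorithm), the sketch below uses squared Euclidean distance as an example semimetric, checks that it violates the triangle inequality, and compresses a labeled sample with a simple greedy net followed by nearest-neighbor prediction. The function names, the toy sample, and the choice gamma = 1.0 are all assumptions made for this example. The only property relied on is that gamma does not exceed the margin, taken here to mean the smallest distance between oppositely labeled points. In that case every discarded point lies within gamma of a retained point, so its nearest retained point carries the same label.

```python
def sq_dist(x, y):
    """Squared Euclidean distance: non-negative and symmetric, but not a metric."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def violates_triangle(rho, points):
    """Return True if some triple has rho(x, z) > rho(x, y) + rho(y, z)."""
    for x in points:
        for y in points:
            for z in points:
                if rho(x, z) > rho(x, y) + rho(y, z):
                    return True
    return False


def greedy_net(sample, rho, gamma):
    """Keep a labeled point only if it is at least gamma from every kept point."""
    net = []
    for x, label in sample:
        if all(rho(x, c) >= gamma for c, _ in net):
            net.append((x, label))
    return net


def nearest_label(net, rho, x):
    """Predict the label of the retained point nearest to x under rho."""
    return min(net, key=lambda item: rho(x, item[0]))[1]


# Hypothetical toy sample: two well-separated clusters on the real line.
sample = [((0.0,), -1), ((0.1,), -1), ((0.2,), -1),
          ((2.0,), +1), ((2.1,), +1), ((2.2,), +1)]

# Under sq_dist the margin is (2.0 - 0.2)^2 = 3.24, so gamma = 1.0 is below it.
net = greedy_net(sample, sq_dist, gamma=1.0)
consistent = all(nearest_label(net, sq_dist, x) == y for x, y in sample)
```

Here the six sample points are reduced to two retained points, and the nearest-neighbor rule on those two still labels the whole sample correctly. How small such a compressed set can be made in general is the question the paper answers in terms of dens.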

Original language: English
Pages (from-to): 1-22
Journal: Journal of Machine Learning Research
Volume: 18
Issue number: 3
State: Published - 1 Apr 2017

Keywords

  • Classification
  • Compression
  • Generalization
  • Semimetric

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence
