Near-optimal learning with average Hölder smoothness

Steve Hanneke, Aryeh Kontorovich, Guy Kornowski

Research output: Contribution to journal › Conference article › peer-review

Abstract

We generalize the notion of average Lipschitz smoothness proposed by Ashlagi et al. [2021] by extending it to Hölder smoothness. This measure of the “effective smoothness” of a function is sensitive to the underlying distribution and can be dramatically smaller than its classic “worst-case” Hölder constant. We consider both the realizable and the agnostic (noisy) regression settings, proving upper and lower risk bounds in terms of the average Hölder smoothness; these rates improve upon both previously known rates even in the special case of average Lipschitz smoothness. Moreover, our lower bound is tight in the realizable setting up to log factors, thus we establish the minimax rate. From an algorithmic perspective, since our notion of average smoothness is defined with respect to the unknown underlying distribution, the learner does not have an explicit representation of the function class, hence is unable to execute ERM. Nevertheless, we provide distinct learning algorithms that achieve both (nearly) optimal learning rates. Our results hold in any totally bounded metric space, and are stated in terms of its intrinsic geometry. Overall, our results show that the classic worst-case notion of Hölder smoothness can be essentially replaced by its average, yielding considerably sharper guarantees.
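As a rough illustration of the gap between average and worst-case smoothness described above, the following sketch estimates both quantities on a finite sample. The abstract does not state the definition, so the code assumes the usual local-slope form, Λ(x) = sup over y ≠ x of |f(x) − f(y)| / ρ(x, y)^β, with the average taken over the sample. The function name `holder_slopes`, the one-dimensional metric, and the ramp function are hypothetical choices made for illustration only; they are not the paper's algorithm.

```python
import numpy as np

def holder_slopes(x, fx, beta):
    """Empirical local Hölder slope at each sample point.

    Assumption: 1-D inputs with metric |x - y|; the supremum over y is
    restricted to the other sample points.
    """
    x = np.asarray(x, dtype=float)
    fx = np.asarray(fx, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])      # pairwise distances
    diff = np.abs(fx[:, None] - fx[None, :])    # pairwise |f(x) - f(y)|
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = diff / dist**beta
    ratio[dist == 0] = 0.0                      # ignore coincident points (y = x)
    return ratio.max(axis=1)

# Hypothetical example: a function that is flat except for a steep ramp
# on [0.5, 0.51], evaluated on a uniform grid over [0, 1].
x = np.linspace(0.0, 1.0, 1001)
f = np.clip((x - 0.5) / 0.01, 0.0, 1.0)

slopes = holder_slopes(x, f, beta=1.0)
worst = slopes.max()    # empirical worst-case (Lipschitz) constant
avg = slopes.mean()     # empirical average smoothness under the uniform sample

print(f"worst-case slope: {worst:.2f}, average slope: {avg:.2f}")
```

In this example the steep region carries little probability mass, so the average slope comes out far smaller than the worst-case constant. This mirrors, in a toy setting, the distribution sensitivity the abstract refers to.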

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 36
State: Published - 1 Jan 2023
Event: 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: 10 Dec 2023 → 16 Dec 2023

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
