Heavy-tailed regression with a generalized median-of-means

Daniel Hsu, Sivan Sabato

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

10 Scopus citations

Abstract

This work proposes a simple and computationally efficient estimator for linear regression, and other smooth and strongly convex loss minimization problems. We prove loss approximation guarantees that hold for general distributions, including those with heavy tails. All prior results only hold for estimators which either assume bounded or subgaussian distributions, require prior knowledge of distributional properties, or are not known to be computationally tractable. In the special case of linear regression with possibly heavy-tailed responses and with bounded and well-conditioned covariates in d-dimensions, we show that a random sample of size O(dlog(1/δ)) suffices to obtain a constant factor approximation to the optimal loss with probability 1 - δ, a minimax optimal sample complexity up to log factors. The core technique used in the proposed estimator is a new generalization of the median-of-means estimator to arbitrary metric spaces.

Original languageEnglish
Title of host publication31st International Conference on Machine Learning, ICML 2014
PublisherInternational Machine Learning Society (IMLS)
Pages1196-1204
Number of pages9
ISBN (Electronic)9781634393973
StatePublished - 1 Jan 2014
Externally publishedYes
Event31st International Conference on Machine Learning, ICML 2014 - Beijing, China
Duration: 21 Jun 201426 Jun 2014

Conference

Conference31st International Conference on Machine Learning, ICML 2014
Country/TerritoryChina
CityBeijing
Period21/06/1426/06/14

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Software

Fingerprint

Dive into the research topics of 'Heavy-tailed regression with a generalized median-of-means'. Together they form a unique fingerprint.

Cite this