Epsilon Best-Arm Identification in Pay-Per-Reward Multi-Armed Bandits

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

We study epsilon-best-arm identification in a setting where, during the exploration phase, the cost of each arm pull is proportional to the expected future reward of that arm. We term this setting Pay-Per-Reward. We provide an algorithm for this setting that, with high probability, returns an ε-best arm while incurring a cost that depends only linearly on the total expected reward of all arms and does not depend at all on the number of arms. Under mild assumptions, the algorithm can also be applied to problems with infinitely many arms.
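
To make the cost model concrete, the following is a minimal sketch of Pay-Per-Reward cost accounting. It is not the algorithm from the paper: it uses a naive uniform-sampling baseline, and it assumes Bernoulli rewards and a cost per pull equal to the arm's expected reward (proportionality constant 1). The names `is_epsilon_best` and `uniform_exploration` are hypothetical and chosen only for illustration.

```python
import random


def is_epsilon_best(means, arm, epsilon):
    """Return True if `arm` has expected reward within epsilon of the best arm."""
    return means[arm] >= max(means) - epsilon


def uniform_exploration(means, pulls_per_arm, seed=0):
    """Naive baseline: pull every arm equally often, return the empirical best.

    Assumptions (not from the paper): Bernoulli rewards, and a cost per pull
    equal to the arm's expected reward (proportionality constant 1).
    """
    rng = random.Random(seed)
    total_cost = 0.0
    empirical = []
    for mu in means:
        # Simulate `pulls_per_arm` Bernoulli(mu) rewards for this arm.
        hits = sum(1 for _ in range(pulls_per_arm) if rng.random() < mu)
        empirical.append(hits / pulls_per_arm)
        # Pay-Per-Reward: each pull is charged the arm's expected reward.
        total_cost += pulls_per_arm * mu
    best = max(range(len(means)), key=lambda i: empirical[i])
    return best, total_cost


means = [0.1, 0.2, 0.9]
best, cost = uniform_exploration(means, pulls_per_arm=500)
print(best, cost, is_epsilon_best(means, best, epsilon=0.1))
```

In this sketch, pulling a high-reward arm is more expensive than pulling a low-reward one, which is the defining feature of the setting. The baseline only illustrates how the cost is tallied and makes no attempt to match the guarantee stated in the abstract.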

Original language: English (GB)
Title of host publication: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)
Pages: 2876-2886
Volume: 32
State: Published - 2019
Event: 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver, Canada
Duration: 8 Dec 2019 to 14 Dec 2019

Conference

Conference: 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019
Country/Territory: Canada
City: Vancouver
Period: 8/12/19 to 14/12/19

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
