ε-best-arm identification in pay-per-reward multi-armed bandits

Research output: Contribution to journal › Conference article › peer-review

2 Scopus citations


We study ε-best-arm identification in a setting where, during the exploration phase, the cost of each arm pull is proportional to the expected future reward of that arm. We term this setting Pay-Per-Reward. We provide an algorithm for this setting that, with high probability, returns an ε-best arm while incurring a cost that depends only linearly on the total expected reward of all arms and does not depend at all on the number of arms. Under mild assumptions, the algorithm can also be applied to problems with infinitely many arms.
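To make the cost model concrete, the following is a minimal Python sketch of the Pay-Per-Reward setting described above. It is not the algorithm from the paper: it uses a naive uniform-sampling baseline purely to show how exploration cost would be accounted for. The function names, the Bernoulli reward model, the proportionality constant of 1, and the additive definition of an ε-best arm (mean within ε of the best mean) are all assumptions made for illustration.

```python
import random


def pull(mean, rng):
    """Draw a Bernoulli reward with the given mean (assumed reward model)."""
    return 1.0 if rng.random() < mean else 0.0


def naive_uniform_baseline(means, pulls_per_arm, seed=0):
    """Uniform-sampling baseline, NOT the paper's algorithm.

    Pulls every arm the same number of times and returns
    (index of the arm with the highest empirical mean, total cost).
    Each pull of arm i is charged means[i], i.e. a cost proportional
    to the arm's expected reward, with an assumed constant of 1.
    """
    rng = random.Random(seed)
    total_cost = 0.0
    best_idx, best_avg = -1, -1.0
    for i, mu in enumerate(means):
        rewards = [pull(mu, rng) for _ in range(pulls_per_arm)]
        total_cost += mu * pulls_per_arm  # pay-per-reward charge for this arm
        avg = sum(rewards) / pulls_per_arm
        if avg > best_avg:
            best_idx, best_avg = i, avg
    return best_idx, total_cost


def is_eps_best(means, idx, eps):
    """Check the (assumed additive) eps-best condition for arm idx."""
    return means[idx] >= max(means) - eps


if __name__ == "__main__":
    means = [0.1, 0.2, 0.9]  # hypothetical arm means
    idx, cost = naive_uniform_baseline(means, pulls_per_arm=200)
    print("chosen arm:", idx, "cost:", cost,
          "eps-best:", is_eps_best(means, idx, eps=0.1))
```

With these hypothetical means and 200 pulls per arm, the baseline is charged 200 × (0.1 + 0.2 + 0.9) = 240 cost units. The sketch only illustrates the accounting; it makes no attempt to reproduce the cost guarantee stated in the abstract.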

Original language: English
Journal: Advances in Neural Information Processing Systems
State: Published - 1 Jan 2019
Event: 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver, Canada
Duration: 8 Dec 2019 to 14 Dec 2019

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

