Abstract
We study ?-best-arm identification, in a setting where during the exploration phase, the cost of each arm pull is proportional to the expected future reward of that arm. We term this setting Pay-Per-Reward. We provide an algorithm for this setting, that with a high probability returns an ?-best arm, while incurring a cost that depends only linearly on the total expected reward of all arms, and does not depend at all on the number of arms. Under mild assumptions, the algorithm can be applied also to problems with infinitely many arms.
Original language | English |
---|---|
Journal | Advances in Neural Information Processing Systems |
Volume | 32 |
State | Published - 1 Jan 2019 |
Event | 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver, Canada Duration: 8 Dec 2019 → 14 Dec 2019 |
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems
- Signal Processing