Abstract
Comprehensive state-action exploration is essential for reinforcement learning (RL) algorithms: it enables them to find optimal solutions and avoid premature convergence. In value-based RL, optimistic initialization of the value function ensures sufficient exploration for finding the optimal solution; optimistic values induce curiosity-driven exploration, enabling visitation of under-explored regions. However, optimistic initialization has limitations in stochastic and non-stationary environments because it cannot explore “infinitely often”. To address this limitation, we propose a novel exploration strategy for value-based RL, denoted COIN, based on recurring optimistic initialization. By injecting a continual exploration bonus, we overcome the main shortcoming of optimistic initialization, namely its sensitivity to environment noise. We provide a rigorous theoretical comparison of COIN with existing popular exploration strategies and prove that it offers a unique set of attributes (coverage, infinitely-often exploration, no visitation tracking, and curiosity). We demonstrate the superiority of COIN over popular existing strategies on a designed toy domain and also present results on common benchmark tasks. We observe that COIN outperforms existing exploration strategies in four out of six benchmark tasks while performing on par with the best baseline on the other two.
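To make the mechanism concrete, the sketch below shows a minimal tabular Q-learning loop that combines optimistic initialization with a small recurring exploration bonus added at every update, so value estimates are repeatedly pushed back toward optimism rather than decaying once. This is an illustrative sketch only, assuming a Gymnasium-style discrete environment; the function name, bonus term, and constants are placeholders and not the paper's exact COIN formulation.

```python
import numpy as np

def q_learning_with_recurring_optimism(env, episodes=500, alpha=0.1, gamma=0.99,
                                        q_init=10.0, bonus=0.05):
    """Tabular Q-learning with optimistic initialization plus a recurring
    exploration bonus. All constants are illustrative, not from the paper."""
    # Optimistic initialization: start all Q-values above realistic returns.
    Q = np.full((env.observation_space.n, env.action_space.n), q_init)
    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Greedy action selection; optimism drives visits to
            # under-explored state-action pairs.
            action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Recurring bonus injected into every target, keeping estimates
            # optimistic over time instead of only at initialization.
            target = reward + bonus + (0.0 if terminated else gamma * np.max(Q[next_state]))
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q
```

In this sketch the bonus never vanishes, so exploration pressure persists even after the initial optimism has been learned away, which is the property the abstract refers to as exploring “infinitely often”.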
| Original language | English |
|---|---|
| Pages (from-to) | 453-462 |
| Number of pages | 10 |
| Journal | Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS |
| Volume | 2024-May |
| State | Published - 1 Jan 2024 |
| Externally published | Yes |
| Event | 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2024, Auckland, New Zealand; 6 May 2024 → 10 May 2024 |
Keywords
- Exploration Strategies
- Optimistic Initialization
- Reinforcement Learning
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering