Continual Optimistic Initialization for Value-Based Reinforcement Learning

Sheelabhadra Dey, James Ault, Guni Sharon

Research output: Contribution to journal › Conference article › peer-review

Abstract

Comprehensive state-action exploration is essential for reinforcement learning (RL) algorithms: it enables them to find optimal solutions and avoid premature convergence. In value-based RL, optimistic initialization of the value function ensures sufficient exploration for finding the optimal solution. Optimistic values lead to curiosity-driven exploration, enabling visitation of under-explored regions. However, optimistic initialization has limitations in stochastic and non-stationary environments due to its inability to explore “infinitely often”. To address this limitation, we propose a novel exploration strategy for value-based RL, denoted COIN, based on recurring optimistic initialization. By injecting a continual exploration bonus, we overcome the main shortcoming of optimistic initialization, namely its sensitivity to environment noise. We provide a rigorous theoretical comparison of COIN against existing popular exploration strategies and prove that it offers a unique set of attributes (coverage, infinitely-often exploration, no visitation tracking, and curiosity). We demonstrate the superiority of COIN over popular existing strategies on a specially designed toy domain and present results on common benchmark tasks. We observe that COIN outperforms existing exploration strategies in four out of six benchmark tasks while performing on par with the best baseline on the remaining two.
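The exact COIN update rule is not specified in this abstract, so the sketch below is only an illustration of the general idea of recurring optimistic initialization in tabular Q-learning: values are initialized optimistically, and an exploration bonus is periodically re-injected so that under-explored actions regain optimistic value and keep being tried. The Gymnasium-style environment interface and the bonus parameters (bonus, reinject_every) are illustrative assumptions, not the paper's method.

# Illustrative sketch only: the re-injection rule below (periodically pushing
# Q-values back toward an optimistic ceiling) is a hypothetical stand-in for
# the COIN bonus described in the paper, whose exact form is not given here.
import numpy as np

def coin_q_learning(env, episodes=500, alpha=0.1, gamma=0.99,
                    q_max=1.0, bonus=0.5, reinject_every=50):
    """Tabular Q-learning with a recurring optimistic bonus (sketch).

    env is assumed to expose Gymnasium-style reset()/step() with discrete
    observation_space and action_space.
    """
    n_s, n_a = env.observation_space.n, env.action_space.n
    q = np.full((n_s, n_a), q_max)  # optimistic initialization

    for ep in range(episodes):
        # Continual optimism: periodically re-inject an exploration bonus so
        # under-explored actions regain optimistic value ("infinitely often").
        if ep > 0 and ep % reinject_every == 0:
            q = np.minimum(q + bonus, q_max)

        s, _ = env.reset()
        done = False
        while not done:
            a = int(np.argmax(q[s]))  # greedy w.r.t. optimistic values
            s2, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            target = r + (0.0 if terminated else gamma * np.max(q[s2]))
            q[s, a] += alpha * (target - q[s, a])
            s = s2
    return q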

Original language: English
Pages (from-to): 453-462
Number of pages: 10
Journal: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume: 2024-May
State: Published - 1 Jan 2024
Externally published: Yes
Event: 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2024 - Auckland, New Zealand
Duration: 6 May 2024 - 10 May 2024

Keywords

  • Exploration Strategies
  • Optimistic Initialization
  • Reinforcement Learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
