Relaxed Exploration Constrained Reinforcement Learning

Shahaf S. Shperberg, Bo Liu, Peter Stone

Research output: Contribution to journal › Conference article › peer-review

Abstract

This research introduces a novel setting for reinforcement learning with constraints, termed Relaxed Exploration Constrained Reinforcement Learning (RECRL). Similar to standard constrained reinforcement learning (CRL), the objective in RECRL is to discover a policy that maximizes the environmental return while adhering to a predefined set of constraints. However, in some real-world settings, it is possible to train the agent in a setting that does not require strict adherence to the constraints, as long as the agent adheres to them once deployed. To model such settings, we introduce RECRL, which explicitly incorporates an initial training phase where the constraints are relaxed, enabling the agent to explore the environment more freely. Subsequently, during deployment, the agent is obligated to fully satisfy all constraints. To address RECRL problems, we introduce a curriculum-based approach called CLiC, designed to enhance the exploration of existing CRL algorithms during the training phase and facilitate convergence towards a policy that satisfies the full set of constraints by the end of training. Empirical evaluations demonstrate that CLiC yields policies with significantly higher returns during deployment compared to training solely under the strict set of constraints. The code is available at https://github.com/Shperb/RECRL.
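The abstract does not say how CLiC schedules the relaxation, so the following is only an illustrative sketch of the general idea of a constraint curriculum, not the authors' method. It assumes a single scalar cost budget that starts at a relaxed value and is tightened linearly until it reaches the strict deployment value at the end of training. The names `constraint_threshold`, `satisfies`, `relaxed`, and `strict` are hypothetical and do not come from the paper or its repository.

```python
def constraint_threshold(step: int, total_steps: int,
                         relaxed: float, strict: float) -> float:
    """Hypothetical schedule: linearly tighten a cost budget.

    Returns `relaxed` at step 0 and `strict` at (or after) `total_steps`.
    This is an assumed schedule for illustration, not CLiC itself.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    return relaxed + progress * (strict - relaxed)


def satisfies(cost: float, threshold: float) -> bool:
    """A policy's expected cost meets the constraint if it is within budget."""
    return cost <= threshold


# Budget seen by the learner at five evenly spaced points in training.
schedule = [constraint_threshold(s, 4, relaxed=100.0, strict=20.0)
            for s in range(5)]
```

Under this assumed schedule, a policy with expected cost 50 would be acceptable early in training but would violate the strict budget of 20 that applies at deployment, which is the gap a RECRL method has to close by the end of training.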

Original language: English
Pages (from-to): 1727-1735
Number of pages: 9
Journal: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume: 2024-May
State: Published - 1 Jan 2024
Event: 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2024 - Auckland, New Zealand
Duration: 6 May 2024 to 10 May 2024

Keywords

  • Constrained Reinforcement Learning
  • Curriculum Learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
