Abstract
This work addresses the challenge of reinforcement learning with reward functions whose components are highly imbalanced in importance and scale. Standard reinforcement learning algorithms struggle with such reward functions and often converge to suboptimal policies that favor only the dominant reward component. For example, an agent may adopt a passive strategy, avoiding action altogether to evade potentially unsafe outcomes. To mitigate the adverse effects of imbalanced reward functions, we introduce a curriculum learning approach based on the successor features representation. This approach enables the learning system to acquire policies that account for all reward components, allowing for more balanced and versatile decision-making.
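As background for the abstract, the successor features framework assumes the reward decomposes linearly as r(s, a) = φ(s, a)·w, so a curriculum can re-weight reward components without relearning the dynamics. The sketch below is purely illustrative (the feature values, weights, and function names are hypothetical, not from the paper) and shows how one dominant component can dwarf another under such a decomposition:

```python
def reward(phi, w):
    """Reward as the dot product of feature vector phi and weight vector w,
    following the linear decomposition r(s, a) = phi(s, a) . w assumed by
    successor features. (Illustrative only; values are made up.)"""
    return sum(p * wi for p, wi in zip(phi, w))

# Two reward components with very different scales:
# a safety feature and a much smaller task-progress feature.
phi = [1.0, 0.02]

w_imbalanced = [100.0, 1.0]   # safety term dominates; task term is negligible
w_balanced = [1.0, 50.0]      # hypothetical curriculum stage re-weighting the task

print(reward(phi, w_imbalanced))  # safety component dwarfs the total
print(reward(phi, w_balanced))    # both components now contribute comparably
```

Under the imbalanced weights the total reward is driven almost entirely by the safety component, which is the failure mode the abstract describes (agents optimizing only the dominant term); re-weighting across curriculum stages rebalances the components while the learned features stay fixed.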
Original language | English |
---|---|
Pages (from-to) | 5174-5181 |
Number of pages | 8 |
Journal | IEEE Robotics and Automation Letters |
Volume | 9 |
Issue number | 6 |
DOIs | |
State | Published - 1 Jun 2024 |
Keywords
- Reinforcement learning
- continual learning
ASJC Scopus subject areas
- Control and Systems Engineering
- Biomedical Engineering
- Human-Computer Interaction
- Mechanical Engineering
- Computer Vision and Pattern Recognition
- Computer Science Applications
- Control and Optimization
- Artificial Intelligence