Abstract
Explainable reinforcement learning methods aim to elucidate agent policies and their underlying decision-making processes. One such method is reward decomposition, which aims to reveal an agent's preferences in a specific world-state by presenting its expected utility decomposed into the components of the reward function. While this approach quantifies the expected decomposed rewards for alternative actions, it does not demonstrate the outcomes of these alternative actions in terms of the behavior of the agent. This work introduces “Contrastive Highlights”, a novel local explanation method that visually compares the agent's chosen behavior to an alternative choice of action in a contrastive manner. We conducted user studies comparing participants' understanding of agents' preferences based on either reward decomposition, contrastive highlights, or a combination of both approaches. Our results show that integrating reward decomposition with contrastive highlights significantly improved participants' performance compared with using either approach alone.
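As an illustration of the reward decomposition idea described above, the following minimal sketch assumes the expected utility of each action in a state is stored per reward component, so that the total is the sum over components. The action names, component names, and values are hypothetical and are not taken from the paper; the helper functions are likewise illustrative.

```python
# Hypothetical decomposed Q-values for a single world-state:
# one expected-utility entry per reward component, per action.
decomposed_q = {
    "left":  {"progress": 1.0, "safety": -0.5, "speed": 0.25},
    "right": {"progress": 0.5, "safety": 1.0, "speed": 0.25},
}


def total_q(component_values):
    """Overall expected utility: the sum of the per-component values."""
    return sum(component_values.values())


def greedy_action(q):
    """Action with the highest summed expected utility."""
    return max(q, key=lambda action: total_q(q[action]))


def contrast(q, chosen, alternative):
    """Per-component advantage of the chosen action over an alternative."""
    return {c: q[chosen][c] - q[alternative][c] for c in q[chosen]}


chosen = greedy_action(decomposed_q)
diff = contrast(decomposed_q, chosen, "left")
print(chosen, diff)
```

Under these assumed values, the per-component differences indicate which reward components favor the chosen action over the alternative. They do not show how the agent's behavior would differ after taking the alternative action, which is the gap the contrastive highlights are meant to address.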
Original language | English |
---|---|
Pages (from-to) | 2295-2297 |
Number of pages | 3 |
Journal | Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS |
Volume | 2023-May |
State | Published - 1 Jan 2023 |
Externally published | Yes |
Event | 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023 - London, United Kingdom. Duration: 29 May 2023 → 2 Jun 2023 |
Keywords
- Deep Reinforcement Learning
- Explainable AI
- Human-AI Interaction
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering