TY - GEN
T1 - Goal Density-based Hindsight Experience Prioritization for Multi-Goal Robot Manipulation Reinforcement Learning
AU - Kuang, Yingyi
AU - Weinberg, Abraham Itzhak
AU - Vogiatzis, George
AU - Faria, Diego R.
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/8/1
Y1 - 2020/8/1
N2 - Reinforcement learning for multi-goal robot manipulation tasks is usually challenging, especially when sparse rewards are provided. It often requires millions of data samples to be collected before a stable policy is learned. Recent algorithms like Hindsight Experience Replay (HER) have greatly accelerated the learning process by replacing the original desired goal with one of the achieved points (substitute goals) along the same trajectory. However, HER samples previous experience naively: both the trajectory selection and the substitute goal sampling are completely random. In this paper, we discuss an experience prioritization strategy for HER that improves learning efficiency. We propose the Goal Density-based hindsight experience Prioritization (GDP) method, which utilizes the density distribution of the achieved points and prioritizes those that are rarely seen in the replay buffer. These points are used as substitute goals for HER. In addition, we propose a Prioritization Switching with Ensembling Strategy (PSES) method that switches between different experience prioritization algorithms during learning, allowing the best-performing algorithm to be selected at each learning stage. We evaluate our method on several OpenAI Gym robotic manipulation tasks. The results show that GDP accelerates the learning process in most tasks and can be further improved when combined with other prioritization methods using PSES.
AB - Reinforcement learning for multi-goal robot manipulation tasks is usually challenging, especially when sparse rewards are provided. It often requires millions of data samples to be collected before a stable policy is learned. Recent algorithms like Hindsight Experience Replay (HER) have greatly accelerated the learning process by replacing the original desired goal with one of the achieved points (substitute goals) along the same trajectory. However, HER samples previous experience naively: both the trajectory selection and the substitute goal sampling are completely random. In this paper, we discuss an experience prioritization strategy for HER that improves learning efficiency. We propose the Goal Density-based hindsight experience Prioritization (GDP) method, which utilizes the density distribution of the achieved points and prioritizes those that are rarely seen in the replay buffer. These points are used as substitute goals for HER. In addition, we propose a Prioritization Switching with Ensembling Strategy (PSES) method that switches between different experience prioritization algorithms during learning, allowing the best-performing algorithm to be selected at each learning stage. We evaluate our method on several OpenAI Gym robotic manipulation tasks. The results show that GDP accelerates the learning process in most tasks and can be further improved when combined with other prioritization methods using PSES.
UR - http://www.scopus.com/inward/record.url?scp=85095793124&partnerID=8YFLogxK
U2 - 10.1109/RO-MAN47096.2020.9223473
DO - 10.1109/RO-MAN47096.2020.9223473
M3 - Conference contribution
AN - SCOPUS:85095793124
T3 - 29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020
SP - 432
EP - 437
BT - 29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020
Y2 - 31 August 2020 through 4 September 2020
ER -