TY - GEN
T1 - Feedback Decision Transformer
T2 - 23rd IEEE International Conference on Data Mining, ICDM 2023
AU - Giladi, Liad
AU - Katz, Gilad
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/1/1
Y1 - 2023/1/1
N2 - Recent trajectory optimization methods for offline reinforcement learning (RL) define the problem as one of conditional-sequence policy modeling. One of these methods is Decision Transformer (DT), a Transformer-based trajectory optimization approach that achieved competitive results with the current state-of-the-art. Despite its high capabilities, DT underperforms when the training data does not contain full trajectories, or when the recorded behavior does not offer sufficient coverage of the state-action space. We propose Feedback Decision Transformer (FDT), a data-driven approach that uses limited amounts of high-quality feedback at critical states to significantly improve DT's performance. Our approach analyzes and estimates the Q-function across the state-action space, and identifies areas where feedback is likely to be most impactful. Next, we integrate this feedback into our model and use it to improve its performance. Extensive evaluation and analysis on four Atari games show that FDT significantly outperforms DT in multiple setups and configurations.
AB - Recent trajectory optimization methods for offline reinforcement learning (RL) define the problem as one of conditional-sequence policy modeling. One of these methods is Decision Transformer (DT), a Transformer-based trajectory optimization approach that achieved competitive results with the current state-of-the-art. Despite its high capabilities, DT underperforms when the training data does not contain full trajectories, or when the recorded behavior does not offer sufficient coverage of the state-action space. We propose Feedback Decision Transformer (FDT), a data-driven approach that uses limited amounts of high-quality feedback at critical states to significantly improve DT's performance. Our approach analyzes and estimates the Q-function across the state-action space, and identifies areas where feedback is likely to be most impactful. Next, we integrate this feedback into our model and use it to improve its performance. Extensive evaluation and analysis on four Atari games show that FDT significantly outperforms DT in multiple setups and configurations.
KW - Deep Reinforcement Learning
KW - Offline Reinforcement Learning
KW - RL with Human Feedback
UR - http://www.scopus.com/inward/record.url?scp=85185391713&partnerID=8YFLogxK
U2 - 10.1109/ICDM58522.2023.00120
DO - 10.1109/ICDM58522.2023.00120
M3 - Conference contribution
AN - SCOPUS:85185391713
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 1037
EP - 1042
BT - Proceedings - 23rd IEEE International Conference on Data Mining, ICDM 2023
A2 - Chen, Guihai
A2 - Khan, Latifur
A2 - Gao, Xiaofeng
A2 - Qiu, Meikang
A2 - Pedrycz, Witold
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers
Y2 - 1 December 2023 through 4 December 2023
ER -