Feedback Decision Transformer: Offline Reinforcement Learning With Feedback

Liad Giladi, Gilad Katz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recent trajectory optimization methods for offline reinforcement learning (RL) define the problem as one of conditional-sequence policy modeling. One of these methods is Decision Transformer (DT), a Transformer-based trajectory optimization approach that achieves results competitive with the current state of the art. Despite its strong capabilities, DT underperforms when the training data does not contain full trajectories, or when the recorded behavior does not offer sufficient coverage of the state-action space. We propose Feedback Decision Transformer (FDT), a data-driven approach that uses limited amounts of high-quality feedback at critical states to significantly improve DT's performance. Our approach estimates and analyzes the Q-function across the state-action space and identifies areas where feedback is likely to be most impactful. We then integrate this feedback into our model to improve its performance. Extensive evaluation and analysis on four Atari games show that FDT significantly outperforms DT in multiple setups and configurations.
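
For readers unfamiliar with the conditional-sequence framing mentioned above, the following minimal sketch shows how the original Decision Transformer formulation represents a trajectory: each timestep contributes a return-to-go, a state, and an action, and the model predicts actions conditioned on that sequence. This is an illustration only. The function names, the `gamma` parameter, and the token tuples are hypothetical, and the sketch does not reproduce FDT's feedback mechanism, whose details the abstract does not give.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Return-to-go at step t: the (optionally discounted) sum of rewards from t onward.

    gamma=1.0 gives the undiscounted sum; the parameter is an assumption added
    here for generality.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk the trajectory backwards so each step reuses the suffix sum.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def interleave(
    rtg: Sequence[float], states: Sequence[Any], actions: Sequence[Any]
) -> List[Tuple[str, Any]]:
    """Flatten a trajectory into (return-to-go, state, action) tokens, one triple per step."""
    tokens: List[Tuple[str, Any]] = []
    for g, s, a in zip(rtg, states, actions):
        tokens.extend([("return_to_go", g), ("state", s), ("action", a)])
    return tokens


# Toy trajectory with made-up states, actions, and rewards.
rewards = [1.0, 0.0, 2.0]
sequence = interleave(returns_to_go(rewards), ["s0", "s1", "s2"], ["a0", "a1", "a2"])
```

A sequence model trained on such token streams can be prompted at test time with a desired return-to-go. This framing also suggests why truncated trajectories are problematic, since the return-to-go of a cut-off trajectory is itself incomplete.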

Original language: English
Title of host publication: Proceedings - 23rd IEEE International Conference on Data Mining, ICDM 2023
Editors: Guihai Chen, Latifur Khan, Xiaofeng Gao, Meikang Qiu, Witold Pedrycz, Xindong Wu
Publisher: Institute of Electrical and Electronics Engineers
Pages: 1037-1042
Number of pages: 6
ISBN (Electronic): 9798350307887
DOIs
State: Published - 1 Jan 2023
Event: 23rd IEEE International Conference on Data Mining, ICDM 2023 - Shanghai, China
Duration: 1 Dec 2023 to 4 Dec 2023

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 23rd IEEE International Conference on Data Mining, ICDM 2023
Country/Territory: China
City: Shanghai
Period: 1/12/23 to 4/12/23

Keywords

  • Deep Reinforcement Learning
  • Offline Reinforcement Learning
  • RL with Human Feedback

ASJC Scopus subject areas

  • General Engineering
