ModelDiff: Symbolic Dynamic Programming for Model-Aware Policy Transfer in Deep Q-Learning

Xiaotian Liu, Jihwan Jeong, Ayal Taitler, Michael Gimelfarb, Scott Sanner

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Despite significant recent advances in the field of Deep Reinforcement Learning (DRL), such methods typically incur a high cost of training to learn effective policies, thus posing cost and safety challenges in many practical applications. To improve the learning efficiency of (D)RL methods, transfer learning (TL) has emerged as a promising approach to leverage prior experience on a source domain to speed learning on a new, but related, target domain. In this paper, we take a novel model-informed approach to TL in DRL by assuming that we have knowledge of both the source and target domain models (which would be the case in the prevalent setting of DRL with simulators). While directly solving either the source or target MDP via solution methods like value iteration is computationally prohibitive, we exploit the fact that if the target and source MDPs differ only due to a small structural change in their rewards, we can apply structured value iteration methods in a procedure we term ModelDiff to solve the much smaller target-source "Diff" MDP for a reasonable horizon. This ModelDiff approach can then be integrated into extensions of standard DRL algorithms like ModelDiff (MD) DQN, where it provides enhanced provable lower bound guidance to DQN that often speeds convergence for the positive transfer case while critically avoiding decelerated learning in the negative transfer case. Experiments show that MD-DQN matches or outperforms existing TL methods and baselines in both positive and negative transfer settings.
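
To make the lower-bound idea in the abstract concrete, the following is a minimal tabular sketch and not the paper's symbolic procedure. It assumes, beyond what the abstract states, that source and target share the same transition model and that the target reward decomposes as R_tgt = R_src + R_diff. Under those assumptions, evaluating the source-optimal policy on R_diff alone and adding the result to the source Q-values gives the target value of that policy, which can never exceed the optimal target Q-values. All function names (`optimal_q`, `policy_q`, `diff_lower_bound`, `random_mdp`) are hypothetical. The sketch enumerates states explicitly and solves the full source MDP, which the abstract notes is prohibitive at realistic scale; it only illustrates why a reward-difference value can act as a provable lower bound.

```python
import random


def bellman_backup(P, R, V, gamma):
    """One backup: Q[s][a] = R[s][a] + gamma * sum_s' P[s][a][s'] * V[s']."""
    return [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(len(R[s]))] for s in range(len(R))]


def optimal_q(P, R, gamma, iters=500):
    """Value iteration: approximate optimal Q-values of a tabular MDP."""
    Q = [[0.0] * len(row) for row in R]
    for _ in range(iters):
        Q = bellman_backup(P, R, [max(q) for q in Q], gamma)
    return Q


def policy_q(P, R, policy, gamma, iters=500):
    """Q-values of a fixed deterministic policy under reward R."""
    Q = [[0.0] * len(row) for row in R]
    for _ in range(iters):
        V = [Q[s][policy[s]] for s in range(len(R))]
        Q = bellman_backup(P, R, V, gamma)
    return Q


def diff_lower_bound(P, R_src, R_diff, gamma):
    """Assumed construction: Q*_src + Q^{pi_src}_diff <= Q*_tgt."""
    q_src = optimal_q(P, R_src, gamma)
    # Greedy (source-optimal) policy, one action index per state.
    pi_src = [max(range(len(q)), key=q.__getitem__) for q in q_src]
    # Evaluate that policy on the reward difference only.
    q_diff = policy_q(P, R_diff, pi_src, gamma)
    return [[qs + qd for qs, qd in zip(row_s, row_d)]
            for row_s, row_d in zip(q_src, q_diff)]


def random_mdp(n_states=6, n_actions=2, seed=0):
    """Small random MDP used purely for illustration."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
        R.append([rng.random() for _ in range(n_actions)])
    return P, R


if __name__ == "__main__":
    gamma = 0.9
    P, R_src = random_mdp()
    # A small structural reward change: a penalty on one state-action pair.
    R_diff = [[0.0] * 2 for _ in range(6)]
    R_diff[2][0] = -1.0
    R_tgt = [[rs + rd for rs, rd in zip(a, b)] for a, b in zip(R_src, R_diff)]
    lower = diff_lower_bound(P, R_src, R_diff, gamma)
    q_tgt = optimal_q(P, R_tgt, gamma)
    holds = all(lower[s][a] <= q_tgt[s][a] + 1e-9
                for s in range(6) for a in range(2))
    print("lower bound holds:", holds)
```

The bound holds whether the reward change helps or hurts the source policy, because it is the value of an actual policy in the target MDP. This is one way a bound of the kind the abstract describes can remain valid in the negative transfer case; how MD-DQN uses its bound during training is not specified in the abstract and is not modeled here.
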

Original language: English
Title of host publication: Proceedings of the AAAI Conference on Artificial Intelligence
Place of Publication: Philadelphia
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 26623-26630
Volume: 39
Edition: 25
State: Published - 28 Feb 2025
