Taming the noise in reinforcement learning via soft updates

Roy Fox, Ari Pakman, Naftali Tishby

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

100 Scopus citations

Abstract

Model-free reinforcement learning algorithms, such as Q-learning, perform poorly in the early stages of learning in noisy environments, because much effort is spent unlearning biased estimates of the state-action value function. The bias results from selecting, among several noisy estimates, the apparent optimum, which may actually be suboptimal. We propose G-learning, a new off-policy learning algorithm that regularizes the value estimates by penalizing deterministic policies early in the learning process. We show that this method reduces the bias of the value-function estimates, leading to faster convergence to the optimal value and the optimal policy. Moreover, G-learning enables the natural incorporation of prior domain knowledge, when available. The stochastic nature of G-learning also enables it to avoid some exploration costs, a property usually attributed only to on-policy algorithms. We illustrate these ideas in several examples, where G-learning yields significant improvements in the convergence rate and the cost of the learning process.
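The soft update described in the abstract can be sketched as follows. In a cost-minimizing tabular setting, the hard min of Q-learning is replaced by a soft-min weighted by a prior policy, with an inverse temperature that grows over time, so updates are heavily regularized early on and approach the greedy Q-learning update later. The toy MDP, the linear temperature schedule, and all names below are illustrative assumptions for this sketch, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, gamma = 3, 2, 0.9
# Toy MDP (illustrative): random transition kernel and cost table.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
prior = np.full(n_actions, 1.0 / n_actions)  # uniform prior policy

def soft_min(g_row, prior, beta):
    """Soft-min operator: -(1/beta) * log sum_a prior(a) * exp(-beta * g(a)).

    As beta -> infinity this approaches the hard min used by Q-learning;
    small beta keeps the update close to the prior-weighted average.
    Computed with a shifted log-sum-exp for numerical stability.
    """
    m = g_row.min()
    return m - np.log(np.dot(prior, np.exp(-beta * (g_row - m)))) / beta

G = np.zeros((n_states, n_actions))  # state-action cost-to-go estimates
alpha = 0.1
for t in range(5000):
    beta = 0.01 * (t + 1)  # grow beta: soft (regularized) early, near-greedy late
    s = rng.integers(n_states)
    # Sample from the stochastic policy pi(a|s) proportional to
    # prior(a) * exp(-beta * G[s, a]).
    w = prior * np.exp(-beta * (G[s] - G[s].min()))
    a = rng.choice(n_actions, p=w / w.sum())
    s2 = rng.choice(n_states, p=P[s, a])
    # Soft update: bootstrap with the soft-min instead of the hard min.
    target = cost[s, a] + gamma * soft_min(G[s2], prior, beta)
    G[s, a] += alpha * (target - G[s, a])
```

Because the soft-min averages over the prior rather than committing to the apparent optimum, early targets are less sensitive to which noisy estimate happens to look smallest, which is the bias the abstract attributes to Q-learning.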

Original language: English
Title of host publication: 32nd Conference on Uncertainty in Artificial Intelligence 2016, UAI 2016
Editors: Dominik Janzing, Alexander Ihler
Publisher: Association for Uncertainty in Artificial Intelligence (AUAI)
Pages: 202-211
Number of pages: 10
ISBN (Electronic): 9781510827806
State: Published - 1 Jan 2016
Externally published: Yes
Event: 32nd Conference on Uncertainty in Artificial Intelligence 2016, UAI 2016 - Jersey City, United States
Duration: 25 Jun 2016 - 29 Jun 2016

Publication series

Name: 32nd Conference on Uncertainty in Artificial Intelligence 2016, UAI 2016

Conference

Conference: 32nd Conference on Uncertainty in Artificial Intelligence 2016, UAI 2016
Country/Territory: United States
City: Jersey City
Period: 25/06/16 - 29/06/16

ASJC Scopus subject areas

  • Artificial Intelligence

