
Confidence Backup Updates for Aggregating MDP State Values in Monte-Carlo Tree Search

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    Abstract

    Monte-Carlo Tree Search (MCTS) algorithms estimate the value of MDP states based on rewards received by performing multiple random simulations. MCTS algorithms can use different strategies to aggregate these rewards and provide an estimation of the states’ values. The most common aggregation method is to store the mean reward of all simulations. Another common approach stores the best observed reward from each state. These two methods have complementary benefits and drawbacks. In this paper, we show that both of these methods are biased estimators of the real expected value of MDP states. We propose a hybrid approach that uses the best reward for states with low noise and otherwise uses the mean. Experimental results on the Sailing MDP domain show that our method has a considerable advantage when the rewards are drawn from a noisy distribution.
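    The hybrid idea described above can be sketched as follows. This is an illustrative reconstruction, not the paper's exact formulation: the function name `confidence_backup` and the variance threshold used to decide between the two aggregation rules are assumptions for the example.

    ```python
    import statistics

    def confidence_backup(rewards, noise_threshold=0.1):
        """Hybrid value estimate for an MCTS node (illustrative sketch).

        When the observed rewards have low variance (low noise), the best
        reward is used as the value estimate; otherwise the mean is used.
        The threshold and its default value are assumptions, not taken
        from the paper.
        """
        if len(rewards) < 2:
            # Too few samples to measure noise; fall back to what we have.
            return rewards[0] if rewards else 0.0
        if statistics.variance(rewards) <= noise_threshold:
            return max(rewards)       # low noise: trust the best observation
        return statistics.mean(rewards)  # noisy: average out the randomness
    ```

    For example, nearly identical rewards `[0.9, 1.0, 1.1]` yield the best observed reward, while widely spread rewards `[0.0, 2.0, 4.0]` yield the mean.
    
    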

    Original language: English
    Title of host publication: Proceedings of the 8th Annual Symposium on Combinatorial Search, SoCS 2015
    Editors: Levi Lelis, Roni Stern
    Publisher: Association for the Advancement of Artificial Intelligence
    Pages: 156-160
    Number of pages: 5
    ISBN (Electronic): 9781577357322
    State: Published - 1 Jan 2015
    Event: 8th Annual Symposium on Combinatorial Search, SoCS 2015 - Ein Gedi, Israel
    Duration: 11 Jun 2015 - 13 Jun 2015

    Publication series

    Name: Proceedings of the 8th Annual Symposium on Combinatorial Search, SoCS 2015
    Volume: 2015-January

    Conference

    Conference: 8th Annual Symposium on Combinatorial Search, SoCS 2015
    Country/Territory: Israel
    City: Ein Gedi
    Period: 11/06/15 - 13/06/15

    ASJC Scopus subject areas

    • Computer Networks and Communications
