MCTS based on simple regret

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


    Abstract

    UCT, a state-of-the-art algorithm for Monte Carlo tree search (MCTS) in games and Markov decision processes, is based on UCB, a sampling policy for the multi-armed bandit problem (MAB) that minimizes the cumulative regret. However, search differs from MAB in that in MCTS it is usually only the final "arm pull" (the actual move selection) that collects a reward, rather than all "arm pulls". Therefore, it makes more sense to minimize the simple regret, as opposed to the cumulative regret. We begin by introducing policies for multi-armed bandits with lower finite-time and asymptotic simple regret than UCB, and use them to develop a two-stage scheme (SR+CR) for MCTS which outperforms UCT empirically. Optimizing the sampling process is itself a metareasoning problem, a solution of which can use value of information (VOI) techniques. Although the theory of VOI for search exists, applying it to MCTS is non-trivial, as typical myopic assumptions fail. Lacking a complete working VOI theory for MCTS, we nevertheless propose a sampling scheme that is "aware" of VOI, achieving an algorithm that in empirical evaluation outperforms both UCT and the other proposed algorithms.
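
    The distinction between the two regret measures can be illustrated with a small bandit simulation. The sketch below is not taken from the paper: the abstract does not name its sampling policies, so the "half-greedy" policy (pull the empirically best arm with probability 1/2, otherwise a uniformly random other arm) is an assumption chosen only for illustration, as are the function names and the Bernoulli reward model. Cumulative regret sums the loss of every pull, while simple regret measures only the loss of the arm recommended once the budget is spent.

```python
import math
import random


def ucb1_choose(counts, means, t, rng):
    """Standard UCB1: try every arm once, then maximise mean + exploration bonus."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
    )


def half_greedy_choose(counts, means, t, rng):
    """Illustrative (assumed) policy: best arm w.p. 1/2, else a random other arm."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    best = max(range(len(counts)), key=lambda i: means[i])
    others = [i for i in range(len(counts)) if i != best]
    if not others or rng.random() < 0.5:
        return best
    return rng.choice(others)


def run(policy, true_means, budget, seed=0):
    """Run `policy` on a Bernoulli bandit; return (cumulative regret, simple regret)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    means = [0.0] * k
    best_mean = max(true_means)
    cumulative = 0.0
    for t in range(1, budget + 1):
        arm = policy(counts, means, t, rng)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        cumulative += best_mean - true_means[arm]  # every pull is charged
    # Only the final recommendation is charged for simple regret.
    recommended = max(range(k), key=lambda i: means[i])
    simple = best_mean - true_means[recommended]
    return cumulative, simple


if __name__ == "__main__":
    arms = [0.5, 0.45, 0.4, 0.3]  # hypothetical arm means
    for name, policy in [("UCB1", ucb1_choose), ("half-greedy", half_greedy_choose)]:
        cum, simple = run(policy, arms, budget=500, seed=1)
        print(f"{name}: cumulative regret {cum:.2f}, simple regret {simple:.2f}")
```

    A single run like this is noisy; averaging over many seeds would be needed before drawing any comparison between the policies. The point of the sketch is only that a policy can accumulate large cumulative regret while exploring and still end with a good final recommendation, which is the quantity that matters for move selection in MCTS.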

    Original language: English
    Title of host publication: Proceedings of the 5th Annual Symposium on Combinatorial Search, SoCS 2012
    Pages: 193-199
    Number of pages: 7
    State: Published - 1 Dec 2012
    Event: 5th International Symposium on Combinatorial Search, SoCS 2012 - Niagara Falls, ON, Canada
    Duration: 19 Jul 2012 → 21 Jul 2012

    Publication series

    Name: Proceedings of the 5th Annual Symposium on Combinatorial Search, SoCS 2012

    Conference

    Conference: 5th International Symposium on Combinatorial Search, SoCS 2012
    Country/Territory: Canada
    City: Niagara Falls, ON
    Period: 19/07/12 → 21/07/12

    ASJC Scopus subject areas

    • Computer Networks and Communications
