TY - GEN
T1 - Confidence Backup Updates for Aggregating MDP State Values in Monte-Carlo Tree Search
AU - Bnaya, Zahy
AU - Palombo, Alon
AU - Puzis, Rami
AU - Felner, Ariel
N1 - Publisher Copyright: Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2015/1/1
Y1 - 2015/1/1
N2 - Monte-Carlo Tree Search (MCTS) algorithms estimate the value of MDP states based on rewards received by performing multiple random simulations. MCTS algorithms can use different strategies to aggregate these rewards and provide an estimation for the states’ values. The most common aggregation method is to store the mean reward of all simulations. Another common approach stores the best observed reward from each state. Both of these methods have complementary benefits and drawbacks. In this paper, we show that both of these methods are biased estimators for the real expected value of MDP states. We propose a hybrid approach that uses the best reward for states with low noise, and otherwise uses the mean. Experimental results on the Sailing MDP domain show that our method has a considerable advantage when the rewards are drawn from a noisy distribution.
AB - Monte-Carlo Tree Search (MCTS) algorithms estimate the value of MDP states based on rewards received by performing multiple random simulations. MCTS algorithms can use different strategies to aggregate these rewards and provide an estimation for the states’ values. The most common aggregation method is to store the mean reward of all simulations. Another common approach stores the best observed reward from each state. Both of these methods have complementary benefits and drawbacks. In this paper, we show that both of these methods are biased estimators for the real expected value of MDP states. We propose a hybrid approach that uses the best reward for states with low noise, and otherwise uses the mean. Experimental results on the Sailing MDP domain show that our method has a considerable advantage when the rewards are drawn from a noisy distribution.
UR - http://www.scopus.com/inward/record.url?scp=85048669921&partnerID=8YFLogxK
U2 - 10.1609/socs.v6i1.18378
DO - 10.1609/socs.v6i1.18378
M3 - Conference contribution
AN - SCOPUS:85048669921
T3 - Proceedings of the 8th Annual Symposium on Combinatorial Search, SoCS 2015
SP - 156
EP - 160
BT - Proceedings of the 8th Annual Symposium on Combinatorial Search, SoCS 2015
A2 - Lelis, Levi
A2 - Stern, Roni
PB - Association for the Advancement of Artificial Intelligence
T2 - 8th Annual Symposium on Combinatorial Search, SoCS 2015
Y2 - 11 June 2015 through 13 June 2015
ER -