TY - JOUR
T1 - Near-optimal polynomial time algorithm for learning in certain classes of stochastic games
AU - Brafman, Ronen I.
AU - Tennenholtz, Moshe
N1 - Funding Information: We are grateful to the anonymous reviewers for their important comments. The first author was supported in part by the Paul Ivanier Center for Robotics Research and Production Management. The second author was supported by the US-Israel Binational Science Foundation.
PY - 2000/1/1
Y1 - 2000/1/1
N2 - We present a new algorithm for polynomial time learning of optimal behavior in single-controller stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh in reinforcement learning and of Monderer and Tennenholtz in repeated games. In stochastic games, the agent must cope with the existence of an adversary whose actions can be arbitrary. In particular, this adversary can withhold information about the game matrix by refraining from performing certain actions, or by performing them only rarely. This forces upon us an exploration versus exploitation dilemma more complex than in Markov decision processes: given information about particular parts of a game matrix, the agent must decide how much effort to invest in learning the unknown parts of the matrix. We present a polynomial time algorithm that addresses these issues in the context of the class of single-controller stochastic games, providing the agent with near-optimal return.
AB - We present a new algorithm for polynomial time learning of optimal behavior in single-controller stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh in reinforcement learning and of Monderer and Tennenholtz in repeated games. In stochastic games, the agent must cope with the existence of an adversary whose actions can be arbitrary. In particular, this adversary can withhold information about the game matrix by refraining from performing certain actions, or by performing them only rarely. This forces upon us an exploration versus exploitation dilemma more complex than in Markov decision processes: given information about particular parts of a game matrix, the agent must decide how much effort to invest in learning the unknown parts of the matrix. We present a polynomial time algorithm that addresses these issues in the context of the class of single-controller stochastic games, providing the agent with near-optimal return.
UR - http://www.scopus.com/inward/record.url?scp=0034247018&partnerID=8YFLogxK
U2 - 10.1016/S0004-3702(00)00039-4
DO - 10.1016/S0004-3702(00)00039-4
M3 - Article
AN - SCOPUS:0034247018
SN - 0004-3702
VL - 121
SP - 31
EP - 47
JO - Artificial Intelligence
JF - Artificial Intelligence
IS - 1
ER -