A near-optimal poly-time algorithm for learning in a class of stochastic games

    Research output: Contribution to journal, Conference article, peer-review

    3 Scopus citations

    Abstract

    We present a new algorithm for polynomial-time learning of near-optimal behavior in stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh [1998] in reinforcement learning and of Monderer and Tennenholtz [1997] in repeated games. In stochastic games we face an exploration vs. exploitation dilemma more complex than in Markov decision processes: given information about particular parts of a game matrix, how much effort should the agent invest in learning its unknown parts? We explain and address these issues within the class of single-controller stochastic games. This solution can be extended to stochastic games in general.

    Original language: English
    Pages (from-to): 734-739
    Number of pages: 6
    Journal: IJCAI International Joint Conference on Artificial Intelligence
    Volume: 2
    State: Published - 1 Dec 1999
    Event: 16th International Joint Conference on Artificial Intelligence, IJCAI 1999 - Stockholm, Sweden
    Duration: 31 Jul 1999 to 6 Aug 1999

    ASJC Scopus subject areas

    • Artificial Intelligence
