Model-based online learning of POMDPs

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

    31 Scopus citations

    Abstract

    Learning to act in an unknown partially observable domain is a difficult variant of the reinforcement learning paradigm. Research in the area has focused on model-free methods - methods that learn a policy without learning a model of the world. When sensor noise increases, model-free methods provide less accurate policies. The model-based approach - learning a POMDP model of the world, and computing an optimal policy for the learned model - may generate superior results in the presence of sensor noise, but learning and solving a model of the environment is a difficult problem. We have previously shown how such a model can be obtained from the learned policy of model-free methods, but this approach implies an undesirable distinction between a learning phase and an acting phase. In this paper we present a novel method for learning a POMDP model online, based on McCallum's Utile Suffix Memory (USM), in conjunction with an approximate policy obtained using an incremental POMDP solver. We show that the incrementally improving policy provides superior results to the original USM algorithm, especially in the presence of increasing sensor and action noise.

    Original language: English
    Title of host publication: Machine Learning - ECML 2005
    Subtitle of host publication: 16th European Conference on Machine Learning, Proceedings
    Publisher: Springer Verlag
    Pages: 353-364
    Number of pages: 12
    ISBN (Print): 3540292438, 9783540292432
    DOIs
    State: Published - 1 Jan 2005
    Event: 16th European Conference on Machine Learning, ECML 2005 - Porto, Portugal
    Duration: 3 Oct 2005 to 7 Oct 2005

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 3720 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 16th European Conference on Machine Learning, ECML 2005
    Country/Territory: Portugal
    City: Porto
    Period: 3/10/05 to 7/10/05

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • General Computer Science
