
LTLf/LDLf Non-Markovian Rewards

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    76 Scopus citations

    Abstract

    In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., it depends only on the last state and action. This dependency makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle non-Markovian reward functions was the subject of two previous lines of work. Both use LTL variants to specify the reward function and then compile the new model back into a Markovian model. Building on recent progress in temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.
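    The core idea the abstract describes, compiling a temporally extended reward into a Markovian one by tracking an automaton state alongside the MDP state, can be illustrated with a small sketch. This is not the paper's LDLf construction; it is a hand-built DFA for the abstract's own "coffee only following a request" example, with hypothetical symbol names, showing how the product state makes the reward depend only on the current (automaton state, symbol) pair.

    ```python
    # DFA states: 0 = no pending request, 1 = request pending, 2 = violation (sink).
    DFA_START = 0

    def dfa_step(q, symbol):
        """Advance the DFA on one observed symbol: 'request', 'coffee', or 'other'."""
        if q == 2:                      # violation is absorbing
            return 2
        if symbol == "request":
            return 1                    # a request is now pending
        if symbol == "coffee":
            return 0 if q == 1 else 2   # delivery without a pending request violates the property
        return q                        # other symbols leave the state unchanged

    def product_reward(q, symbol):
        """Markovian reward over the product state: pay 1 for a legal delivery."""
        return 1 if (q == 1 and symbol == "coffee") else 0

    def run(trace):
        """Walk a trace through the product, accumulating the now-Markovian reward."""
        q, total = DFA_START, 0
        for sym in trace:
            total += product_reward(q, sym)
            q = dfa_step(q, sym)
        return total, q

    print(run(["request", "coffee", "coffee"]))  # second delivery violates: (1, 2)
    ```

    Composing such automata with the MDP's transition function yields an equivalent Markovian model over product states, which is the setting in which the paper's minimality and compositionality guarantees are stated.
    
    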

    Original language: English
    Title of host publication: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
    Publisher: AAAI Press
    Pages: 1771-1778
    Number of pages: 8
    ISBN (Electronic): 9781577358008
    State: Published - 1 Jan 2018
    Event: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 - New Orleans, United States
    Duration: 2 Feb 2018 - 7 Feb 2018

    Publication series

    Name: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018

    Conference

    Conference: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
    Country/Territory: United States
    City: New Orleans
    Period: 2/02/18 - 7/02/18

    ASJC Scopus subject areas

    • Artificial Intelligence

