LTLf/LDLf Non-Markovian Rewards

Ronen I. Brafman, Giuseppe De Giacomo, Fabio Patrizi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

51 Scopus citations

Abstract

In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., depends on the last state and action. This dependency makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle non-Markovian reward functions was the subject of two previous lines of work. Both use LTL variants to specify the reward function and then compile the new model back into a Markovian model. Building on recent progress in temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.
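To make the compilation idea in the abstract concrete, the Python sketch below tracks the example property "coffee only following a request" with a hand-written deterministic finite automaton (DFA) and grants reward when the automaton reaches an accepting state. This is a minimal illustration of the general idea only, not the paper's LDLf-to-automaton construction; the trace symbols, state names, and the product_step helper are all invented for the example.

# Hypothetical DFA for "coffee is rewarded only following a request".
# A minimal sketch of compiling a non-Markovian reward into a Markovian
# one over an extended state space; not the paper's construction.
START, ACCEPTING = "idle", {"served"}
DELTA = {
    ("idle", "request"): "pending",
    ("idle", "coffee"): "idle",       # coffee without a request: no reward
    ("idle", "other"): "idle",
    ("pending", "request"): "pending",
    ("pending", "coffee"): "served",  # requested coffee: accepting state
    ("pending", "other"): "pending",
    ("served", "request"): "pending", # reset after the reward is granted
    ("served", "coffee"): "idle",
    ("served", "other"): "idle",
}

def product_step(extended_state, next_mdp_state, symbol, r=1.0):
    # One step of the product model: the extended state is the pair
    # (mdp_state, dfa_state). The reward depends only on that pair and
    # the observed symbol, so it is Markovian over the product.
    _mdp_state, dfa_state = extended_state
    next_dfa = DELTA[(dfa_state, symbol)]
    reward = r if next_dfa in ACCEPTING else 0.0
    return (next_mdp_state, next_dfa), reward

# A trace where coffee follows a request earns the reward exactly once.
state = ("s0", START)
for next_s, symbol in [("s1", "request"), ("s2", "other"), ("s3", "coffee")]:
    state, reward = product_step(state, next_s, symbol)
    print(state, reward)

The point of the pairing is that a reward which originally depended on the whole history ("was there a request before this coffee?") depends only on the current extended state, so standard MDP solution methods apply unchanged on the product.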

Original language: English
Title of host publication: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Publisher: AAAI Press
Pages: 1771-1778
Number of pages: 8
ISBN (Electronic): 9781577358008
State: Published - 1 Jan 2018
Event: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 - New Orleans, United States
Duration: 2 Feb 2018 - 7 Feb 2018

Publication series

Name: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018

Conference

Conference: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Country/Territory: United States
City: New Orleans
Period: 2/02/18 - 7/02/18

ASJC Scopus subject areas

  • Artificial Intelligence
