Efficient, safe, and probably approximately complete learning of action models

Roni Stern, Brendan Juba

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

21 Scopus citations

Abstract

In this paper we explore the theoretical boundaries of planning in a setting where no model of the agent's actions is given. Instead of an action model, a set of successfully executed plans is given, and the task is to generate a plan that is safe, i.e., guaranteed to achieve the goal without failing. To this end, we show how to learn a conservative model of the world in which actions are guaranteed to be applicable. This conservative model is then given to an off-the-shelf classical planner, resulting in a plan that is guaranteed to achieve the goal. However, this reduction from model-free planning to model-based planning is not complete: in some cases a plan will not be found even when one exists. We analyze the relation between the number of observed plans and the likelihood that our conservative approach will fail to solve a solvable problem. Our analysis shows that the number of trajectories needed scales gracefully.
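To make the idea of a conservative model concrete, below is a minimal sketch of learning such a model from observed successful trajectories in a simplified propositional (STRIPS-like) setting. This is an illustrative assumption-laden sketch, not the authors' implementation: states are assumed to be sets of true propositions, actions are grounded, and the function and variable names are hypothetical. The conservative choice is to require, as preconditions, every proposition that held in all observed pre-states of an action, so the learned model never declares an action applicable in a state where it might fail.

```python
# Minimal sketch (not the paper's implementation) of learning a conservative,
# STRIPS-like action model from observed (state, action, next_state) triples.
# Preconditions: intersection of all observed pre-states (safe over-approximation).
# Effects: propositions observed to appear (add) or disappear (delete).

from collections import defaultdict


def learn_conservative_model(trajectories):
    """trajectories: iterable of lists of (state, action, next_state) triples,
    where states are frozensets of true propositions."""
    pre = {}                    # action -> intersection of observed pre-states
    add = defaultdict(set)      # action -> observed add effects
    delete = defaultdict(set)   # action -> observed delete effects

    for trajectory in trajectories:
        for state, action, next_state in trajectory:
            # Keep only propositions true in *every* observed pre-state.
            if action not in pre:
                pre[action] = set(state)
            else:
                pre[action] &= state
            # Record propositions that changed across the transition.
            add[action] |= next_state - state
            delete[action] |= state - next_state

    return {a: (pre[a], add[a], delete[a]) for a in pre}


if __name__ == "__main__":
    # Toy example with hypothetical propositions.
    s0 = frozenset({"at_a", "door_closed"})
    s1 = frozenset({"at_a", "door_open"})
    s2 = frozenset({"at_b", "door_open"})
    model = learn_conservative_model([[(s0, "open_door", s1), (s1, "move_a_b", s2)]])
    for action, (p, a, d) in model.items():
        print(action, "pre:", sorted(p), "add:", sorted(a), "del:", sorted(d))
```

The learned model can then be handed to an off-the-shelf classical planner; because the preconditions only become weaker (and never stronger) as more trajectories are observed, additional observations can only make more solvable problems recoverable, which is the intuition behind the completeness analysis in the abstract.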

Original language: English
Title of host publication: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
Editors: Carles Sierra
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 4405-4411
Number of pages: 7
ISBN (Electronic): 9780999241103
DOIs
State: Published - 1 Jan 2017
Event: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017 - Melbourne, Australia
Duration: 19 Aug 2017 - 25 Aug 2017

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
Volume: 0
ISSN (Print): 1045-0823

Conference

Conference: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
Country/Territory: Australia
City: Melbourne
Period: 19/08/17 - 25/08/17

ASJC Scopus subject areas

  • Artificial Intelligence
