TY - GEN

T1 - Learning Probably Approximately Complete and Safe Action Models for Stochastic Worlds

AU - Juba, Brendan

AU - Stern, Roni

N1 - Funding Information:
We thank our reviewers for their constructive comments. This research is partially funded by NSF awards IIS-1908287 and CCF-1718380, and BSF grant #2018684 to Roni Stern, and by the Defense Advanced Research Projects Agency (DARPA) as part of the SAIL-ON program.
Publisher Copyright:
Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

PY - 2022/6/30

Y1 - 2022/6/30

N2 - We consider the problem of learning action models for planning in unknown stochastic environments that can be defined using the Probabilistic Planning Domain Description Language (PPDDL). As input, we are given a set of previously executed trajectories, and the main challenge is to learn an action model that has a similar goal achievement probability to the policies used to create these trajectories. To this end, we introduce a variant of PPDDL in which there is uncertainty about the transition probabilities, specified by an interval for each factor that contains the respective true transition probabilities. Then, we present SAM+, an algorithm that learns such an imprecise-PPDDL environment model. SAM+ has polynomial time and sample complexity, and guarantees that with high probability, the true environment is indeed captured by the defined intervals. We prove that the action model SAM+ outputs has a goal achievement probability that is almost as good as or better than that of the policies used to produce the training trajectories. Then, we show how to produce a PPDDL model based on this imprecise-PPDDL model that has similar properties.

AB - We consider the problem of learning action models for planning in unknown stochastic environments that can be defined using the Probabilistic Planning Domain Description Language (PPDDL). As input, we are given a set of previously executed trajectories, and the main challenge is to learn an action model that has a similar goal achievement probability to the policies used to create these trajectories. To this end, we introduce a variant of PPDDL in which there is uncertainty about the transition probabilities, specified by an interval for each factor that contains the respective true transition probabilities. Then, we present SAM+, an algorithm that learns such an imprecise-PPDDL environment model. SAM+ has polynomial time and sample complexity, and guarantees that with high probability, the true environment is indeed captured by the defined intervals. We prove that the action model SAM+ outputs has a goal achievement probability that is almost as good as or better than that of the policies used to produce the training trajectories. Then, we show how to produce a PPDDL model based on this imprecise-PPDDL model that has similar properties.

UR - http://www.scopus.com/inward/record.url?scp=85127886941&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85127886941

T3 - Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022

SP - 9795

EP - 9804

BT - AAAI-22 Technical Tracks 9

PB - Association for the Advancement of Artificial Intelligence

T2 - 36th AAAI Conference on Artificial Intelligence, AAAI 2022

Y2 - 22 February 2022 through 1 March 2022

ER -