An Interface for Black Box Learning in Probabilistic Programs

Jan-Willem van de Meent, Brooks Paige, David Tolpin, Frank Wood

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


In this abstract we are interested in algorithms that combine inference with learning. As a motivating example we consider a program (see Figure 1), written in the language Anglican [7], which simulates the Canadian traveler problem (CTP) domain. In the CTP, an agent must travel along a graph, which represents a network of roads, to get from the start node (green) to the target node (red). Due to bad weather, some roads are blocked, but the agent does not know which ones in advance. The agent performs depth-first search along the graph, which requires a varying number of steps depending on which edges are blocked, and incurs a cost for the distance traveled. The program in Figure 1 defines two types of policies for the CTP. For the policy where edges are chosen at random, we may perform online planning by simulating future actions and outcomes, also known as rollouts, and choosing the action that minimizes the expected cost. Alternatively, we may learn a policy that, after an initial training period, can be applied without computing rollouts. To do so we consider a deterministic policy for which we learn a set of parameters (the edge preferences).
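The rollout-based online planning described above can be sketched as follows. This is an illustrative Python approximation, not the Anglican program from Figure 1: the graph representation, the weather model (each edge independently blocked with some probability), the depth-first traversal cost (backtracking re-pays an edge's weight), and the penalty for a disconnected sample are all assumptions made for the sketch.

```python
import random

def sample_blocked(edges, p_block, rng):
    # Assumed weather model: each road is independently closed with probability p_block.
    return {e for e in edges if rng.random() < p_block}

def dfs_cost(adj, blocked, start, target):
    # Cost of a depth-first traversal: every edge traversed, including
    # backtracking over it, adds its weight. Returns inf if the target
    # is unreachable under this sampled weather.
    visited, cost = {start}, 0.0
    def dfs(node):
        nonlocal cost
        if node == target:
            return True
        for nbr, w in adj[node]:
            if nbr in visited or frozenset((node, nbr)) in blocked:
                continue
            visited.add(nbr)
            cost += w
            if dfs(nbr):
                return True
            cost += w  # backtrack along the same edge
        return False
    return cost if dfs(start) else float("inf")

def choose_action(adj, edges, start, target, p_block=0.3,
                  n_rollouts=100, penalty=100.0, seed=1):
    # Online planning: for each edge out of the current node, estimate the
    # expected remaining cost by Monte Carlo rollouts and pick the minimizer.
    rng = random.Random(seed)
    best_nbr, best_cost = None, float("inf")
    for nbr, w in adj[start]:
        total = 0.0
        for _ in range(n_rollouts):
            blocked = sample_blocked(edges, p_block, rng)
            blocked.discard(frozenset((start, nbr)))  # this edge is observed open
            c = dfs_cost(adj, blocked, nbr, target)
            total += w + (c if c != float("inf") else penalty)
        if total / n_rollouts < best_cost:
            best_nbr, best_cost = nbr, total / n_rollouts
    return best_nbr
```

For instance, on a four-node graph with a cheap path through node 1 and an expensive one through node 2, `choose_action` selects the edge toward node 1. The learned deterministic policy mentioned above would instead rank the outgoing edges by trained preference parameters, avoiding the rollout loop entirely after training.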
Original language: English GB
Title of host publication: POPL Workshop on Probabilistic Programming Semantics
State: Published - 2016


