TY - UNPB
T1 - Approximating a Target Distribution using Weight Queries
AU - Barak, Nadav
AU - Sabato, Sivan
PY - 2020/6/24
Y1 - 2020/6/24
N2 - We consider a novel challenge: approximating a distribution without the
ability to randomly sample from that distribution. We study how such an
approximation can be obtained using *weight queries*. Given some data
set of examples, a weight query presents one of the examples to an
oracle, which returns the probability, according to the target
distribution, of observing examples similar to the presented example.
This oracle can represent, for instance, counting queries to a database
of the target population, or an interface to a search engine which
returns the number of results that match a given search. We propose an
interactive algorithm that iteratively selects data set examples and
performs corresponding weight queries. The algorithm finds a reweighting
of the data set that approximates the weights according to the target
distribution, using a limited number of weight queries. We derive an
approximation bound on the total variation distance between the
reweighting found by the algorithm and the best achievable reweighting.
Our algorithm takes inspiration from the UCB approach common in
multi-armed bandit problems, and combines it with a new discrepancy
estimator and a greedy iterative procedure. In addition to our
theoretical guarantees, we demonstrate in experiments the advantages of
the proposed algorithm over several baselines. A Python implementation
of the proposed algorithm and of all the experiments can be found at
https://github.com/Nadav-Barak/AWP.
AB - We consider a novel challenge: approximating a distribution without the
ability to randomly sample from that distribution. We study how such an
approximation can be obtained using *weight queries*. Given some data
set of examples, a weight query presents one of the examples to an
oracle, which returns the probability, according to the target
distribution, of observing examples similar to the presented example.
This oracle can represent, for instance, counting queries to a database
of the target population, or an interface to a search engine which
returns the number of results that match a given search. We propose an
interactive algorithm that iteratively selects data set examples and
performs corresponding weight queries. The algorithm finds a reweighting
of the data set that approximates the weights according to the target
distribution, using a limited number of weight queries. We derive an
approximation bound on the total variation distance between the
reweighting found by the algorithm and the best achievable reweighting.
Our algorithm takes inspiration from the UCB approach common in
multi-armed bandit problems, and combines it with a new discrepancy
estimator and a greedy iterative procedure. In addition to our
theoretical guarantees, we demonstrate in experiments the advantages of
the proposed algorithm over several baselines. A Python implementation
of the proposed algorithm and of all the experiments can be found at
https://github.com/Nadav-Barak/AWP.
KW - Computer Science - Machine Learning
KW - Statistics - Machine Learning
U2 - 10.48550/arXiv.2006.13636
DO - 10.48550/arXiv.2006.13636
M3 - Preprint
BT - Approximating a Target Distribution using Weight Queries
ER -