Differentially Private Weighted Sampling

Edith Cohen, Ofir Geri, Tamás Sarlós, Uri Stemmer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Common datasets have the form of elements with keys (e.g., transactions and products), and the goal is to perform analytics on the aggregated form of key and frequency pairs. A weighted sample of keys by (a function of) frequency is a highly versatile summary that provides a sparse set of representative keys and supports approximate evaluations of query statistics. We propose private weighted sampling (PWS), a method that sanitizes a weighted sample so as to ensure element-level differential privacy while retaining its utility to the maximum extent possible. PWS maximizes the reporting probabilities of keys and the estimation quality of a broad family of statistics. PWS improves over the state of the art even for the well-studied special case of private histograms, when no sampling is performed. We empirically observe significant performance gains: a 20%-300% increase in key reporting for common Zipfian frequency distributions, and accurate estimation with 2x-8x lower frequencies. PWS is applied as a post-processing of a non-private sample, without requiring the original data. Therefore, it can be a seamless addition to existing implementations, such as those optimized for distributed or streamed data. We believe that, due to its practicality and performance, PWS may become a method of choice in applications where privacy is desired.
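
To make the notion of a weighted sample concrete, the following is a minimal, non-private sketch of the kind of summary the abstract starts from. It is not the PWS method itself, and it provides no privacy guarantee. It assumes Poisson probability-proportional-to-size sampling with a threshold parameter and an inverse-probability (Horvitz-Thompson) estimator, which is one common way to build such a sample; the function names, the `tau` parameter, and the (key, value) element layout are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def aggregate(elements):
    """Aggregate (key, value) elements into a key -> frequency mapping."""
    freq = Counter()
    for key, value in elements:
        freq[key] += value
    return freq


def pps_sample(freq, tau, rng):
    """Non-private Poisson PPS sample (illustrative assumption, not PWS).

    Each key is included independently with probability min(1, f / tau),
    where f is its frequency. The sample stores (frequency, probability).
    """
    sample = {}
    for key, f in freq.items():
        p = min(1.0, f / tau)
        # rng.random() is in [0, 1), so keys with p == 1.0 are always kept.
        if rng.random() < p:
            sample[key] = (f, p)
    return sample


def estimate_sum(sample, predicate=lambda key: True):
    """Inverse-probability estimate of the total frequency of the keys
    that satisfy `predicate`, computed from the sample alone."""
    return sum(f / p for key, (f, p) in sample.items() if predicate(key))
```

In this sketch, keys with frequency at least `tau` are always sampled and contribute their exact frequency, while a sampled lower-frequency key contributes `tau`. A sanitizing step such as the one the abstract describes would operate on a sample of this kind as a post-processing, without access to the original elements.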
Original language: English (GB)
Title of host publication: The 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021, April 13-15, 2021, Virtual Event
Editors: Arindam Banerjee, Kenji Fukumizu
Publisher: PMLR
Pages: 2404-2412
Number of pages: 9
Volume: 130
State: Published - 2021

Publication series

Name: Proceedings of Machine Learning Research
Publisher: PMLR
