A formal investigation of the utility-privacy tradeoff in statistical databases has proved essential to the rigorous study of privacy in recent years. Initial results in this direction dealt with databases that answer (all) subset-sum queries to within some fixed distortion [Dinur and Nissim, PODC 2003]. Subsequent work extended these results to the case where a constant fraction of the queries are answered arbitrarily [Dwork, McSherry, and Talwar, STOC 2007], and further to the case where up to almost half the queries are answered arbitrarily [Dwork and Yekhanin, CRYPTO 2008]. All these results demonstrate how an efficient attacker may learn the underlying database (exactly or approximately), and hence have consequences for tasks such as private sanitization of data. We give the first efficient attack for the case where the queries answered within the fixed distortion form only a polynomially small fraction of the queries (the rest are answered arbitrarily). Our techniques borrow from program correction and learning in the presence of noise.
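To make the underlying model concrete, here is a toy, brute-force sketch of the consistency argument behind the Dinur-Nissim setting: the database is a bit vector, each query asks for the sum over a subset of positions, and answers may be off by at most a fixed distortion E. The function names and the use of random (rather than adversarial) noise are illustrative assumptions; the efficient attacks in the cited works use linear programming and coding-theoretic tools, not enumeration.

```python
import itertools
import random

def answer_queries(db, queries, distortion, rng):
    """Answer each subset-sum query to within +/- distortion.
    (The model allows adversarial noise; random noise is used here for illustration.)"""
    return [sum(db[i] for i in q) + rng.randint(-distortion, distortion)
            for q in queries]

def brute_force_reconstruct(n, queries, answers, distortion):
    """Return every candidate database consistent with all answers to within
    the distortion bound.  Exponential in n -- a toy version of the
    consistency argument, not the efficient attack."""
    return [cand for cand in itertools.product([0, 1], repeat=n)
            if all(abs(sum(cand[i] for i in q) - a) <= distortion
                   for q, a in zip(queries, answers))]

# Tiny demo: when all subset-sum queries are answered to within distortion E,
# every consistent candidate agrees with the true database except on <= 4E bits
# (apply the distortion bound to the two queries where candidate and database disagree).
rng = random.Random(0)
n, E = 8, 1
db = tuple(rng.randint(0, 1) for _ in range(n))
queries = [[i for i in range(n) if (mask >> i) & 1] for mask in range(2 ** n)]
answers = answer_queries(db, queries, E, rng)
candidates = brute_force_reconstruct(n, queries, answers, E)
```

The demo reconstructs the database only up to O(E) errors, which is exactly the "approximately" in the abstract; the papers cited above sharpen this in the regimes where some or most answers are arbitrary.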
Keywords: Privacy · Statistical databases · Learning with noise