TY - JOUR
T1 - Efficient and Privacy preserving Approximation of Distributed Statistical Queries
AU - Derbeko, Philip
AU - Dolev, Shlomi
AU - Gudes, Ehud
AU - Ullman, Jeffrey D.
N1 - Publisher Copyright: IEEE
PY - 2021/1/1
Y1 - 2021/1/1
AB - In recent years, an increasing amount of data has been collected in different and often non-cooperating databases. The problem of privacy-preserving, distributed calculations over separate databases and the related issue of private data release have been intensively investigated. However, despite considerable progress, computational complexity, and consequently the performance of the computations, remains a limiting factor in real-world deployments as data sizes grow, especially in the case of privacy-preserving computations. In this paper, we suggest sampling as a method of improving computational performance. Sampling was a topic of extensive research in the past and has recently received renewed interest. We provide a sampling method targeted at separate, non-collaborating, vertically partitioned datasets. The method is exemplified and tested on an approximation of set intersection, both with and without a privacy-preserving mechanism. An analysis of the bound on the error as a function of the sample size is discussed, and a heuristic algorithm is suggested to further improve performance. The algorithms were implemented, and experimental results confirm the validity of the approach.
KW - Approximate Computations
KW - Approximation algorithms
KW - Differential Privacy
KW - Distributed Computations
KW - Distributed databases
KW - Estimation
KW - Heuristic algorithms
KW - Law enforcement
KW - Protocols
UR - http://www.scopus.com/inward/record.url?scp=85099731258&partnerID=8YFLogxK
U2 - 10.1109/TBDATA.2021.3052516
DO - 10.1109/TBDATA.2021.3052516
M3 - Article
AN - SCOPUS:85099731258
JO - IEEE Transactions on Big Data
JF - IEEE Transactions on Big Data
SN - 2332-7790
ER -