TY - GEN
T1 - Perfect Subset Privacy for Data Sharing and Learning
AU - Raviv, Netanel
AU - Goldfeld, Ziv
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - As the size of modern datasets grows, it becomes increasingly common to delegate computational tasks to service providers. Doing so, however, raises privacy concerns. Privatization schemes which enable learning algorithms to be executed unaltered have been recently popularized under the name instance encoding, aiming to circumvent the large overhead of traditional cryptographic primitives. In this work we take an information/coding-theoretic approach towards instance encoding. Specifically, recent works have shown that general-purpose data sharing can be achieved without leaking information about any individual datapoint (marginally), while maintaining high mutual information with the dataset in its entirety. We first extend this framework to capture the entire privacy-utility tradeoff, accounting for the privatization of any subset of the dataset, and provide a coding scheme for doing so. Second, we introduce a necessary algebraic condition for applying unaltered learning algorithms on encrypted data, termed signal preservation, and present an additional scheme which guarantees it. Both schemes achieve almost maximal mutual information with the entire dataset, under appropriate assumptions. The construction relies on some classic ideas such as Shamir secret sharing, as well as a novel technique called random Hadamard coding.
UR - http://www.scopus.com/inward/record.url?scp=85136256522&partnerID=8YFLogxK
U2 - 10.1109/ISIT50566.2022.9834572
DO - 10.1109/ISIT50566.2022.9834572
M3 - Conference contribution
AN - SCOPUS:85136256522
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 1850
EP - 1855
BT - 2022 IEEE International Symposium on Information Theory, ISIT 2022
PB - Institute of Electrical and Electronics Engineers
T2 - 2022 IEEE International Symposium on Information Theory, ISIT 2022
Y2 - 26 June 2022 through 1 July 2022
ER -