TY - JOUR
T1 - Learning Privately with Labeled and Unlabeled Examples
AU - Beimel, Amos
AU - Nissim, Kobbi
AU - Stemmer, Uri
N1 - Funding Information:
We thank Aryeh Kontorovich, Adam Smith, and Salil Vadhan for helpful discussions of ideas in this work. We thank the anonymous reviewers for their helpful comments and suggestions. Work of A. B. was supported in part by the Israeli Ministry of Science and Technology, by the Israel Science Foundation (Grants 544/13 and 152/17), by the Frankel Center for Computer Science, by ERC Grant 742754 (project NTSC), by the Cyber Security Research Center at Ben-Gurion University of the Negev, and by NSF Grant No. 1565387, TWC: Large: Collaborative: Computing Over Distributed Sensitive Data. Work of K. N. was done in part while the author was visiting the Center for Research on Computation and Society, Harvard University, and was initially supported by the Israel Science Foundation (Grant 276/12) and by NSF Grant CNS-1237235, and later by NSF Grant No. 1565387, TWC: Large: Collaborative: Computing Over Distributed Sensitive Data. Work of U. S. was supported in part by the Israeli Ministry of Science and Technology, by the Check Point Institute for Information Security, by the IBM PhD Fellowship Awards Program, by the Frankel Center for Computer Science, by the Israel Science Foundation (Grant 1871/19), and by the Cyber Security Research Center at Ben-Gurion University of the Negev.
Publisher Copyright:
© 2020, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2021/1/1
Y1 - 2021/1/1
N2 - A private learner is an algorithm that, given a sample of labeled individual examples, outputs a generalizing hypothesis while preserving the privacy of each individual. Kasiviswanathan et al. (FOCS 2008) gave a generic construction of private learners in which the sample complexity is (generally) higher than what is needed for non-private learners. This gap in the sample complexity was further studied in several follow-up papers, showing that (at least in some cases) it is unavoidable. Those papers also considered ways to overcome the gap by relaxing either the privacy or the learning guarantees of the learner. We suggest an alternative approach, inspired by the (non-private) models of semi-supervised learning and active learning, where the focus is on the sample complexity of labeled examples, whereas unlabeled examples are available at a significantly lower cost. We consider private semi-supervised learners that operate on a random sample, where only a (hopefully small) portion of this sample is labeled. The learners have no control over which of the sample elements are labeled. Our main result is that the labeled sample complexity of private learners is characterized by the VC dimension. We present two generic constructions of private semi-supervised learners. The first construction yields learners whose labeled sample complexity is proportional to the VC dimension of the concept class; however, the unlabeled sample complexity of the algorithm is as large as the representation length of domain elements. Our second construction presents a new technique for decreasing the labeled sample complexity of a given private learner while roughly maintaining its unlabeled sample complexity. In addition, we show that in some settings the labeled sample complexity does not depend on the privacy parameters of the learner.
AB - A private learner is an algorithm that, given a sample of labeled individual examples, outputs a generalizing hypothesis while preserving the privacy of each individual. Kasiviswanathan et al. (FOCS 2008) gave a generic construction of private learners in which the sample complexity is (generally) higher than what is needed for non-private learners. This gap in the sample complexity was further studied in several follow-up papers, showing that (at least in some cases) it is unavoidable. Those papers also considered ways to overcome the gap by relaxing either the privacy or the learning guarantees of the learner. We suggest an alternative approach, inspired by the (non-private) models of semi-supervised learning and active learning, where the focus is on the sample complexity of labeled examples, whereas unlabeled examples are available at a significantly lower cost. We consider private semi-supervised learners that operate on a random sample, where only a (hopefully small) portion of this sample is labeled. The learners have no control over which of the sample elements are labeled. Our main result is that the labeled sample complexity of private learners is characterized by the VC dimension. We present two generic constructions of private semi-supervised learners. The first construction yields learners whose labeled sample complexity is proportional to the VC dimension of the concept class; however, the unlabeled sample complexity of the algorithm is as large as the representation length of domain elements. Our second construction presents a new technique for decreasing the labeled sample complexity of a given private learner while roughly maintaining its unlabeled sample complexity. In addition, we show that in some settings the labeled sample complexity does not depend on the privacy parameters of the learner.
KW - Active learning
KW - Differential privacy
KW - PAC learning
KW - Semi-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85088866417&partnerID=8YFLogxK
U2 - 10.1007/s00453-020-00753-z
DO - 10.1007/s00453-020-00753-z
M3 - Article
AN - SCOPUS:85088866417
SN - 0178-4617
VL - 83
SP - 177
EP - 215
JO - Algorithmica
JF - Algorithmica
IS - 1
ER -