TY - JOUR
T1 - The limits of post-selection generalization
AU - Nissim, Kobbi
AU - Smith, Adam
AU - Steinke, Thomas
AU - Stemmer, Uri
AU - Ullman, Jonathan
N1 - Funding Information:
∗Supported by NSF award CNS-1565387. †Supported by NSF awards IIS-1447700 and AF-1763665, a Google Faculty Award and a Sloan Foundation Research Award. ‡Work done while U.S. was a postdoctoral researcher at the Weizmann Institute of Science, supported by a Koshland fellowship, and by the Israel Science Foundation (grants 950/16 and 5219/17). §Supported by NSF awards CCF-1718088, CCF-1750640, and CNS-1816028, and a Google Faculty Award.
Publisher Copyright:
© 2018 Curran Associates Inc. All rights reserved.
PY - 2018/1/1
Y1 - 2018/1/1
N2 - While statistics and machine learning offer numerous methods for ensuring generalization, these methods often fail in the presence of post-selection, the common practice in which the choice of analysis depends on previous interactions with the same dataset. A recent line of work has introduced powerful, general-purpose algorithms that ensure a property called post hoc generalization (Cummings et al., COLT'16), which says that no one, given the output of the algorithm, should be able to find any statistic for which the data differs significantly from the population it came from. In this work we show several limitations on the power of algorithms satisfying post hoc generalization. First, we show a tight lower bound on the error of any algorithm that satisfies post hoc generalization and answers adaptively chosen statistical queries, demonstrating a strong barrier to progress in post-selection data analysis. Second, we show that post hoc generalization is not closed under composition, despite many examples of such algorithms exhibiting strong composition properties.
AB - While statistics and machine learning offer numerous methods for ensuring generalization, these methods often fail in the presence of post-selection, the common practice in which the choice of analysis depends on previous interactions with the same dataset. A recent line of work has introduced powerful, general-purpose algorithms that ensure a property called post hoc generalization (Cummings et al., COLT'16), which says that no one, given the output of the algorithm, should be able to find any statistic for which the data differs significantly from the population it came from. In this work we show several limitations on the power of algorithms satisfying post hoc generalization. First, we show a tight lower bound on the error of any algorithm that satisfies post hoc generalization and answers adaptively chosen statistical queries, demonstrating a strong barrier to progress in post-selection data analysis. Second, we show that post hoc generalization is not closed under composition, despite many examples of such algorithms exhibiting strong composition properties.
UR - http://www.scopus.com/inward/record.url?scp=85064806424&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85064806424
SN - 1049-5258
VL - 2018-December
SP - 6400
EP - 6409
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 32nd Conference on Neural Information Processing Systems, NeurIPS 2018
Y2 - 2 December 2018 through 8 December 2018
ER -