TY - JOUR
T1 - Adaptive Data Analysis with Correlated Observations
AU - Kontorovich, Aryeh
AU - Sadigurschi, Menachem
AU - Stemmer, Uri
N1 - Funding Information:
M.S. was supported in part by the Lynn and William Frankel Center for Computer Science at Ben-Gurion University. A.K. and M.S. were supported in part by the Ben-Gurion Data Science Research Center at Ben-Gurion University of the Negev. U.S. was partially supported by the Israel Science Foundation (grant 1871/19) and by Len Blavatnik and the Blavatnik Family Foundation.
Publisher Copyright:
Copyright © 2022 by the author(s)
PY - 2022/1/1
Y1 - 2022/1/1
N2 - The vast majority of the work on adaptive data analysis focuses on the case where the samples in the dataset are independent. Several approaches and tools have been successfully applied in this context, such as differential privacy, max-information, compression arguments, and more. The situation is far less well-understood without the independence assumption. We embark on a systematic study of the possibilities of adaptive data analysis with correlated observations. First, we show that, in some cases, differential privacy guarantees generalization even when there are dependencies within the sample, which we quantify using a notion we call Gibbs-dependence. We complement this result with a tight negative example. Second, we show that the connection between transcript-compression and adaptive data analysis can be extended to the non-iid setting.
UR - http://www.scopus.com/inward/record.url?scp=85139638759&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85139638759
SN - 2640-3498
VL - 162
SP - 11483
EP - 11498
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 39th International Conference on Machine Learning, ICML 2022
Y2 - 17 July 2022 through 23 July 2022
ER -