Phase Transitions in the Detection of Correlated Databases

Dor Elimelech, Wasim Huleihel

Research output: Contribution to journal › Conference article › peer-review

Abstract

We study the problem of detecting the correlation between two Gaussian databases X ∈ ℝ^{n×d} and Y ∈ ℝ^{n×d}, each composed of n users with d features. This problem is relevant in the analysis of social media, computational biology, etc. We formulate this as a hypothesis testing problem: under the null hypothesis, these two databases are statistically independent. Under the alternative, however, there exists an unknown permutation σ over the set of n users (or, row permutation), such that X is ρ-correlated with Y^σ, a permuted version of Y. We determine sharp thresholds at which optimal testing exhibits a phase transition, depending on the asymptotic regime of n and d. Specifically, we prove that if ρ²d → 0, as d → ∞, then weak detection (performing slightly better than random guessing) is statistically impossible, irrespective of the value of n. This complements the performance of a simple test that thresholds the sum of all entries of XᵀY. Furthermore, when d is fixed, we prove that strong detection (vanishing error probability) is impossible for any ρ < ρ*, where ρ* is an explicit function of d, while weak detection is again impossible as long as ρ²d = o(1), as n → ∞. These results close significant gaps in recent related studies.
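
The hypothesis test and the simple sum statistic mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes standard normal entries, coordinate-wise correlation ρ between matched rows, and a permutation-invariant reading of the statistic: the sum of inner products over all pairs of rows, which equals the inner product of the two column-sum vectors. The function names `sample_databases`, `sum_statistic`, and `sum_test`, and the threshold ndρ/2 (the midpoint between the null mean 0 and the alternative mean ndρ), are illustrative choices rather than anything stated in the abstract.

```python
import numpy as np


def sample_databases(n, d, rho, correlated, rng):
    """Draw (X, Y) under the null (independent) or the alternative.

    Assumption: entries are standard normal; under the alternative each
    row of X is rho-correlated, coordinate by coordinate, with exactly
    one row of Y, and the matching is hidden by a random row permutation.
    """
    X = rng.standard_normal((n, d))
    if not correlated:
        return X, rng.standard_normal((n, d))
    noise = rng.standard_normal((n, d))
    Y_aligned = rho * X + np.sqrt(1.0 - rho**2) * noise
    sigma = rng.permutation(n)  # unknown row permutation
    return X, Y_aligned[sigma]


def sum_statistic(X, Y):
    """Sum of <X_i, Y_j> over all row pairs (i, j).

    This equals the inner product of the column sums, so it does not
    depend on how the rows of Y are ordered.
    """
    return float(X.sum(axis=0) @ Y.sum(axis=0))


def sum_test(X, Y, rho):
    """Declare 'correlated' when the statistic exceeds n * d * rho / 2.

    The threshold is a hypothetical midpoint choice, not taken from the paper.
    """
    n, d = X.shape
    return sum_statistic(X, Y) > 0.5 * n * d * rho


rng = np.random.default_rng(0)
n, d, rho = 50, 400, 0.9  # rho**2 * d is large, so detection should be easy
X0, Y0 = sample_databases(n, d, rho, correlated=False, rng=rng)
X1, Y1 = sample_databases(n, d, rho, correlated=True, rng=rng)
print("null declared correlated:       ", sum_test(X0, Y0, rho))
print("alternative declared correlated:", sum_test(X1, Y1, rho))
```

Under these assumptions the statistic has mean 0 and standard deviation n√d under the null, and mean ndρ under the alternative, so its signal-to-noise ratio scales like ρ√d regardless of n. That is consistent with the ρ²d scaling in the abstract.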

Original language: English
Pages (from-to): 9246-9266
Number of pages: 21
Journal: Proceedings of Machine Learning Research
Volume: 202
State: Published - 1 Jan 2023
Event: 40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: 23 Jul 2023 → 29 Jul 2023

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
