Differentially Private Source-Target Clustering

Shachar Schnapp, Sivan Sabato

Research output: Contribution to journal, Article, peer-reviewed

Abstract

We consider a new private variant of the Source-Target Clustering (STC) setting, which was introduced by de Mathelin et al. (2022). In STC, there is a target dataset that needs to be clustered by selecting centers, in addition to centers that are already provided in a separate source dataset. The goal is to select centers from the target such that the target clustering cost, given the additional source centers, is minimized. We consider private STC, in which the source dataset is private and may be used only under the constraint of differential privacy. This is motivated by scenarios in which the existing centers are private, for instance because they represent individuals in a social network. We derive lower bounds for the private STC objective, illustrating the theoretical limitations on worst-case guarantees for this setting. We then present a differentially private algorithm with asymptotically advantageous results under a data-dependent analysis, in which the guarantee depends on properties of the dataset; we also present more practical variants. We demonstrate in experiments the reduction in clustering cost obtained by our practical algorithms compared to baseline approaches. Code is publicly available at https://github.com/ShacharSchnapp/STC.
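To make the STC objective described above concrete, the following is a minimal, non-private sketch. It is not the authors' algorithm and includes no differential privacy mechanism. It assumes the clustering cost is the sum of Euclidean distances from each target point to its nearest center; the abstract does not specify the cost function, so that choice, the greedy selection rule, and the names `stc_cost` and `greedy_select` are all hypothetical and used only for illustration.

```python
import numpy as np

def stc_cost(target, source_centers, chosen_idx):
    """Assumed STC cost: sum over target points of the Euclidean distance
    to the nearest center, where the centers are the given source centers
    plus the target points selected by chosen_idx."""
    idx = np.asarray(chosen_idx, dtype=int)
    centers = np.vstack([source_centers, target[idx]])
    # Pairwise distances, shape (n_target, n_centers).
    dists = np.linalg.norm(target[:, None, :] - centers[None, :, :], axis=2)
    return float(dists.min(axis=1).sum())

def greedy_select(target, source_centers, k):
    """Illustrative (non-private) greedy heuristic: repeatedly add the
    target point whose selection reduces the assumed cost the most."""
    chosen = []
    for _ in range(k):
        candidates = [i for i in range(len(target)) if i not in chosen]
        best = min(candidates,
                   key=lambda i: stc_cost(target, source_centers, chosen + [i]))
        chosen.append(best)
    return chosen

# Toy data: the single source center already covers the first two points.
target = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
source_centers = np.array([[0.0, 0.5]])

chosen = greedy_select(target, source_centers, k=1)
cost_without = stc_cost(target, source_centers, [])
cost_with = stc_cost(target, source_centers, chosen)
print(chosen, cost_without, cost_with)
```

In this toy example the source center already serves the two points near the origin, so the selected target center lands in the far cluster. This is the effect the STC objective is meant to capture: target centers are chosen to complement the source centers rather than duplicate them. In the private setting considered in the paper, the source centers could not be read directly in this way.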

Original language: English
Journal: Transactions on Machine Learning Research
Volume: 2025
State: Published - 1 Jan 2025

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
