Preserving differential privacy and utility of non-stationary data streams

Michael Khavkin, Mark Last

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

9 Scopus citations

Abstract

Data publishing poses many challenges: data privacy must be preserved, on the one hand, while high utility is maintained, on the other. The field of Privacy-Preserving Data Publishing (PPDP) has emerged as a possible solution to this trade-off, allowing data miners to analyze the published data while providing a sufficient degree of privacy. Most existing anonymization platforms deal with static and stationary data, which can be scanned at least once before publishing. However, more and more real-world applications generate streams of data that can be non-stationary, i.e., subject to concept drift. In this paper, we introduce the MiDiPSA (Microaggregation-based Differential Private Stream Anonymization) algorithm for non-stationary data streams, which aims to satisfy the constraints of k-anonymity, recursive (c, l)-diversity, and differential privacy while minimizing information loss and the possible disclosure risk. The algorithm is implemented in four main steps: incremental clustering of the incoming tuples; incremental aggregation of the tuples in each cluster according to a pre-defined aggregation function; monitoring of the stream to detect possible concept drifts using the non-parametric Kolmogorov-Smirnov statistical test; and incremental publishing of anonymized tuples. Whenever a concept drift is detected, the clustering system is updated to reflect the current changes in the stream, without affecting the publishing process. In our empirical evaluation, we analyze the performance of various data stream classifiers on the anonymized data and compare it to their performance on the original data. We conduct experiments with seven benchmark data streams and show that our algorithm preserves privacy while providing higher utility than other state-of-the-art anonymization algorithms.
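
The drift-monitoring step named in the abstract can be illustrated with a minimal sketch of a two-sample Kolmogorov-Smirnov check between a reference window and a current window of one numeric attribute. This is not the paper's implementation: the function names `ks_statistic` and `drift_detected`, the windowing scheme, and the default significance level `alpha=0.05` are assumptions made for illustration, and the decision threshold uses the standard large-sample approximation of the KS critical value.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    n, m = len(ref), len(cur)
    d = 0.0
    # Both empirical CDFs are step functions that change only at observed
    # values, so checking every observed value is enough to find the maximum.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / n  # fraction of reference values <= x
        f_cur = bisect_right(cur, x) / m  # fraction of current values <= x
        d = max(d, abs(f_ref - f_cur))
    return d


def drift_detected(reference, current, alpha=0.05):
    """Flag drift when the statistic exceeds the asymptotic critical value.

    alpha is a hypothetical default, not a value taken from the paper.
    """
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Hypothetical windows: the second one is shifted far from the first.
reference_window = list(range(100))
current_window = [x + 100 for x in range(100)]
if drift_detected(reference_window, current_window):
    print("drift flagged: the clustering would be updated at this point")
```

In a streaming setting, a check of this kind would be repeated as new tuples arrive, with the reference window refreshed after each detected drift; how MiDiPSA chooses its windows and thresholds is not stated in the abstract.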

Original language: English
Title of host publication: Proceedings - 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018
Editors: Hanghang Tong, Zhenhui Li, Feida Zhu, Jeffrey Yu
Publisher: Institute of Electrical and Electronics Engineers
Pages: 29-34
Number of pages: 6
ISBN (Electronic): 9781538692882
State: Published - 2 Jul 2018
Event: 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018 - Singapore, Singapore
Duration: 17 Nov 2018 to 20 Nov 2018

Publication series

Name: IEEE International Conference on Data Mining Workshops, ICDMW
Volume: 2018-November
ISSN (Print): 2375-9232
ISSN (Electronic): 2375-9259

Conference

Conference: 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018
Country/Territory: Singapore
City: Singapore
Period: 17/11/18 to 20/11/18

Keywords

  • Concept Drift
  • Data Stream Mining
  • Differential Privacy
  • Microaggregation
  • Privacy-Preserving Data Publishing

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
