TY - GEN
T1 - Preserving differential privacy and utility of non-stationary data streams
AU - Khavkin, Michael
AU - Last, Mark
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
N2 - Data publishing must balance two competing goals: preserving data privacy and maintaining high data utility. The field of Privacy-Preserving Data Publishing (PPDP) has emerged as a possible solution to this trade-off, allowing data miners to analyze published data while providing a sufficient degree of privacy. Most existing anonymization platforms deal with static, stationary data, which can be scanned at least once before publication. However, a growing number of real-world applications generate data streams that can be non-stationary, i.e., subject to concept drift. In this paper, we introduce the MiDiPSA (Microaggregation-based Differential Private Stream Anonymization) algorithm for non-stationary data streams, which aims to satisfy the constraints of k-anonymity, recursive (c, l)-diversity, and differential privacy while minimizing information loss and disclosure risk. The algorithm consists of four main steps: incremental clustering of the incoming tuples; incremental aggregation of the tuples in each cluster according to a pre-defined aggregation function; monitoring of the stream to detect possible concept drifts using the non-parametric Kolmogorov-Smirnov statistical test; and incremental publishing of the anonymized tuples. Whenever a concept drift is detected, the clustering system is updated to reflect the current changes in the stream, without affecting the publishing process. In our empirical evaluation, we analyze the performance of various data stream classifiers on the anonymized data and compare it to their performance on the original data. We conduct experiments with seven benchmark data streams and show that our algorithm preserves privacy while providing higher utility than other state-of-the-art anonymization algorithms.
KW - Concept Drift
KW - Data Stream Mining
KW - Differential Privacy
KW - Microaggregation
KW - Privacy-Preserving Data Publishing
UR - http://www.scopus.com/inward/record.url?scp=85062855610&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2018.00012
DO - 10.1109/ICDMW.2018.00012
M3 - Conference contribution
AN - SCOPUS:85062855610
T3 - IEEE International Conference on Data Mining Workshops, ICDMW
SP - 29
EP - 34
BT - Proceedings - 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018
A2 - Tong, Hanghang
A2 - Li, Zhenhui
A2 - Zhu, Feida
A2 - Yu, Jeffrey
PB - Institute of Electrical and Electronics Engineers
T2 - 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018
Y2 - 17 November 2018 through 20 November 2018
ER -