TY - GEN
T1 - Enhanced situation space mining for data streams
AU - Mirsky, Yisroel
AU - Halpern, Tal
AU - Upadhyay, Rishabh
AU - Toledo, Sivan
AU - Elovici, Yuval
N1 - Publisher Copyright: © 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2017/4/3
Y1 - 2017/4/3
N2 - Data streams can capture the situation which an actor is experiencing. Knowledge of the present situation is highly beneficial for a wide range of applications. An algorithm called pcStream can be used to extract situations from a numerical data stream in an unsupervised manner. Although pcStream outperforms other stream clustering algorithms at this task, pcStream has two major flaws. The first is its complexity due to continuously performing principal component analysis (PCA). The second is its difficulty in detecting emerging situations whose distributions overlap in the same feature space. In this paper we introduce pcStream2, a variant of pcStream which employs windowing and persistence in order to distinguish between emerging overlapping concepts. We also propose the use of incremental PCA (IPCA) to reduce the overall complexity and memory requirements of the algorithm. Although any IPCA algorithm can be used, we use a novel IPCA algorithm called Just-In-Time PCA which is better suited for processing streams. JIT-PCA makes intelligent 'short cuts' in order to reduce computations. We provide experimental results on real-world datasets that demonstrate how the proposed improvements make pcStream2 a more accurate and practical tool for situation space mining.
AB - Data streams can capture the situation which an actor is experiencing. Knowledge of the present situation is highly beneficial for a wide range of applications. An algorithm called pcStream can be used to extract situations from a numerical data stream in an unsupervised manner. Although pcStream outperforms other stream clustering algorithms at this task, pcStream has two major flaws. The first is its complexity due to continuously performing principal component analysis (PCA). The second is its difficulty in detecting emerging situations whose distributions overlap in the same feature space. In this paper we introduce pcStream2, a variant of pcStream which employs windowing and persistence in order to distinguish between emerging overlapping concepts. We also propose the use of incremental PCA (IPCA) to reduce the overall complexity and memory requirements of the algorithm. Although any IPCA algorithm can be used, we use a novel IPCA algorithm called Just-In-Time PCA which is better suited for processing streams. JIT-PCA makes intelligent 'short cuts' in order to reduce computations. We provide experimental results on real-world datasets that demonstrate how the proposed improvements make pcStream2 a more accurate and practical tool for situation space mining.
KW - Context space theory
KW - Data mining
KW - Data stream
UR - http://www.scopus.com/inward/record.url?scp=85018758835&partnerID=8YFLogxK
U2 - 10.1145/3019612.3019671
DO - 10.1145/3019612.3019671
M3 - Conference contribution
AN - SCOPUS:85018758835
T3 - Proceedings of the ACM Symposium on Applied Computing
SP - 842
EP - 849
BT - 32nd Annual ACM Symposium on Applied Computing, SAC 2017
PB - Association for Computing Machinery
T2 - 32nd Annual ACM Symposium on Applied Computing, SAC 2017
Y2 - 4 April 2017 through 6 April 2017
ER -