TY - GEN
T1 - CAMLS
T2 - 15th International Conference on Database Systems for Advanced Applications, DASFAA 2010
AU - Gonen, Yaron
AU - Gal-Oz, Nurit
AU - Yahalom, Ran
AU - Gudes, Ehud
N1 - Funding Information:
Supported by the IMG4 consortium under the MAGNET program of the Israel Ministry of Trade and Industry, and by the Lynn and William Frankel Center for Computer Science.
PY - 2010/12/28
Y1 - 2010/12/28
AB - Mining sequential patterns is a key objective in the field of data mining due to its wide range of applications. Given a database of sequences, the challenge is to identify patterns that appear frequently in different sequences. Well-known algorithms have proved to be efficient; however, these algorithms do not perform well when mining databases that have long frequent sequences. We present CAMLS, Constraint-based Apriori Mining of Long Sequences, an efficient algorithm for mining long sequential patterns under constraints. CAMLS is based on the apriori property and consists of two phases, event-wise and sequence-wise, which employ an iterative process of candidate generation followed by frequency testing. The separation into these two phases allows us to: (i) introduce a novel candidate pruning strategy that increases the efficiency of the mining process and (ii) easily incorporate considerations of intra-event and inter-event constraints. Experiments on both synthetic and real datasets show that CAMLS outperforms previous algorithms when mining long sequences.
KW - Data mining
KW - Frequent sequences
KW - Sequential patterns
UR - http://www.scopus.com/inward/record.url?scp=78650482064&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-12026-8_7
DO - 10.1007/978-3-642-12026-8_7
M3 - Conference contribution
AN - SCOPUS:78650482064
SN - 3642120253
SN - 9783642120251
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 63
EP - 77
BT - Database Systems for Advanced Applications - 15th International Conference, DASFAA 2010, Proceedings
Y2 - 1 April 2010 through 4 April 2010
ER -