CAMLS: A constraint-based Apriori algorithm for mining long sequences

Yaron Gonen, Nurit Gal-Oz, Ran Yahalom, Ehud Gudes

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

Mining sequential patterns is a key objective in the field of data mining due to its wide range of applications. Given a database of sequences, the challenge is to identify patterns that appear frequently in different sequences. Well-known algorithms have proved to be efficient; however, they do not perform well when mining databases that contain long frequent sequences. We present CAMLS, Constraint-based Apriori Mining of Long Sequences, an efficient algorithm for mining long sequential patterns under constraints. CAMLS is based on the Apriori property and consists of two phases, event-wise and sequence-wise, each of which employs an iterative process of candidate generation followed by frequency testing. The separation into these two phases allows us to: (i) introduce a novel candidate pruning strategy that increases the efficiency of the mining process and (ii) easily incorporate intra-event and inter-event constraints. Experiments on both synthetic and real datasets show that CAMLS outperforms previous algorithms when mining long sequences.
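The iterative loop of candidate generation followed by frequency testing can be illustrated with a minimal, generic Apriori-style sketch. This is not the CAMLS algorithm: it assumes every event is a single item, has no event-wise or sequence-wise phases, and implements neither the paper's pruning strategy nor its constraints. The names `is_subsequence`, `support` and `mine_frequent_sequences` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Pattern, sequence: Pattern) -> bool:
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is respected.
    return all(item in remaining for item in pattern)


def support(pattern: Pattern, database: List[Pattern]) -> int:
    """Number of database sequences that contain `pattern`."""
    return sum(1 for sequence in database if is_subsequence(pattern, sequence))


def mine_frequent_sequences(database: List[Pattern], min_support: int) -> Dict[Pattern, int]:
    """Level-wise mining of single-item-event sequences (illustrative only)."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")

    items = sorted({item for sequence in database for item in sequence})
    candidates: List[Pattern] = [(item,) for item in items]
    frequent: Dict[Pattern, int] = {}

    while candidates:
        # Frequency testing: keep only candidates that meet the threshold.
        level: Dict[Pattern, int] = {}
        for candidate in candidates:
            count = support(candidate, database)
            if count >= min_support:
                level[candidate] = count
        frequent.update(level)

        # Candidate generation: join patterns p and q of length k whose
        # overlap matches, producing a candidate of length k + 1.
        next_candidates = set()
        for p in level:
            for q in level:
                if p[1:] == q[:-1]:
                    candidate = p + (q[-1],)
                    # Apriori property: every length-k subsequence of a
                    # frequent pattern must itself be frequent.
                    if all(candidate[:i] + candidate[i + 1:] in level
                           for i in range(len(candidate))):
                        next_candidates.add(candidate)
        candidates = sorted(next_candidates)

    return frequent


database = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "c")]
result = mine_frequent_sequences(database, min_support=3)
```

On this toy database with a minimum support of 3, the frequent patterns are the single items `a`, `b` and `c` plus the two-item sequences `(a, c)` and `(b, c)`; the pattern `(a, b)` is dropped because it appears in only two sequences, so no three-item candidate is ever generated.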

Original language: English
Title of host publication: Database Systems for Advanced Applications - 15th International Conference, DASFAA 2010, Proceedings
Pages: 63-77
Number of pages: 15
Edition: PART 1
State: Published - 28 Dec 2010
Event: 15th International Conference on Database Systems for Advanced Applications, DASFAA 2010 - Tsukuba, Japan
Duration: 1 Apr 2010 to 4 Apr 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 5981 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th International Conference on Database Systems for Advanced Applications, DASFAA 2010
Country/Territory: Japan
City: Tsukuba
Period: 1/04/10 to 4/04/10

Keywords

  • Data mining
  • Frequent sequences
  • Sequential patterns

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
