Sparse LCS Common Substring Alignment

Gad M. Landau, Baruch Schieber, Michal Ziv-Ukelson

Research output: Contribution to journalArticlepeer-review

13 Scopus citations

Abstract

The "Common Substring Alignment" problem is defined as follows. The input consists of a set of strings S1, S2..., S c, with a common substring appearing at least once in each of them, and a target string T. The goal is to compute similarity of all strings S i with T, without computing the part of the common substring over and over again. In this paper we consider the Common Substring Alignment problem for the LCS (Longest Common Subsequence) similarity metric. Our algorithm gains its efficiency by exploiting the sparsity inherent to the LCS problem. Let Y be the common substring, n be the size of the compared sequences, Ly be the length of the LCS of T and Y, denoted |LCS[T, Y]|, and L be max{|LCS[T, Si]|}. Our algorithm consists of an O(nLy) time encoding stage that is executed once per common substring, and an O(L) time alignment stage that is executed once for each appearance of the common substring in each source string. The additional running time depends only on the length of the parts of the strings that are not in any common substring.

Original languageEnglish
Pages (from-to)259-270
Number of pages12
JournalInformation Processing Letters
Volume88
Issue number6
DOIs
StatePublished - 31 Dec 2003
Externally publishedYes

Keywords

  • Algorithms
  • Common Substring Alignment
  • LCS
  • Sparsity

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Signal Processing
  • Information Systems
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'Sparse LCS Common Substring Alignment'. Together they form a unique fingerprint.

Cite this