On the Common Substring Alignment Problem

Gad M. Landau, Michal Ziv-Ukelson

Research output: Contribution to journal › Article › peer-review

36 Scopus citations

Abstract

The Common Substring Alignment Problem is defined as follows: The input is a set of one or more source strings S1, S2, ..., Sc and a target string T, where Y is a common substring of all the strings Si, that is, Si = BiYFi. The goal is to compute the similarity of every string Si with T without recomputing the part that involves Y for each source string. With the classical dynamic programming tables, each appearance of Y in a source string requires computing all the values in a dynamic programming table of size O(nℓ), where ℓ is the size of Y. Here we describe an algorithm composed of an encoding stage and an alignment stage. During the encoding stage, a data structure is constructed which encodes the comparison of Y with T. Then, during the alignment stage, for each comparison of a source Si with T, the precomputed data structure is used to speed up the part that involves Y. We show how to reduce the O(nℓ) alignment work, for each appearance of the common substring Y in a source string, to O(n), at the cost of O(nℓ) encoding work, which is executed only once.
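The two-stage structure described in the abstract can be illustrated with a small Python sketch. It rests on several assumptions that are not stated in the abstract: similarity is taken to be the length of the longest common subsequence (LCS), n is taken to be the length of T, and the function names (`lcs_rows`, `encode`, `apply_block`, `similarity`) are hypothetical. The sketch is deliberately naive. Its encoding costs O(n²ℓ) and each use of the encoded block costs O(n²), so it shows only how the comparison of Y with T can be computed once and reused, not how the paper reaches its O(nℓ) encoding and O(n) alignment bounds.

```python
def lcs_rows(s, t, first_row):
    """Advance the classical LCS DP over the characters of s, starting from
    first_row (one entry per prefix of t), and return the final row."""
    row = list(first_row)
    for ch in s:
        new = [row[0]]
        for j in range(1, len(t) + 1):
            if ch == t[j - 1]:
                new.append(row[j - 1] + 1)
            else:
                new.append(max(row[j], new[j - 1]))
        row = new
    return row


def encode(y, t):
    """Encoding stage (naive): dist[i][k] = LCS(y, t[i:i+k]).
    Computed once for the shared substring y and the target t."""
    n = len(t)
    return [lcs_rows(y, t[i:], [0] * (n - i + 1)) for i in range(n + 1)]


def apply_block(row_in, dist):
    """Alignment stage for the y block (naive): combine the DP row that
    enters the block with the precomputed table, without touching y."""
    n = len(row_in) - 1
    return [
        max(row_in[i] + dist[i][j - i] for i in range(j + 1))
        for j in range(n + 1)
    ]


def similarity(b, f, t, dist):
    """LCS length of the source b + y + f with t, reusing dist for y."""
    row = lcs_rows(b, t, [0] * (len(t) + 1))   # prefix B: ordinary DP
    row = apply_block(row, dist)               # shared Y: precomputed
    return lcs_rows(f, t, row)[-1]             # suffix F: ordinary DP


# Hypothetical example data: three sources sharing the substring y.
t = "ACGTTGCA"
y = "GTT"
dist = encode(y, t)
for b, f in [("AC", "GCA"), ("", "G"), ("TT", "A")]:
    print(b + y + f, similarity(b, f, t, dist))
```

The design point is that `encode` depends only on Y and T, so its cost is paid once, while `apply_block` is the only per-source work spent on Y. In this sketch that per-source step is a plain maximum over all entry points, which is quadratic in n.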

Original language: English
Pages (from-to): 338-359
Number of pages: 22
Journal: Journal of Algorithms
Volume: 41
Issue number: 2
State: Published - 1 Jan 2001
Externally published: Yes

Keywords

  • Candidate lists
  • Design and analysis of algorithms
  • Dynamic programming
  • Monge arrays
  • Repeated substrings
  • Sequence comparison
  • Shared substrings

ASJC Scopus subject areas

  • Control and Optimization
  • Computational Mathematics
  • Computational Theory and Mathematics
