TY - JOUR

T1 - On the Common Substring Alignment Problem

AU - Landau, Gad M.

AU - Ziv-Ukelson, Michal

N1 - Funding Information:
Partially supported by the Israel Science Foundation grants 173/98 and 282/01, and by the FIRST Foundation of the Israel Academy of Science and Humanities.
Funding Information:
Partially supported by NSF grants CCR-9610238 and CCR-0104307, by NATO Science Programme grant PST.CLG.977017, by the Israel Science Foundation grants 173/98 and 282/01, by the FIRST Foundation of the Israel Academy of Science and Humanities, and by an IBM Faculty Partnership Award.

PY - 2001/1/1

Y1 - 2001/1/1

N2 - The Common Substring Alignment Problem is defined as follows: Given a set of one or more strings S1, S2, ..., Sc and a target string T, Y is a common substring of all strings Si, that is, Si = BiYFi. The goal is to compute the similarity of all strings Si with T, without computing the part of Y again and again. Using the classical dynamic programming tables, each appearance of Y in a source string would require the computation of all the values in a dynamic programming table of size O(nℓ) where ℓ is the size of Y. Here we describe an algorithm which is composed of an encoding stage and an alignment stage. During the first stage, a data structure is constructed which encodes the comparison of Y with T. Then, during the alignment stage, for each comparison of a source Si with T, the precompiled data structure is used to speed up the part of Y. We show how to reduce the O(nℓ) alignment work, for each appearance of the common substring Y in a source string, to O(n), at the cost of O(nℓ) encoding work, which is executed only once.

AB - The Common Substring Alignment Problem is defined as follows: Given a set of one or more strings S1, S2, ..., Sc and a target string T, Y is a common substring of all strings Si, that is, Si = BiYFi. The goal is to compute the similarity of all strings Si with T, without computing the part of Y again and again. Using the classical dynamic programming tables, each appearance of Y in a source string would require the computation of all the values in a dynamic programming table of size O(nℓ) where ℓ is the size of Y. Here we describe an algorithm which is composed of an encoding stage and an alignment stage. During the first stage, a data structure is constructed which encodes the comparison of Y with T. Then, during the alignment stage, for each comparison of a source Si with T, the precompiled data structure is used to speed up the part of Y. We show how to reduce the O(nℓ) alignment work, for each appearance of the common substring Y in a source string, to O(n), at the cost of O(nℓ) encoding work, which is executed only once.

KW - Candidate lists

KW - Design and analysis of algorithms

KW - Dynamic programming

KW - Monge arrays

KW - Repeated substrings

KW - Sequence comparison

KW - Shared substrings

UR - http://www.scopus.com/inward/record.url?scp=0012526495&partnerID=8YFLogxK

U2 - 10.1006/jagm.2001.1191

DO - 10.1006/jagm.2001.1191

M3 - Article

AN - SCOPUS:0012526495

SN - 0196-6774

VL - 41

SP - 338

EP - 359

JO - Journal of Algorithms

JF - Journal of Algorithms

IS - 2

ER -