TY - JOUR
T1 - Sparse LCS Common Substring Alignment
AU - Landau, Gad M.
AU - Schieber, Baruch
AU - Ziv-Ukelson, Michal
N1 - Funding Information:
1 Partially supported by NSF grant CCR-0104307, by the Israel Science Foundation grant 282/01, by the FIRST Foundation of the Israel Academy of Science and Humanities, and by an IBM Faculty Partnership Award.
Funding Information:
2 On Education Leave from the IBM T.J. Watson Research Center; partially supported by the Israel Science Foundation grant 282/01, and by the FIRST Foundation of the Israel Academy of Science and Humanities.
PY - 2003/12/31
Y1 - 2003/12/31
N2 - The "Common Substring Alignment" problem is defined as follows. The input consists of a set of strings S1, S2, ..., Sc, with a common substring appearing at least once in each of them, and a target string T. The goal is to compute similarity of all strings Si with T, without computing the part of the common substring over and over again. In this paper we consider the Common Substring Alignment problem for the LCS (Longest Common Subsequence) similarity metric. Our algorithm gains its efficiency by exploiting the sparsity inherent to the LCS problem. Let Y be the common substring, n be the size of the compared sequences, Ly be the length of the LCS of T and Y, denoted |LCS[T, Y]|, and L be max{|LCS[T, Si]|}. Our algorithm consists of an O(nLy) time encoding stage that is executed once per common substring, and an O(L) time alignment stage that is executed once for each appearance of the common substring in each source string. The additional running time depends only on the length of the parts of the strings that are not in any common substring.
AB - The "Common Substring Alignment" problem is defined as follows. The input consists of a set of strings S1, S2, ..., Sc, with a common substring appearing at least once in each of them, and a target string T. The goal is to compute similarity of all strings Si with T, without computing the part of the common substring over and over again. In this paper we consider the Common Substring Alignment problem for the LCS (Longest Common Subsequence) similarity metric. Our algorithm gains its efficiency by exploiting the sparsity inherent to the LCS problem. Let Y be the common substring, n be the size of the compared sequences, Ly be the length of the LCS of T and Y, denoted |LCS[T, Y]|, and L be max{|LCS[T, Si]|}. Our algorithm consists of an O(nLy) time encoding stage that is executed once per common substring, and an O(L) time alignment stage that is executed once for each appearance of the common substring in each source string. The additional running time depends only on the length of the parts of the strings that are not in any common substring.
KW - Algorithms
KW - Common Substring Alignment
KW - LCS
KW - Sparsity
UR - http://www.scopus.com/inward/record.url?scp=0242439634&partnerID=8YFLogxK
U2 - 10.1016/j.ipl.2003.09.006
DO - 10.1016/j.ipl.2003.09.006
M3 - Article
AN - SCOPUS:0242439634
SN - 0020-0190
VL - 88
SP - 259
EP - 270
JO - Information Processing Letters
JF - Information Processing Letters
IS - 6
ER -