TY - GEN
T1 - Edit distance with duplications and contractions revisited
AU - Pinhas, Tamar
AU - Tsur, Dekel
AU - Zakov, Shay
AU - Ziv-Ukelson, Michal
PY - 2011/7/13
Y1 - 2011/7/13
N2 - In this paper, we propose three algorithms for the problem of string edit distance with duplication and contraction operations, which improve the time complexity of previous algorithms for this problem. These include a faster algorithm for the general case of the problem, and two improvements which apply under certain assumptions on the cost function. The general algorithm is based on fast min-plus multiplication of square matrices, and obtains the running time of O(|Σ| n^3 log^3 log n / log^2 n), where n is the length of the input strings and |Σ| is the alphabet size. This algorithm is further accelerated, under some assumption on the cost function, to O(|Σ|(n^2 + n n′^2 log^3 log n′ / log^2 n′)) time, where n′ is the length of the run-length encoding of the input. Another improvement is based on a new fast matrix-vector min-plus multiplication under a certain discreteness assumption, and yields an O(|Σ| n^3 / log^2 n) time algorithm. Furthermore, this algorithm is online, in the sense that one of the strings may be given letter by letter. As part of this algorithm we present the currently fastest online algorithm for weighted CFG parsing for discrete weighted grammars. This result is useful on its own.
AB - In this paper, we propose three algorithms for the problem of string edit distance with duplication and contraction operations, which improve the time complexity of previous algorithms for this problem. These include a faster algorithm for the general case of the problem, and two improvements which apply under certain assumptions on the cost function. The general algorithm is based on fast min-plus multiplication of square matrices, and obtains the running time of O(|Σ| n^3 log^3 log n / log^2 n), where n is the length of the input strings and |Σ| is the alphabet size. This algorithm is further accelerated, under some assumption on the cost function, to O(|Σ|(n^2 + n n′^2 log^3 log n′ / log^2 n′)) time, where n′ is the length of the run-length encoding of the input. Another improvement is based on a new fast matrix-vector min-plus multiplication under a certain discreteness assumption, and yields an O(|Σ| n^3 / log^2 n) time algorithm. Furthermore, this algorithm is online, in the sense that one of the strings may be given letter by letter. As part of this algorithm we present the currently fastest online algorithm for weighted CFG parsing for discrete weighted grammars. This result is useful on its own.
UR - http://www.scopus.com/inward/record.url?scp=79960101635&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-21458-5_37
DO - 10.1007/978-3-642-21458-5_37
M3 - Conference contribution
AN - SCOPUS:79960101635
SN - 9783642214578
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 441
EP - 454
BT - Combinatorial Pattern Matching - 22nd Annual Symposium, CPM 2011, Proceedings
T2 - 22nd Annual Symposium on Combinatorial Pattern Matching, CPM 2011
Y2 - 27 June 2011 through 29 June 2011
ER -