Edit distance with duplications and contractions revisited

Tamar Pinhas, Dekel Tsur, Shay Zakov, Michal Ziv-Ukelson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

In this paper, we propose three algorithms for the problem of string edit distance with duplication and contraction operations, which improve the time complexity of previous algorithms for this problem. These include a faster algorithm for the general case of the problem, and two improvements which apply under certain assumptions on the cost function. The general algorithm is based on fast min-plus multiplication of square matrices, and achieves a running time of $O(|\Sigma| n^3 \log^3 \log n / \log^2 n)$, where $n$ is the length of the input strings and $|\Sigma|$ is the alphabet size. This algorithm is further accelerated, under some assumption on the cost function, to $O(|\Sigma| (n^2 + n n'^2 \log^3 \log n' / \log^2 n'))$ time, where $n'$ is the length of the run-length encoding of the input. Another improvement is based on a new fast matrix-vector min-plus multiplication under a certain discreteness assumption, and yields an $O(|\Sigma| n^3 / \log^2 n)$ time algorithm. Furthermore, this algorithm is online, in the sense that one of the strings may be given letter by letter. As part of this algorithm we present the currently fastest online algorithm for weighted CFG parsing for discrete weighted grammars. This result is useful on its own.
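For readers unfamiliar with the primitives named in the abstract, the following is a minimal Python sketch of what min-plus matrix multiplication, min-plus matrix-vector multiplication, and run-length encoding compute. It is not the paper's method: these are the naive textbook definitions (cubic time for the matrix product), shown only to fix the meaning of the operations that the accelerated algorithms speed up. The function names are illustrative assumptions and do not come from the paper.

```python
from itertools import groupby

INF = float("inf")  # stands for "no finite cost"


def min_plus_matmul(A, B):
    """Naive min-plus product: C[i][j] = min over k of A[i][k] + B[k][j]."""
    inner = len(B)
    cols = len(B[0])
    return [
        [min(row[k] + B[k][j] for k in range(inner)) for j in range(cols)]
        for row in A
    ]


def min_plus_matvec(A, v):
    """Naive min-plus matrix-vector product: u[i] = min over k of A[i][k] + v[k]."""
    return [min(a + x for a, x in zip(row, v)) for row in A]


def run_length_encode(s):
    """Collapse each maximal run of equal letters into a (letter, run length) pair."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


if __name__ == "__main__":
    A = [[0, 3], [INF, 0]]
    B = [[0, 1], [2, 0]]
    print(min_plus_matmul(A, B))
    print(min_plus_matvec(A, [5, 1]))
    # For s = "aaabccdd", n = len(s) and n' = number of runs in the encoding.
    print(run_length_encode("aaabccdd"))
```

In this sketch the string "aaabccdd" has n = 8 letters but only n' = 4 runs, which illustrates why a bound stated in terms of n' can be much smaller for inputs with long repeated runs.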

Original language: English
Title of host publication: Combinatorial Pattern Matching - 22nd Annual Symposium, CPM 2011, Proceedings
Pages: 441-454
Number of pages: 14
State: Published - 13 Jul 2011
Event: 22nd Annual Symposium on Combinatorial Pattern Matching, CPM 2011 - Palermo, Italy
Duration: 27 Jun 2011 to 29 Jun 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6661 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd Annual Symposium on Combinatorial Pattern Matching, CPM 2011
Country/Territory: Italy
City: Palermo
Period: 27/06/11 to 29/06/11

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
