Efficiently decoding strings from their shingles

Aryeh Kontorovich, Ari Trachtenberg

Research output: Working paper/Preprint

Abstract

Determining whether an unordered collection of overlapping substrings (called shingles) can be uniquely decoded into a consistent string is a problem that lies at the foundation of a broad assortment of disciplines, from networking and information theory to cryptography, genetic engineering, and linguistics. We present three perspectives on this problem: a graph-theoretic framework due to Pevzner, an automata-theoretic approach from our previous work, and a new insight that yields a time-optimal streaming algorithm for determining whether a string of n characters over the alphabet Σ can be uniquely decoded from its two-character shingles. Our algorithm achieves an overall time complexity of Θ(n) and a space complexity of O(|Σ|). As an application, we demonstrate how this algorithm can be extended to larger shingles for efficient string reconciliation.
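
To make the decoding question concrete, the following is a minimal brute-force sketch, not the streaming algorithm described in the abstract. It treats the two-character shingles as a multiset and searches for every string that produces the same multiset. The function names (shingles, decodings, uniquely_decodable) are hypothetical, the search takes exponential time in the worst case, and it assumes that the first character of the string is not known in advance and that the input has at least two characters.

```python
from collections import Counter


def shingles(s, k=2):
    """Return the multiset of all length-k substrings of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def decodings(bag, limit=2):
    """Return up to `limit` distinct strings whose 2-shingle multiset is `bag`.

    Brute-force backtracking: each shingle is used exactly as many times
    as it occurs, and consecutive shingles must overlap in one character.
    """
    total = sum(bag.values())
    found = set()

    def extend(prefix, remaining, used):
        if len(found) >= limit:
            return  # enough witnesses collected; stop searching
        if used == total:
            found.add(prefix)  # every shingle consumed: a valid decoding
            return
        last = prefix[-1]
        for sh in list(remaining):
            if remaining[sh] > 0 and sh[0] == last:
                remaining[sh] -= 1
                extend(prefix + sh[1], remaining, used + 1)
                remaining[sh] += 1  # undo the choice before trying the next

    # Assumption: the starting character is unknown, so try each candidate.
    for start in {sh[0] for sh in bag}:
        extend(start, Counter(bag), 0)
    return found


def uniquely_decodable(s):
    """True if s is the only string with its multiset of 2-shingles."""
    return len(decodings(shingles(s))) == 1
```

Under these assumptions, "abc" has a single decoding, whereas "abcbdb" and "abdbcb" share the shingles ab, bc, cb, bd, db and so neither is uniquely decodable. The sketch keeps the whole multiset in memory, so it does not reflect the O(|Σ|) space bound claimed for the paper's algorithm.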
Original language: English (GB)
State: Published - 2012

Publication series

Name: arXiv preprint
