Abstract
A method, system and non-transitory computer-readable storage medium for determining whether an unordered collection of overlapping substrings (called shingles) can be uniquely decoded into a consistent string. The method, system and medium are applicable to the fields of networking, data management, cryptography, genetic engineering and linguistics. Disclosed herein are a theoretical framework, an automata-theoretic approach, and a time-optimal streaming algorithm for determining whether a string of characters over an alphabet can be uniquely decoded from its shingles of two (or more) characters. The present algorithm achieves an overall time complexity and space complexity. The method and system can be used to efficiently reconcile two data objects, files, strings or portions thereof.
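As an illustration only, the decoding question for two-character shingles can be posed as a graph problem. Treat each shingle "ab" as a directed edge from a to b. Every string consistent with the shingle multiset is then an Eulerian trail of the resulting multigraph, and the multiset is uniquely decodable exactly when one such trail (read as a character sequence) exists. The Python sketch below makes two assumptions: shingles are unpadded (no start/end markers), and they are counted with multiplicity. The names `shingles_of`, `decodings` and `is_uniquely_decodable` are hypothetical. The sketch enumerates candidate strings by brute-force backtracking, so its worst-case cost is exponential. It is not the time-optimal streaming or automata-theoretic algorithm described in the abstract.

```python
from collections import Counter
from typing import Iterable


def shingles_of(s: str) -> list[str]:
    """Hypothetical helper: the overlapping 2-character substrings of s, in order."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def decodings(shingles: Iterable[str], limit: int = 2) -> set[str]:
    """Return up to `limit` distinct strings whose 2-shingle multiset equals `shingles`.

    Assumption: unpadded shingles, counted with multiplicity. Each shingle "ab" is a
    directed edge a -> b, and a consistent string is an Eulerian trail that uses every
    edge exactly once. Brute-force backtracking, for illustration only.
    """
    edges = Counter(shingles)  # remaining multiplicity of each distinct shingle
    if any(len(s) != 2 for s in edges):
        raise ValueError("this sketch only handles 2-character shingles")
    total = sum(edges.values())
    found: set[str] = set()
    if total == 0:
        return found  # no shingles: nothing can be reconstructed

    # Distinct successor characters for each character. Branching on distinct
    # successors (not on individual edge copies) avoids duplicate enumeration.
    succ: dict[str, list[str]] = {}
    for s in edges:
        succ.setdefault(s[0], []).append(s[1])
    for v in succ:
        succ[v].sort()

    def extend(path: list[str], used: int) -> None:
        if len(found) >= limit:
            return  # enough distinct decodings found; stop searching
        if used == total:
            found.add("".join(path))  # every shingle consumed exactly once
            return
        last = path[-1]
        for nxt in succ.get(last, []):
            key = last + nxt
            if edges[key] > 0:
                edges[key] -= 1  # consume one copy of this shingle
                path.append(nxt)
                extend(path, used + 1)
                path.pop()
                edges[key] += 1  # restore it when backtracking

    # Any character with an outgoing edge may begin a trail.
    for start in sorted(succ):
        extend([start], 0)
    return found


def is_uniquely_decodable(shingles: Iterable[str]) -> bool:
    """True when exactly one string is consistent with the shingle multiset."""
    return len(decodings(shingles, limit=2)) == 1
```

For example, `abcd` is recovered uniquely from {ab, bc, cd}. By contrast, `abacad` and `acabad` share the shingle multiset {ab, ba, ac, ca, ad}, and `aba` and `bab` share {ab, ba}. Neither of those multisets is therefore uniquely decodable under this unpadded convention.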
Field | Value |
---|---|
Original language | English |
Patent number | US2014222760 |
IPC | G06F 11/14 A I |
Priority date | 4/02/14 |
State | Published - 7 Aug 2014 |