Abstract
Motivated by applications in polymer-based data storage, we study the problem of reconstructing a string from part of its composition multiset. We give a full description of strings that cannot be uniquely reconstructed up to reversal from their multisets of all the prefix-suffix compositions. Leveraging this description, we prove that for all n ≥ 6, there exists a string of length n that cannot be uniquely reconstructed up to reversal. Moreover, for all n ≥ 6, we explicitly construct the set consisting of all length n strings that can be uniquely reconstructed up to reversal. As a byproduct, we obtain that any binary string can be constructed using Dyck strings and Catalan-Bertrand strings. For any given string s, we provide a method to explicitly construct the set of all strings with the same prefix-suffix composition multiset as s, as well as a formula for the size of this set. Furthermore, we construct two classes of composition codes that can respectively correct composition missing errors and mass-reducing substitution errors. In addition, we raise a new problem: reconstructing a string when only given its compositions of substrings of length at most r. We give suitable codes under some conditions.
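To make the reconstruction problem concrete, the following is a minimal brute-force sketch, not the paper's construction. It assumes binary strings, takes the composition of a substring to be its pair of symbol counts (zeros, ones), and takes the prefix-suffix composition multiset to contain the compositions of every prefix and every suffix of length 1 through n. All function names are hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import product


def composition(sub):
    """Composition of a binary string: (number of 0s, number of 1s)."""
    ones = sub.count("1")
    return (len(sub) - ones, ones)


def prefix_suffix_multiset(s):
    """Multiset of compositions of all prefixes and suffixes of s (assumed definition)."""
    n = len(s)
    comps = Counter()
    for k in range(1, n + 1):
        comps[composition(s[:k])] += 1      # prefix of length k
        comps[composition(s[n - k:])] += 1  # suffix of length k
    return comps


def same_multiset_strings(s):
    """All binary strings of length len(s) sharing the multiset of s (exhaustive search)."""
    target = prefix_suffix_multiset(s)
    candidates = ("".join(bits) for bits in product("01", repeat=len(s)))
    return sorted(t for t in candidates if prefix_suffix_multiset(t) == target)


def is_unique_up_to_reversal(s):
    """True if only s and its reversal share the multiset of s."""
    return set(same_multiset_strings(s)) <= {s, s[::-1]}
```

Under these assumptions, `same_multiset_strings("011001")` also contains `"010101"`, which is neither `"011001"` nor its reversal, so this length-6 string is not uniquely reconstructible up to reversal. The search is exponential in n and is only meant for small examples.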
Original language | English |
---|---|
Pages (from-to) | 3922-3940 |
Number of pages | 19 |
Journal | IEEE Transactions on Information Theory |
Volume | 70 |
Issue number | 6 |
DOIs | |
State | Published - 1 Jun 2024 |
Keywords
- Dyck strings
- Polymer-based storage
- Composition codes
- Unique string reconstruction
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Library and Information Sciences