## Abstract

Motivated by applications in polymer-based data storage, we study the problem of reconstructing a string from part of its composition multiset. We give a full description of the strings that cannot be uniquely reconstructed, up to reversal, from the multiset of all their prefix-suffix compositions. Leveraging this description, we prove that for all *n* ≥ 6, there exists a string of length *n* that cannot be uniquely reconstructed up to reversal. Moreover, for all *n* ≥ 6, we explicitly construct the set of all length-*n* strings that can be uniquely reconstructed up to reversal. As a byproduct, we obtain that any binary string can be constructed using Dyck strings and Catalan-Bertrand strings. For any given string *s*, we provide a method to explicitly construct the set of all strings with the same prefix-suffix composition multiset as *s*, as well as a formula for the size of this set. Furthermore, we construct two classes of composition codes that can correct composition missing errors and mass-reducing substitution errors, respectively. In addition, we raise a new problem: reconstructing a string given only the compositions of its substrings of length at most *r*. We give suitable codes under some conditions.
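To make the central object concrete, the following Python sketch computes a prefix-suffix composition multiset for a binary string and searches by brute force for strings that share their multiset with a string other than themselves and their reversal. It is an illustration under stated assumptions, not the construction from the paper: the function names (`composition`, `prefix_suffix_multiset`, `confusable_strings`) are hypothetical, a composition is taken to be the pair (number of 0s, number of 1s), and the full string is counted once as a prefix and once as a suffix.

```python
from collections import Counter
from itertools import product


def composition(s):
    """Composition of a binary string: (number of 0s, number of 1s)."""
    return (s.count("0"), s.count("1"))


def prefix_suffix_multiset(s):
    """Multiset of compositions of all non-empty prefixes and suffixes of s.

    Assumption: the full string contributes twice (as prefix and as suffix).
    """
    n = len(s)
    comps = Counter()
    for i in range(1, n + 1):
        comps[composition(s[:i])] += 1      # prefix of length i
        comps[composition(s[n - i:])] += 1  # suffix of length i
    return comps


def confusable_strings(n):
    """Length-n binary strings whose multiset is shared by some string
    that is neither the string itself nor its reversal (brute force)."""
    classes = {}
    for bits in product("01", repeat=n):
        s = "".join(bits)
        key = frozenset(prefix_suffix_multiset(s).items())
        classes.setdefault(key, set()).add(s)
    return sorted(
        s
        for group in classes.values()
        for s in group
        if group - {s, s[::-1]}  # another string beyond s and its reversal
    )


if __name__ == "__main__":
    # A string and its reversal always share the same multiset.
    print(prefix_suffix_multiset("0010") == prefix_suffix_multiset("0100"))
    # Two length-6 strings, not reversals of each other, with equal multisets.
    print(prefix_suffix_multiset("010101") == prefix_suffix_multiset("011001"))
    print(confusable_strings(6))
```

Under these assumptions, `010101` and `011001` have the same prefix-suffix composition multiset although neither is the reversal of the other, which is consistent with the statement that non-unique strings exist at length 6. The search enumerates all 2^*n* strings, so it is only practical for small *n*.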

Original language | English |
---|---|
Pages (from-to) | 1 |
Number of pages | 1 |
Journal | IEEE Transactions on Information Theory |
DOIs | |
State | Accepted/In press - 1 Jan 2023 |

## Keywords

- Buffer storage
- Codes
- Dyck strings
- Media
- Memory
- Polymer-based storage
- Polymers
- Redundancy
- Symbols
- Composition codes
- Unique string reconstruction

## ASJC Scopus subject areas

- Information Systems
- Computer Science Applications
- Library and Information Sciences