Abstract
In response to the evolving landscape of data storage, researchers have increasingly explored non-traditional platforms, with DNA-based storage emerging as a cutting-edge solution. Our work is motivated by the potential of in-vivo DNA storage, which can store vast amounts of information efficiently and confidentially within an organism's native DNA. While promising, in-vivo DNA storage faces challenges, including susceptibility to errors introduced by mutations. One way to understand the long-term effect of such mutations on the stored information is to investigate the frequency of k-tuples after multiple mutations. Drawing inspiration from related works, we generalize results from the study of duplication systems, focusing in particular on the frequency (or proportion) of k-tuples. We provide a general method for the analysis of mutation systems through the construction of a specialized matrix, which we call the substitution matrix, and the identification of its eigenvectors. Specifically, we derive an expression for the expected frequency of k-tuples. In the context of duplication errors, we leverage existing results on the almost sure convergence of the frequency of k-tuples, which allows us to equate the expected frequency of k-tuples with their limiting frequency. In addition, we demonstrate convergence in probability of the frequency of k-tuples under certain assumptions.
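To make the central quantity concrete, the following minimal Python sketch computes the frequency (proportion) of k-tuples in a sequence and tracks it after repeated mutations. It is an illustration only, not the paper's method: the function names `ktuple_frequencies` and `tandem_duplicate`, the linear (non-cyclic) windowing, and the uniform tandem-duplication model are all assumptions made for this example, and the substitution-matrix analysis itself is not reproduced here.

```python
import random
from collections import Counter


def ktuple_frequencies(seq, k):
    """Return the proportion of each length-k substring (k-tuple) in seq.

    Assumption: windows are taken linearly (no wrap-around), so a sequence
    of length n contributes n - k + 1 windows.
    """
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {tup: c / total for tup, c in counts.items()}


def tandem_duplicate(seq, dup_len, rng):
    """Apply one hypothetical tandem-duplication mutation.

    A substring of length dup_len is chosen uniformly at random and a copy
    of it is inserted immediately after the original.
    """
    start = rng.randrange(len(seq) - dup_len + 1)
    block = seq[start:start + dup_len]
    return seq[:start + dup_len] + block + seq[start + dup_len:]


# Simulate 1000 single-symbol duplications starting from a short seed.
rng = random.Random(0)  # fixed seed so the run is reproducible
seq = "ACGT"
for _ in range(1000):
    seq = tandem_duplicate(seq, 1, rng)

# Empirical 2-tuple frequencies after the mutations.
freqs = ktuple_frequencies(seq, 2)
```

Repeating the simulation with different seeds gives a rough empirical feel for how k-tuple frequencies behave over many mutations, which is the kind of long-term behavior the abstract describes analytically.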
Original language | English |
---|---|
Journal | IEEE Transactions on Information Theory |
State | Accepted/In press - 1 Jan 2024 |
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Library and Information Sciences