On the Long-Term Behavior of k-tuples Frequencies in Mutation Systems

Research output: Contribution to journal › Article › peer-review

Abstract

In response to the evolving landscape of data storage, researchers have increasingly explored non-traditional platforms, with DNA-based storage emerging as a cutting-edge solution. Our work is motivated by the potential of in-vivo DNA storage, known for its capacity to store vast amounts of information efficiently and confidentially within an organism's native DNA. While promising, in-vivo DNA storage faces challenges, including susceptibility to errors introduced by mutations. One way to understand the long-term effect of such mutations on the stored information is to investigate the frequency of k-tuples after multiple mutations. Drawing inspiration from related works, we generalize results from the study of duplication systems, particularly focusing on the frequency (or proportion) of k-tuples. We provide a general method for the analysis of mutation systems through the construction of a specialized matrix, dubbed substitution matrix, and the identification of its eigenvectors. Specifically, we derive an expression for the expected frequency of k-tuples. In the context of duplication errors, we leverage existing results on the almost sure convergence of the frequency of k-tuples. This allows us to equate the expected frequency of k-tuples to the limiting frequency of k-tuples. In addition, we demonstrate the convergence in probability of the frequency of k-tuples under certain assumptions.
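As a rough illustration of the quantity studied in the abstract, the sketch below computes k-tuple frequencies of a string and tracks them after repeated random mutations. It is not the authors' method: the substitution matrix and its eigenvectors are not reproduced here. The function names, the choice of tandem duplication as the mutation rule, the uniform choice of position, and the linear (non-cyclic) counting of k-tuples are all assumptions made for this example.

```python
import random
from collections import Counter


def k_tuple_frequencies(seq, k):
    """Return the proportion of each length-k substring (k-tuple) in seq.

    Assumption: k-tuples are counted over a linear string (no wrap-around).
    """
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {t: c / total for t, c in counts.items()}


def tandem_duplicate(seq, length, rng):
    """Copy a uniformly chosen substring of the given length and
    insert the copy immediately after the original (hypothetical rule)."""
    start = rng.randrange(len(seq) - length + 1)
    end = start + length
    return seq[:end] + seq[start:end] + seq[end:]


def simulate(seq, steps, length=1, seed=0):
    """Apply `steps` tandem duplications and return the mutated string."""
    rng = random.Random(seed)
    for _ in range(steps):
        seq = tandem_duplicate(seq, length, rng)
    return seq


if __name__ == "__main__":
    mutated = simulate("ACGT", steps=1000, length=1)
    # Empirical 2-tuple frequencies after many mutations.
    for tup, freq in sorted(k_tuple_frequencies(mutated, 2).items()):
        print(tup, round(freq, 3))
```

Running the simulation with several seeds and comparing the printed frequencies gives an informal feel for the long-term behavior the paper analyzes; it does not replace the convergence results stated in the abstract.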

Original language: English
Journal: IEEE Transactions on Information Theory
State: Accepted/In press - 1 Jan 2024

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
