The Entropy Rate of Some Pólya String Models

Ohad Elishco, Farzad Farnoud Hassanzadeh, Moshe Schwartz, Jehoshua Bruck

Research output: Contribution to journal › Article › peer-review

Abstract

We study random string-duplication systems, which we call Pólya string models. These are motivated by a class of mutations that are common in most organisms and lead to an abundance of repeated sequences in their genomes. Unlike previous works, which study the combinatorial capacity of string-duplication systems or, in a probabilistic setting, various string statistics, this work provides the exact entropy rate, or bounds on it, for several probabilistic models. The entropy rate determines the compressibility of the resulting sequences and quantifies the amount of sequence diversity that these mutations can create. In particular, we study the entropy rate of noisy string-duplication systems, including the tandem-duplication, end-duplication, and interspersed-duplication systems, in all cases restricted to duplications of length 1. Interesting connections are drawn between some systems and the signature of random permutations, as well as the beta distribution common in population genetics.
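
As a rough illustration of the three length-1 duplication systems named in the abstract, the following Python sketch simulates one plausible reading of each. The abstract does not define the models, so everything here is an assumption: the source symbol is taken to be chosen uniformly at random, the copy is placed directly after its source (tandem), at the end of the string (end), or at a uniformly chosen position (interspersed), and "noise" is modeled as replacing the copied symbol with a uniform symbol from the alphabet with probability `noise`. The names `duplicate_step` and `simulate` and the default alphabet are hypothetical, and the paper's exact definitions may differ.

```python
import random


def duplicate_step(s, model, rng, noise=0.0, alphabet="ACGT"):
    """Apply one length-1 duplication to the list s, in place (assumed model)."""
    i = rng.randrange(len(s))      # source position, assumed uniform
    symbol = s[i]
    if rng.random() < noise:       # assumed noise: copy replaced by a uniform symbol
        symbol = rng.choice(alphabet)
    if model == "tandem":
        s.insert(i + 1, symbol)    # copy lands directly after its source
    elif model == "end":
        s.append(symbol)           # copy lands at the end of the string
    elif model == "interspersed":
        # copy lands at a uniformly chosen position (len(s) + 1 possible slots)
        s.insert(rng.randrange(len(s) + 1), symbol)
    else:
        raise ValueError(f"unknown model: {model}")


def simulate(seed_string, model, steps, noise=0.0, seed=0):
    """Run `steps` duplication events starting from seed_string."""
    rng = random.Random(seed)
    s = list(seed_string)
    for _ in range(steps):
        duplicate_step(s, model, rng, noise)
    return "".join(s)


if __name__ == "__main__":
    for model in ("tandem", "end", "interspersed"):
        print(model, simulate("ACGT", model, steps=30, noise=0.05))
```

Under these assumptions each step lengthens the string by exactly one symbol. In the noiseless end-duplication case the symbol counts evolve like a Pólya urn, which is one way to see why a beta-type distribution could arise; the sketch does not estimate the entropy rate itself.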

Original language: English
Article number: 8809682
Pages (from-to): 8180-8193
Number of pages: 14
Journal: IEEE Transactions on Information Theory
Volume: 65
Issue number: 12
State: Published - 1 Dec 2019

Keywords

  • DNA storage
  • Pólya string models
  • entropy rate
  • string-duplication systems
