The capacity of string-duplication systems

Farzad Farnoud, Moshe Schwartz, Jehoshua Bruck

Research output: Contribution to journal › Article › peer-review

23 Scopus citations

Abstract

It is known that the majority of the human genome consists of duplicated sequences. Furthermore, it is believed that a significant part of the rest of the genome also originated from duplicated sequences and has mutated to its current form. In this paper, we investigate the possibility of constructing an exponentially large number of sequences from a short initial sequence using simple duplication rules, including those resembling genomic-duplication processes. In other words, our goal is to find the capacity, or the expressive power, of these string-duplication systems. Our results include exact capacities, and bounds on the capacities, of four fundamental string-duplication systems. The study of these fundamental biologically inspired systems is an important step toward modeling and analyzing more complex biological processes.
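The notion of capacity can be made concrete with a small experiment. What follows is a minimal sketch, not the authors' code. It assumes the tandem-duplication rule (a factor v of fixed length k is replaced by vv, so uvw becomes uvvw) and the usual capacity definition cap(S) = limsup over n of log_q |S ∩ Σ^n| / n, where q = |Σ|. The function names `tandem_duplications`, `reachable_by_length` and `capacity_estimate` are hypothetical. The sketch enumerates every string up to a length bound that is reachable from a seed and computes a finite-n proxy for the capacity.

```python
import math
from collections import deque


def tandem_duplications(s: str, k: int):
    """Yield every string obtained from s by one tandem duplication of
    length k (assumed rule): s = u v w with |v| = k becomes u v v w."""
    for i in range(len(s) - k + 1):
        # u = s[:i], v = s[i:i+k], w = s[i+k:]  ->  u + v + v + w
        yield s[: i + k] + s[i : i + k] + s[i + k :]


def reachable_by_length(seed: str, k: int, max_len: int) -> dict[int, set[str]]:
    """Breadth-first closure of {seed} under length-k tandem duplication.
    Duplications only lengthen strings, so truncating at max_len is exact
    for every length up to max_len."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        s = queue.popleft()
        for t in tandem_duplications(s, k):
            if len(t) <= max_len and t not in seen:
                seen.add(t)
                queue.append(t)
    # Group the reachable strings by length.
    by_len: dict[int, set[str]] = {}
    for s in seen:
        by_len.setdefault(len(s), set()).add(s)
    return by_len


def capacity_estimate(count: int, n: int, q: int) -> float:
    """Finite-n proxy log_q(count) / n; the capacity is its limsup as n grows."""
    return math.log(count, q) / n


if __name__ == "__main__":
    langs = reachable_by_length("01", k=1, max_len=10)
    for n in sorted(langs):
        print(n, len(langs[n]), capacity_estimate(len(langs[n]), n, q=2))
```

With k = 1 and seed 01, the reachable strings of length n are exactly 0^a 1^b with a, b ≥ 1, so there are n − 1 of them and the estimate tends to 0. Polynomial growth gives zero capacity, and exponential growth is what yields a positive one. Exhaustive enumeration like this is only practical for small lengths, so it illustrates the definition rather than computing a capacity.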

Original language: English
Article number: 7347431
Pages (from-to): 811-824
Number of pages: 14
Journal: IEEE Transactions on Information Theory
Volume: 62
Issue number: 2
State: Published - 1 Feb 2016

Keywords

  • Capacity
  • Constrained coding
  • DNA
  • Formal languages
  • String duplication
