The capacity of string-duplication systems

Farzad Farnoud, Moshe Schwartz, Jehoshua Bruck

Research output: Contribution to journal › Article › peer-review

27 Scopus citations


It is known that the majority of the human genome consists of duplicated sequences. Furthermore, it is believed that a significant part of the rest of the genome also originated from duplicated sequences and has mutated to its current form. In this paper, we investigate the possibility of constructing an exponentially large number of sequences from a short initial sequence using simple duplication rules, including those resembling genomic-duplication processes. In other words, our goal is to find the capacity, or the expressive power, of these string-duplication systems. Our results include exact capacities, and bounds on the capacities, of four fundamental string-duplication systems. The study of these fundamental biologically inspired systems is an important step toward modeling and analyzing more complex biological processes.
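As a rough illustration of what "capacity" measures, the sketch below enumerates every string reachable from a short seed under one simple rule and reports the normalized log-count at each length. The abstract does not spell out the duplication rules, so the rule used here (copy a substring and insert the copy immediately after the original, often called tandem duplication) is an assumption made for illustration, as are the function names `tandem_duplications`, `reachable_by_length`, and `rate`. The finite-length rate is only a stand-in for an asymptotic quantity and is not a substitute for the paper's results.

```python
from math import log


def tandem_duplications(s):
    """Return all strings obtained from s by one duplication step.

    Assumed rule (illustrative): pick a non-empty substring s[i:j] and
    insert a copy of it immediately after the original occurrence.
    """
    out = set()
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n + 1):
            out.add(s[:j] + s[i:j] + s[j:])
    return out


def reachable_by_length(seed, max_len):
    """Map each length <= max_len to the set of reachable strings of that length.

    Every duplication strictly lengthens the string, so processing lengths
    in increasing order visits each reachable string exactly once.
    """
    by_len = {len(seed): {seed}}
    for length in range(len(seed), max_len):
        for s in by_len.get(length, ()):
            for t in tandem_duplications(s):
                if len(t) <= max_len:
                    by_len.setdefault(len(t), set()).add(t)
    return by_len


def rate(count, n, alphabet_size):
    """Normalized log-count: log base alphabet_size of count, divided by n."""
    return log(count, alphabet_size) / n


if __name__ == "__main__":
    seed = "ab"  # hypothetical seed over the binary alphabet {a, b}
    table = reachable_by_length(seed, 12)
    for n in sorted(table):
        count = len(table[n])
        print(n, count, round(rate(count, n, 2), 3))
```

A rate that stays bounded away from zero as the length grows would indicate exponentially many reachable strings, while a rate that decays toward zero would indicate sub-exponential growth. Brute-force enumeration only reaches short lengths, so it can suggest a trend but cannot establish a capacity.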

Original language: English
Article number: 7347431
Pages (from-to): 811-824
Number of pages: 14
Journal: IEEE Transactions on Information Theory
Issue number: 2
State: Published - 1 Feb 2016


Keywords

  • Capacity
  • Constrained coding
  • DNA
  • Formal languages
  • String duplication

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences

