Evolution of k-Mer Frequencies and Entropy in Duplication and Substitution Mutation Systems

Hao Lou, Moshe Schwartz, Jehoshua Bruck, Farzad Farnoud

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Genomic evolution can be viewed as string-editing processes driven by mutations. An understanding of the statistical properties resulting from these mutation processes is of value in a variety of tasks related to biological sequence data, e.g., estimation of model parameters and compression. At the same time, due to the complexity of these processes, designing tractable stochastic models and analyzing them are challenging. In this paper, we study two kinds of systems, each representing a set of mutations. In the first system, tandem duplications and substitution mutations are allowed and in the other, interspersed duplications. We provide stochastic models and, via stochastic approximation, study the evolution of substring frequencies for these two systems separately. Specifically, we show that $k$-mer frequencies converge almost surely and determine the limit set. Furthermore, we present a method for finding upper bounds on entropy for such systems.
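
To make the objects in the abstract concrete, the following is a minimal toy sketch of a string evolving under tandem duplications and substitutions, together with a k-mer frequency count. It is not the stochastic model from the paper: the function names, the fixed duplication length `dup_len`, the mixing probability `p_dup`, the uniform choice of positions, and the four-letter alphabet are all assumptions made only for illustration.

```python
import random
from collections import Counter


def kmer_frequencies(s, k):
    """Return the relative frequency of each length-k substring of s."""
    total = len(s) - k + 1
    counts = Counter(s[i:i + k] for i in range(total))
    return {kmer: c / total for kmer, c in counts.items()}


def mutate(s, rng, dup_len=2, p_dup=0.9, alphabet="ACGT"):
    """Apply one random mutation (assumed toy rule, not the paper's model)."""
    if rng.random() < p_dup and len(s) >= dup_len:
        # Tandem duplication: copy s[i:i+dup_len] and insert the copy
        # immediately after the original occurrence.
        i = rng.randrange(len(s) - dup_len + 1)
        return s[:i + dup_len] + s[i:i + dup_len] + s[i + dup_len:]
    # Substitution: replace one symbol with a different, uniformly chosen one.
    i = rng.randrange(len(s))
    new = rng.choice([a for a in alphabet if a != s[i]])
    return s[:i] + new + s[i + 1:]


def simulate(seed_string="ACGT", steps=2000, seed=0):
    """Evolve seed_string for a number of mutation steps and return the result."""
    rng = random.Random(seed)
    s = seed_string
    for _ in range(steps):
        s = mutate(s, rng)
    return s
```

Calling `kmer_frequencies(simulate(), 2)` at increasing values of `steps` gives an informal view of how the empirical 2-mer frequencies of one sample path change as the string grows; it says nothing rigorous about the almost-sure convergence established in the paper.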

Original language: English
Article number: 8864099
Pages (from-to): 3171-3186
Number of pages: 16
Journal: IEEE Transactions on Information Theory
Volume: 66
Issue number: 5
State: Published - 1 May 2020

Keywords

  • String-duplication systems
  • entropy
  • substitution mutation

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
