Abstract
In recent years, DNA has emerged as a potentially viable storage technology. DNA synthesis, which refers to the task of writing the data into DNA, is perhaps the most costly part of existing storage systems. Consequently, the high cost and low throughput limit the practical use of available DNA synthesis technologies. Homopolymer runs (i.e., repetitions of the same nucleotide) have been found to be a major factor affecting synthesis and sequencing errors. Recently, [26] raised and studied the coding problem for efficient synthesis for DNA-based storage systems. Among other things, they studied the maximal code size under synthesis constraints. In [29], the authors studied the role of batch optimization in reducing the cost of large-scale DNA synthesis, for a given pool S of random quaternary strings of fixed length. This problem is related to the problem posed in [26] and can be viewed as the opposite side of the same coin: instead of seeking the largest code in which every codeword can be synthesized in a certain amount of time, the authors of [29] asked what the average synthesis time of a randomly chosen string is. Following the lead of [29], in this paper, we take a step forward towards the theoretical understanding of DNA synthesis, and study the homopolymer run of length k > 1. Specifically, we are given a set of DNA strands S that we wish to synthesize, randomly drawn from a Markovian distribution modeling a general homopolymer run length constraint. For this problem, we derive asymptotically tight high probability lower and upper bounds on the cost of DNA synthesis, for any k > 1. Our bounds imply that, perhaps surprisingly, the periodic sequence ACGT is asymptotically optimal in the sense of achieving the smallest possible cost. Our main technical contribution is the representation of the DNA synthesis process as a certain constrained system, for which string techniques can be applied.
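To make the cost notion concrete, the following minimal Python sketch illustrates one common way to model batch synthesis with the periodic reference sequence ACGT. It rests on assumptions that the abstract does not spell out: the machine offers one nucleotide per cycle, each strand greedily appends an offered nucleotide when it matches its next symbol, and the cost of a batch is the number of cycles until every strand is complete. The function names (`synthesis_cycles`, `batch_cost`, `sample_strand`) and the parameter `max_run` are hypothetical, and the sampler is only a simple Markov chain that forbids runs longer than `max_run`, not necessarily the distribution analyzed in the paper.

```python
import random

ALPHABET = "ACGT"


def synthesis_cycles(strand, reference=ALPHABET):
    """Cycles needed to synthesize `strand` when the machine repeats
    `reference` periodically, offering one nucleotide per cycle.

    Assumed model: the strand appends the offered nucleotide only when
    it equals the next symbol the strand still needs (greedy matching).
    """
    pos = 0  # number of cycles consumed so far
    for base in strand:
        if base not in reference:
            raise ValueError(f"nucleotide {base!r} is never offered")
        # Skip cycles whose offered nucleotide does not match.
        while reference[pos % len(reference)] != base:
            pos += 1
        pos += 1  # the matching cycle itself
    return pos


def batch_cost(strands, reference=ALPHABET):
    """Cost of a batch under the assumed model: the slowest strand
    determines how many cycles the machine must run."""
    return max(synthesis_cycles(s, reference) for s in strands)


def sample_strand(length, max_run, rng):
    """Sample a strand from a simple Markov chain in which no nucleotide
    repeats more than `max_run` times in a row (illustrative assumption,
    not necessarily the paper's distribution)."""
    strand = []
    run = 0  # length of the current homopolymer run
    for _ in range(length):
        if strand and run >= max_run:
            # The run limit is reached: the next base must differ.
            choices = [b for b in ALPHABET if b != strand[-1]]
        else:
            choices = ALPHABET
        base = rng.choice(choices)
        run = run + 1 if strand and base == strand[-1] else 1
        strand.append(base)
    return "".join(strand)


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [sample_strand(20, 2, rng) for _ in range(100)]
    print("batch cost with periodic ACGT:", batch_cost(pool))
```

Under this model, a strand that follows the reference exactly (such as `ACGT`) costs one cycle per nucleotide, while a repeated nucleotide (such as `AA`) forces the machine through a full extra period. This gives some intuition for why homopolymer runs matter for synthesis cost.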
Original language | English |
---|---|
Pages (from-to) | 6941-6955 |
Number of pages | 15 |
Journal | IEEE Transactions on Information Theory |
Volume | 69 |
Issue number | 11 |
State | Published - 19 Jun 2023 |
Keywords
- DNA
- Information entropy
- biological information theory
- codes
- decoding
- encoding
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Library and Information Sciences