Abstract
In this paper we consider the problem of encoding data into repeat-free sequences, in which each sequence is required to contain any k-tuple at most once (for a predefined k). First, the capacity of the repeat-free constraint is calculated. Then, an efficient algorithm, which uses two bits of redundancy, is presented to encode length-n sequences for k = 2 + 2 log(n). This algorithm is then improved to support any value of k of the form k = a log(n), for 1 < a, while its redundancy is o(n). We also calculate the capacity of repeat-free sequences when combined with local constraints given by a constrained system, and the capacity of multi-dimensional repeat-free codes.
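To make the constraint concrete, the following is a minimal sketch of a checker for the repeat-free property, not the encoder from the paper. The function names `is_repeat_free` and `window_length` are hypothetical, and the binary alphabet, the base-2 logarithm, and the rounding up of log(n) are assumptions made only for illustration.

```python
from math import ceil, log2


def is_repeat_free(seq: str, k: int) -> bool:
    """Return True if every length-k window (k-tuple) of seq occurs at most once."""
    seen = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in seen:
            # The same k-tuple appears at two different positions.
            return False
        seen.add(window)
    return True


def window_length(n: int) -> int:
    """k = 2 + 2 log(n); base 2 and rounding up are assumptions of this sketch."""
    return 2 + 2 * ceil(log2(n))


print(is_repeat_free("0001011100", 3))  # all eight 3-tuples are distinct
print(is_repeat_free("0101", 2))        # the 2-tuple "01" occurs twice
print(window_length(16))
```

The check runs in time linear in the number of windows by storing each k-tuple in a set; it only verifies the constraint and says nothing about how to encode arbitrary data into a sequence that satisfies it.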
Original language | English |
---|---|
Article number | 9465135 |
Pages (from-to) | 5749-5764 |
Number of pages | 16 |
Journal | IEEE Transactions on Information Theory |
Volume | 67 |
Issue number | 9 |
State | Published - 1 Sep 2021 |
Keywords
- DNA sequences
- Information theory
- capacity
- constrained coding
- encoder construction
- error-correcting codes
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Library and Information Sciences