Abstract
Peptide arrays measure the binding intensity of a specific protein to thousands of amino acid peptides. By using peptides that cover all k-mers, a comprehensive picture of the binding spectrum is obtained. Researchers would like to measure binding to the longest k-mer possible but are constrained by the number of peptides that can fit into a single microarray. A key challenge is designing a minimum number of peptides that cover all k-mers. Here, we suggest a novel idea to reduce the length of the sequence covering all k-mers by utilizing a unique property of the peptide synthesis process. Since the synthesis can start from either end of the peptide template, it is enough to cover each k-mer or its reverse and to use the same template twice: once in the forward direction and once in reverse. The computational problem is then to generate a minimum-length sequence that, for each k-mer, contains either the k-mer or its reverse. In this study, we present a new algorithm, called ReverseCAKE, to generate such a sequence. ReverseCAKE runs in time linear in the output size and is guaranteed to produce a sequence that is longer by at most Θ(√n log n) characters compared with the optimum n. The saving factor obtained by ReverseCAKE approaches the theoretical lower bound as k increases. In addition, we formulated the problem as an integer linear program and empirically observed that the solutions obtained by ReverseCAKE are near-optimal. Through this work, we enable more effective design of peptide microarrays.
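The covering condition stated in the abstract can be illustrated with a minimal Python sketch. This is not ReverseCAKE itself: it only checks whether a given sequence contains every k-mer or its reverse, and counts how many k-mer classes remain once a k-mer and its reverse are treated as the same. The function names `covers_all_kmers` and `count_reverse_classes` and the two-letter toy alphabet are assumptions made for illustration.

```python
from itertools import product


def covers_all_kmers(seq, alphabet, k):
    """Return True if, for every k-mer over `alphabet`, `seq` contains
    the k-mer or its reverse as a contiguous substring."""
    # Collect every length-k window of the sequence.
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    for letters in product(alphabet, repeat=k):
        kmer = "".join(letters)
        if kmer not in present and kmer[::-1] not in present:
            return False
    return True


def count_reverse_classes(alphabet_size, k):
    """Count k-mer classes when a k-mer and its reverse are identified.

    Palindromic k-mers form classes of size one; all others pair up.
    """
    total = alphabet_size ** k
    palindromes = alphabet_size ** ((k + 1) // 2)
    return (total + palindromes) // 2


# Toy example (hypothetical two-letter alphabet, k = 2).
print(covers_all_kmers("AABB", "AB", 2))  # AA, AB, BB present; BA covered by AB
print(count_reverse_classes(2, 2))        # classes: AA, BB, {AB, BA}
```

Each length-k window of a sequence covers at most one class, so any valid sequence has at least `count_reverse_classes(alphabet_size, k) + k - 1` characters. In the toy example, `"AABB"` meets that bound exactly (3 classes, length 4).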
Original language | English |
---|---|
Pages (from-to) | 376-385 |
Number of pages | 10 |
Journal | Journal of Computational Biology |
Volume | 27 |
Issue number | 3 |
State | Published - 1 Mar 2020 |
Keywords
- array design
- de Bruijn graph
- de Bruijn sequence
- peptide array
- reverse synthesis
ASJC Scopus subject areas
- Modeling and Simulation
- Molecular Biology
- Genetics
- Computational Mathematics
- Computational Theory and Mathematics