Optimized Sequence Library Design for Efficient In Vitro Interaction Mapping

Yaron Orenstein, Robert Puccinelli, Ryan Kim, Polly Fordyce, Bonnie Berger

Research output: Contribution to journal › Article › peer-review


Sequence libraries that cover all k-mers enable universal, unbiased measurements of binding to both oligonucleotides and peptides. While the number of k-mers grows exponentially in k, space on all experimental platforms is limited. Here, we shrink k-mer library sizes by using joker characters, which represent all characters in the alphabet simultaneously. We present the JokerCAKE (joker covering all k-mers) algorithm for generating a short sequence such that each k-mer appears at least p times with at most one joker character per k-mer. By running our algorithm on a range of parameters and alphabets, we show that JokerCAKE produces near-optimal sequences. Moreover, through comparison with data from hundreds of DNA-protein binding experiments and with new experimental results for both standard and JokerCAKE libraries, we establish that accurate binding scores can be inferred for high-affinity k-mers using JokerCAKE libraries. JokerCAKE libraries allow researchers to search a significantly larger sequence space using the same number of experimental measurements and at the same cost. We present a new compact sequence design that covers all k-mers utilizing joker characters and develop an efficient algorithm to generate such designs. We show through simulations and experimental validation that these sequence designs are useful for identifying high-affinity binding sites at significantly reduced cost and space.
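The coverage property described in the abstract (every k-mer appears at least p times, with at most one joker per k-mer) can be checked with a short script. The following is a minimal sketch, not the JokerCAKE algorithm itself: it only verifies a given sequence and does not generate one. The joker symbol `N`, the function names `kmer_coverage` and `covers_all`, and the default DNA alphabet are assumptions made for illustration.

```python
from itertools import product

JOKER = "N"  # assumed joker symbol; not specified in the abstract


def kmer_coverage(seq, k, alphabet="ACGT", joker=JOKER):
    """Count, for every k-mer over `alphabet`, how many windows of `seq` cover it.

    A window with no joker covers exactly one k-mer. A window with exactly
    one joker covers every k-mer obtained by substituting an alphabet
    character for the joker. Windows with two or more jokers are ignored.
    """
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        n_jokers = window.count(joker)
        if n_jokers == 0:
            matches = [window]
        elif n_jokers == 1:
            j = window.index(joker)
            matches = [window[:j] + c + window[j + 1:] for c in alphabet]
        else:
            matches = []
        for kmer in matches:
            if kmer in counts:  # skip windows containing unknown characters
                counts[kmer] += 1
    return counts


def covers_all(seq, k, p=1, alphabet="ACGT", joker=JOKER):
    """Return True if every k-mer is covered at least p times."""
    return min(kmer_coverage(seq, k, alphabet, joker).values()) >= p


if __name__ == "__main__":
    # Toy two-letter alphabet: a joker-free cover of all 2-mers versus a
    # shorter sequence that relies on one joker.
    print(covers_all("AACCA", 2, alphabet="AC"))
    print(covers_all("ANCA", 2, alphabet="AC"))
```

On this toy alphabet, the joker-containing sequence `ANCA` covers all four 2-mers in four characters, whereas the joker-free sequence `AACCA` needs five. This illustrates on a very small scale why joker characters can shorten a library.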

Original language: English
Pages (from-to): 230-236.e5
Journal: Cell Systems
Issue number: 3
State: Published - 27 Sep 2017
Externally published: Yes


Keywords

  • de Bruijn graph
  • microarray design
  • sequence libraries

ASJC Scopus subject areas

  • Pathology and Forensic Medicine
  • Histology
  • Cell Biology

