Improved Analysis of High-Throughput Sequencing Data Using Small Universal k-Mer Hitting Sets

Yaron Orenstein

Research output: Chapter in Book/Report/Conference proceeding, Chapter, peer-review

Abstract

High-throughput sequencing machines can read millions of DNA molecules in parallel in a short time and at a relatively low cost. As a consequence, researchers have access to databases with millions of genomic samples. Searching and analyzing these large amounts of data requires efficient algorithms. A universal hitting set is a set of words, at least one of which must be present in any long enough string. Using small universal hitting sets, it is possible to increase the efficiency of many high-throughput sequencing data analyses. However, generating minimum-size universal hitting sets is a hard problem. In this chapter, we cover our algorithmic developments to produce compact universal hitting sets and some of their potential applications.
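
To make the definition concrete, the following is a minimal brute-force sketch, not the chapter's algorithm. It checks whether a given set of k-mers hits every string of length L over an alphabet. The function name `is_universal_hitting_set` and its parameters are illustrative assumptions, and the exhaustive enumeration is only practical for very small k and L.

```python
from itertools import product


def is_universal_hitting_set(kmers, k, L, alphabet="ACGT"):
    """Return True if every string of length L over `alphabet`
    contains at least one k-mer from `kmers`.

    Hypothetical helper for illustration only: it enumerates all
    len(alphabet)**L strings, so it is exponential in L.
    """
    kmers = set(kmers)
    for letters in product(alphabet, repeat=L):
        s = "".join(letters)
        # Slide a window of width k over s and look for a hit.
        if not any(s[i:i + k] in kmers for i in range(L - k + 1)):
            return False  # s avoids every k-mer in the set
    return True


# Toy example on a two-letter alphabet with k = 2 and L = 3.
print(is_universal_hitting_set({"AA", "AB", "BB"}, 2, 3, alphabet="AB"))
print(is_universal_hitting_set({"AA", "BB"}, 2, 3, alphabet="AB"))
```

In the toy example, the first set hits all eight strings of length 3, while the second set misses "ABA", whose only 2-mers are "AB" and "BA".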

Original language: English
Title of host publication: Methods in Molecular Biology
Publisher: Humana Press Inc.
Pages: 95-105
Number of pages: 11
State: Published - 1 Jan 2021

Publication series

Name: Methods in Molecular Biology
Volume: 2243
ISSN (Print): 1064-3745
ISSN (Electronic): 1940-6029

Keywords

  • Minimizers
  • Universal hitting sets
  • de Bruijn graph

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics
