Mismatch sampling

Raphaël Clifford, Klim Efremenko, Benny Porat, Ely Porat, Amir Rothschild

Research output: Contribution to journal, Conference article, peer-review

Abstract

We consider the well-known problem of pattern matching under the Hamming distance. Previous approaches have shown how to count the number of mismatches efficiently, especially when a bound on the maximum Hamming distance is known. Our interest is different in that we wish to collect a random sample of mismatches of fixed size at each position in the text. Given a pattern p of length m and a text t of length n, we show how to sample with high probability c mismatches, where possible, from every alignment of p and t in $O((c + \log n)(n + m \log m)\log m)$ time. Further, we guarantee that the mismatches are sampled uniformly and can therefore be seen as representative of the types of mismatches that occur.
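To make the problem statement concrete, here is a minimal brute-force sketch in Python. It assumes alignments are indexed by the text offset i (from 0 to n - m) and that each mismatch is reported as a pattern offset j where p[j] differs from t[i + j]. The function names are hypothetical, and this is the naive O(nm) baseline, not the authors' algorithm; it only illustrates what "a uniform sample of c mismatches, where possible" means at each alignment.

```python
import random
from typing import List, Optional


def mismatch_positions(pattern: str, text: str, i: int) -> List[int]:
    """Return the pattern offsets j where pattern[j] != text[i + j]."""
    return [j for j, ch in enumerate(pattern) if ch != text[i + j]]


def sample_mismatches(pattern: str, text: str, c: int,
                      rng: Optional[random.Random] = None) -> List[List[int]]:
    """For every alignment i (0 <= i <= n - m), return up to c mismatch
    offsets chosen uniformly at random without replacement.

    Hypothetical brute-force O(n * m) baseline for illustration only;
    it is NOT the paper's O((c + log n)(n + m log m) log m) method.
    """
    rng = rng or random.Random()
    m, n = len(pattern), len(text)
    samples = []
    for i in range(n - m + 1):
        mism = mismatch_positions(pattern, text, i)
        if len(mism) <= c:
            # At most c mismatches exist: return all of them ("where possible").
            samples.append(mism)
        else:
            # random.sample draws c distinct elements, each c-subset equally likely.
            samples.append(sorted(rng.sample(mism, c)))
    return samples


if __name__ == "__main__":
    # Alignment 0 compares "abc" with "abd", so its only mismatch is offset 2.
    print(sample_mismatches("abc", "abdxbc", 1, random.Random(0)))
```

Because every alignment is scanned in full, this baseline costs O(nm) time; the point of the paper is to obtain the same kind of uniform sample without explicitly enumerating every mismatch.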

Original language: English
Pages (from-to): 99-108
Number of pages: 10
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5280 LNCS
State: Published, 1 Jan 2008
Externally published: Yes
Event: 15th International Symposium on String Processing and Information Retrieval, SPIRE 2008, Melbourne, VIC, Australia
Duration: 10 Nov 2008 to 12 Nov 2008

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
