## Abstract

We reconsider the well-known problem of pattern matching under the Hamming distance. Previous approaches have shown how to count the number of mismatches efficiently, especially when a bound is known for the maximum Hamming distance. Our interest is different in that we wish to collect a random sample of mismatches of fixed size at each position in the text. Given a pattern p of length m and a text t of length n, we show how to sample with high probability up to c mismatches from every alignment of p and t in O((c + log n)(n + m log m) log m) time. Further, we guarantee that the mismatches are sampled uniformly and can therefore be seen as representative of the types of mismatches that occur.
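To make the problem statement concrete, the following is a minimal brute-force sketch of the task, not the algorithm from the paper. It runs in O(nm) time rather than the bound stated above, and simply enumerates the mismatches at each alignment before drawing a uniform sample of at most c of them. The function name `sample_mismatches` and the optional `rng` parameter are assumptions introduced for illustration.

```python
import random


def sample_mismatches(p, t, c, rng=None):
    """Naive baseline: for each alignment of p against t, return up to c
    mismatch positions (indices into p) chosen uniformly at random.

    This is an illustrative O(n * m) reference, not the paper's method.
    """
    rng = rng or random.Random()
    m, n = len(p), len(t)
    samples = []
    for i in range(n - m + 1):
        # Collect every position where the pattern disagrees with the text.
        mismatches = [j for j in range(m) if p[j] != t[i + j]]
        # If there are more than c mismatches, keep a uniform sample of size c.
        if len(mismatches) > c:
            mismatches = rng.sample(mismatches, c)
        samples.append(sorted(mismatches))
    return samples


if __name__ == "__main__":
    # One list of sampled mismatch positions per alignment.
    print(sample_mismatches("abc", "abcabd", c=1, rng=random.Random(0)))
```

A baseline like this is mainly useful as a correctness check for a faster implementation: alignments with at most c mismatches must report exactly those positions, and larger sets must yield exactly c positions drawn from the true mismatch set.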

Original language | English |
---|---|
Pages (from-to) | 112-118 |
Number of pages | 7 |
Journal | Information and Computation |
Volume | 214 |
DOIs | |
State | Published - 1 May 2012 |
Externally published | Yes |

## ASJC Scopus subject areas

- Theoretical Computer Science
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics