Using a VOM model for reconstructing potential coding regions in EST sequences

Armin Shmilovici, Irad Ben-Gal

Research output: Contribution to journal › Article › peer-review

15 Scopus citations

Abstract

This paper presents a method for annotating coding and noncoding DNA regions using variable order Markov (VOM) models. A main advantage of VOM models is that their order may vary for different sequences, depending on the sequences' statistics. As a result, VOM models are more flexible with respect to model parameterization and can be trained on relatively short sequences and on low-quality datasets, such as expressed sequence tags (ESTs). The paper presents a modified VOM model for detecting and correcting the insertion and deletion sequencing errors that are commonly found in ESTs. In a series of experiments, the proposed method is found to be robust to random errors in these sequences.
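To illustrate the idea of an order that adapts to the sequence statistics, the following is a minimal sketch of a variable order Markov predictor for DNA. It is not the paper's algorithm: the function names (`train_vom`, `predict`, `log_likelihood`), the `max_order` and `min_count` parameters, the add-one smoothing, and the rule "use the longest context seen at least `min_count` times" are all assumptions made for illustration, standing in for whatever context-tree construction and pruning the authors actually use.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumed alphabet: the four DNA bases


def train_vom(sequence, max_order=3):
    """Count next-symbol occurrences for every context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, symbol in enumerate(sequence):
        for k in range(min(i, max_order) + 1):
            context = sequence[i - k:i]  # the k symbols preceding position i
            counts[context][symbol] += 1
    return counts


def predict(counts, history, symbol, max_order=3, min_count=2):
    """Return (P(symbol | history), order used).

    The order is the length of the longest suffix of `history` that was
    observed at least `min_count` times in training (hypothetical rule);
    the empty context is the fallback.
    """
    for k in range(min(len(history), max_order), -1, -1):
        context = history[len(history) - k:]
        ctx_counts = counts.get(context, {})
        total = sum(ctx_counts.values())
        if total >= min_count or k == 0:
            # Add-one smoothing so unseen symbols keep non-zero probability.
            prob = (ctx_counts.get(symbol, 0) + 1) / (total + len(ALPHABET))
            return prob, k


def log_likelihood(counts, sequence, max_order=3, min_count=2):
    """Sum of log2 probabilities of each symbol given its preceding symbols."""
    total = 0.0
    for i, symbol in enumerate(sequence):
        prob, _ = predict(counts, sequence[:i], symbol, max_order, min_count)
        total += math.log2(prob)
    return total


model = train_vom("ACGACGACGACG")
print(predict(model, "AC", "G"))  # well-supported context: order 2 is used
print(predict(model, "TT", "A"))  # unseen context: falls back to order 0
```

In a sketch like this, a region could be scored under one model trained on coding sequences and another trained on noncoding sequences, with the higher log-likelihood suggesting the label; how the paper combines such scores, and how it handles insertion and deletion errors, is not described in the abstract.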

Original language: English
Pages (from-to): 49-69
Number of pages: 21
Journal: Computational Statistics
Volume: 22
Issue number: 1
State: Published - 1 Apr 2007

Keywords

  • Coding and noncoding DNA
  • Context tree
  • Gene annotation
  • Sequencing error detection and correction
  • Variable order Markov model

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
  • Computational Mathematics
