TY - GEN
T1 - A study of accessible motifs and RNA folding complexity
AU - Wexler, Ydo
AU - Zilberstein, Chaya
AU - Ziv-Ukelson, Michal
PY - 2006/7/14
Y1 - 2006/7/14
N2 - mRNA molecules are folded in the cells and therefore many of their substrings may actually be inaccessible to protein and microRNA binding. The need to apply an accessibility criterion to the task of genome-wide mRNA motif discovery raises the challenge of overcoming the core O(n^3) factor imposed by the time complexity of the currently best known algorithms for RNA secondary structure prediction [24,25,43]. We speed up the dynamic programming algorithms that are standard for RNA folding prediction. Our new approach significantly reduces the computations without sacrificing the optimality of the results, yielding an expected time complexity of O(n^2 ψ(n)), where ψ(n) is shown to be constant on average under standard polymer folding models. Benchmark analysis confirms that in practice the runtime ratio between the previous approach and the new algorithm indeed grows linearly with increasing sequence size. The fast new RNA folding algorithm is utilized for genome-wide discovery of accessible cis-regulatory motifs in data sets of ribosomal densities and decay rates of S. cerevisiae genes and for the mining of exposed binding sites of tissue-specific microRNAs in A. thaliana. Further details, including additional figures and proofs of all lemmas, can be found at: http://www.cs.tau.ac.il/~raichaluz/QuadraticRNAFold.pdf
AB - mRNA molecules are folded in the cells and therefore many of their substrings may actually be inaccessible to protein and microRNA binding. The need to apply an accessibility criterion to the task of genome-wide mRNA motif discovery raises the challenge of overcoming the core O(n^3) factor imposed by the time complexity of the currently best known algorithms for RNA secondary structure prediction [24,25,43]. We speed up the dynamic programming algorithms that are standard for RNA folding prediction. Our new approach significantly reduces the computations without sacrificing the optimality of the results, yielding an expected time complexity of O(n^2 ψ(n)), where ψ(n) is shown to be constant on average under standard polymer folding models. Benchmark analysis confirms that in practice the runtime ratio between the previous approach and the new algorithm indeed grows linearly with increasing sequence size. The fast new RNA folding algorithm is utilized for genome-wide discovery of accessible cis-regulatory motifs in data sets of ribosomal densities and decay rates of S. cerevisiae genes and for the mining of exposed binding sites of tissue-specific microRNAs in A. thaliana. Further details, including additional figures and proofs of all lemmas, can be found at: http://www.cs.tau.ac.il/~raichaluz/QuadraticRNAFold.pdf
UR - http://www.scopus.com/inward/record.url?scp=33745798377&partnerID=8YFLogxK
U2 - 10.1007/11732990_40
DO - 10.1007/11732990_40
M3 - Conference contribution
AN - SCOPUS:33745798377
SN - 3540332952
SN - 9783540332954
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 473
EP - 487
BT - Research in Computational Molecular Biology - 10th Annual International Conference, RECOMB 2006, Proceedings
T2 - 10th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2006
Y2 - 2 April 2006 through 5 April 2006
ER -