TY - GEN
T1 - Stochastic sampling of structural contexts improves the scalability and accuracy of RNA 3D module identification
AU - Sarrazin-Gendron, Roman
AU - Yao, Hua Ting
AU - Reinharz, Vladimir
AU - Oliver, Carlos G.
AU - Ponty, Yann
AU - Waldispühl, Jérôme
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2020.
PY - 2020/1/1
Y1 - 2020/1/1
AB - RNA structures possess multiple levels of structural organization. Secondary structures are made of canonical (i.e., Watson-Crick and wobble) helices, connected by loops whose local conformations are critical determinants of global 3D architectures. Such local 3D structures consist of conserved sets of non-canonical base pairs, called RNA modules. Their prediction from sequence data is thus a milestone toward 3D structure modelling. Unfortunately, the computational efficiency and scope of current 3D module identification methods are still too limited to benefit from all the knowledge accumulated in module databases. Here, we introduce BayesPairing 2, a new sequence search algorithm that leverages secondary structure tree decomposition to reduce computational complexity and improve predictions on new sequences. We benchmarked our methods on 75 modules and 6380 RNA sequences, and report accuracies that are comparable to the state of the art, with considerable running time improvements. When identifying 200 modules on a single sequence, BayesPairing 2 is over 100 times faster than its previous version, opening new doors for genome-wide applications.
KW - RNA 3D modules
KW - RNA modules identification in sequence
KW - RNA structure prediction
UR - http://www.scopus.com/inward/record.url?scp=85084267007&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-45257-5_12
DO - 10.1007/978-3-030-45257-5_12
M3 - Conference contribution
AN - SCOPUS:85084267007
SN - 9783030452568
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 186
EP - 201
BT - Research in Computational Molecular Biology - 24th Annual International Conference, RECOMB 2020, Proceedings
A2 - Schwartz, Russell
PB - Springer
T2 - 24th Annual Conference on Research in Computational Molecular Biology, RECOMB 2020
Y2 - 10 May 2020 through 13 May 2020
ER -