TY - GEN
T1 - RNA motif search using the Structure to String (STR2) method
AU - Bergig, Oriel
AU - Barash, Danny
AU - Kedem, Klara
PY - 2004/12/1
Y1 - 2004/12/1
N2 - We present a novel approach for detecting RNA shapes in given selected genes. Aside from the traditional sequence-based search methods such as BLAST and FASTA, there is a growing interest in detecting specific RNA secondary structure domains by using effective structure-based search methods such as RNAMotif. Towards this end, we devise a new algorithm with ideas taken from computational geometry. The method, called Structure to String (STR2), was initially developed to detect structural motifs in the tertiary structure of proteins. It converts an RNA secondary structure into a shape-representing string of characters that captures the various structural motifs. To transform an RNA secondary structure to a string of characters, we adopt an approach used in proteomics for generating a collection of fragments. We identify a library of fragments for use in RNA secondary structure, where each fragment is represented by a character. A unique feature of our method is that the fragments represent the geometry of the transitions between the secondary structure elements, such as the curve of the transition between stems and loops. Consequently, we represent the secondary structures of the query and target sequences by their corresponding character-string representations and seek shape similarities by applying string matching algorithms. For the RNA folding prediction, we use mfold. The method is implemented efficiently using suffix trees and other economization procedures. We show examples of its applicability on aptamer domains that are functionally important and are well predicted by mfold before the conversion to strings.
AB - We present a novel approach for detecting RNA shapes in given selected genes. Aside from the traditional sequence-based search methods such as BLAST and FASTA, there is a growing interest in detecting specific RNA secondary structure domains by using effective structure-based search methods such as RNAMotif. Towards this end, we devise a new algorithm with ideas taken from computational geometry. The method, called Structure to String (STR2), was initially developed to detect structural motifs in the tertiary structure of proteins. It converts an RNA secondary structure into a shape-representing string of characters that captures the various structural motifs. To transform an RNA secondary structure to a string of characters, we adopt an approach used in proteomics for generating a collection of fragments. We identify a library of fragments for use in RNA secondary structure, where each fragment is represented by a character. A unique feature of our method is that the fragments represent the geometry of the transitions between the secondary structure elements, such as the curve of the transition between stems and loops. Consequently, we represent the secondary structures of the query and target sequences by their corresponding character-string representations and seek shape similarities by applying string matching algorithms. For the RNA folding prediction, we use mfold. The method is implemented efficiently using suffix trees and other economization procedures. We show examples of its applicability on aptamer domains that are functionally important and are well predicted by mfold before the conversion to strings.
UR - http://www.scopus.com/inward/record.url?scp=14044275092&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:14044275092
SN - 0769521940
T3 - Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004
SP - 660
EP - 661
BT - Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004
T2 - Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004
Y2 - 16 August 2004 through 19 August 2004
ER -