Abstract
Background: MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression post-transcriptionally via base-pairing with complementary sequences on messenger RNAs (mRNAs). Due to the technical challenges involved in the application of high-throughput experimental methods, datasets of direct bona fide miRNA targets exist only for a few model organisms. Machine learning (ML)-based target prediction models were successfully trained and tested on some of these datasets. There is a need to further apply the trained models to organisms in which experimental training data are unavailable. However, it is largely unknown how the features of miRNA–target interactions evolve and whether some features have remained fixed during evolution, raising questions regarding the general, cross-species applicability of currently available ML methods. Results: We examined the evolution of miRNA–target interaction rules and used data science and ML approaches to investigate whether these rules are transferable between species. We analyzed eight datasets of direct miRNA–target interactions in four species (human, mouse, worm, cattle). Using ML classifiers, we achieved high accuracy for intra-dataset classification and found that the most influential features of all datasets overlap significantly. To explore the relationships between datasets, we measured the divergence of their miRNA seed sequences and evaluated the performance of cross-dataset classification. We found that both measures coincide with the evolutionary distance between the compared species. Conclusions: The transferability of miRNA–targeting rules between species depends on several factors, the most associated factors being the composition of seed families and evolutionary distance. Furthermore, our feature-importance results suggest that some miRNA–target features have evolved while others remained fixed during the evolution of the species. Our findings lay the foundation for the future development of target prediction tools that could be applied to “non-model” organisms for which minimal experimental data are available. Availability and implementation: The code is freely available at https://github.com/gbenor/TPVOD.
Original language | English |
---|---|
Article number | 264 |
Journal | BMC Bioinformatics |
Volume | 22 |
Issue number | 1 |
DOIs | |
State | Published - 1 Dec 2021 |
Keywords
- AGO-CLIP
- CLASH
- Chimeric miRNA–target interactions
- Cross-species prediction
- Machine learning
- Target prediction
- miRNA
ASJC Scopus subject areas
- Structural Biology
- Biochemistry
- Molecular Biology
- Computer Science Applications
- Applied Mathematics