TY - GEN
T1 - MultiRBP
T2 - 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2021
AU - Karin, Jonathan
AU - Michel, Hagai
AU - Orenstein, Yaron
N1 - Publisher Copyright: © 2021 Owner/Author.
PY - 2021/1/18
Y1 - 2021/1/18
AB - Protein-RNA binding plays vital roles in post-transcriptional gene regulation. High-throughput in vitro binding measurements were generated for more than 200 RNA-binding proteins, enabling the development of computational methods to predict binding to any RNA transcript of interest. In recent years, deep learning-based methods have been developed to predict RNA binding in vitro, achieving state-of-the-art results. However, all methods train a single model per protein, under-utilizing the similarities in binding preferences shared by multiple RNA-binding proteins. In this work, we developed MultiRBP, a deep learning-based method to predict RNA binding of hundreds of proteins to a given RNA sequence. The innovation of MultiRBP is in its multi-task nature, i.e., predicting binding for hundreds of proteins at the same time. We trained MultiRBP on the RNAcompete dataset, the most comprehensive dataset of in vitro binding measurements. Our method outperformed extant methods in both in vitro and in vivo RNA-binding prediction. Our method achieved an average Pearson correlation of 0.692±0.17 for in vitro binding prediction, and a median AUROC of 0.668±0.09 for in vivo binding prediction. Moreover, by visualizing the learned binding preferences, MultiRBP provided more interpretable visualization than a single-task model. The code is publicly available at github.com/OrensteinLab/MultiRBP.
KW - RNA-binding proteins
KW - RNAcompete
KW - deep learning
KW - eCLIP
KW - multi-task learning
KW - neural networks
KW - protein-RNA binding
UR - http://www.scopus.com/inward/record.url?scp=85112374648&partnerID=8YFLogxK
U2 - 10.1145/3459930.3469525
DO - 10.1145/3459930.3469525
M3 - Conference contribution
AN - SCOPUS:85112374648
T3 - Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2021
BT - Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2021
PB - Association for Computing Machinery, Inc
Y2 - 1 August 2021 through 4 August 2021
ER -