Datasets of positive and negative miRNA-target interactions



MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression post-transcriptionally by base-pairing with complementary sequences on messenger RNAs (mRNAs). Computational approaches that predict miRNA-target interactions (MTIs) help narrow down potential targets for experimental validation. The availability of new high-throughput datasets of direct MTIs has led to the development of machine learning (ML) based methods for MTI prediction. Training an ML algorithm requires examples from every class (i.e., positive and negative). Currently, no high-throughput assays exist for capturing negative examples, which hinders effective classifier construction. Current ML approaches must therefore rely on artificially generated negative examples for training. Moreover, the lack of uniform standards for generating such data leads to biased results and hampers comparisons between studies. We investigated how different methods of generating negative data affect the classification of true MTIs. The study trains ML models on a fixed positive dataset combined with different negative datasets and evaluates their intra- and cross-dataset performance. This design allowed us to examine each method independently and to evaluate the sensitivity of ML models to the methodology used for negative data generation. These data include all the negative datasets generated by the different methods, together with the positive data that were used.
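The intra- and cross-dataset evaluation described above can be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the features are synthetic 2-D points, the classifier is a simple nearest-centroid placeholder, and the names `method_A`, `method_B`, `make_points`, and `cross_evaluate` are hypothetical and do not correspond to files or fields in this dataset.

```python
import random
from statistics import mean


def make_points(center, n, rng):
    """Draw n synthetic 2-D feature vectors around a center (stand-in for real MTI features)."""
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)) for _ in range(n)]


def split(rows, frac=0.7):
    """Split rows into a training part and a held-out test part."""
    cut = int(len(rows) * frac)
    return rows[:cut], rows[cut:]


def centroid(rows):
    return (mean(r[0] for r in rows), mean(r[1] for r in rows))


def train(pos, neg):
    """Nearest-centroid classifier; a placeholder for whichever ML model is used."""
    return centroid(pos), centroid(neg)


def predict(model, x):
    c_pos, c_neg = model
    d_pos = (x[0] - c_pos[0]) ** 2 + (x[1] - c_pos[1]) ** 2
    d_neg = (x[0] - c_neg[0]) ** 2 + (x[1] - c_neg[1]) ** 2
    return 1 if d_pos <= d_neg else 0


def accuracy(model, pos, neg):
    correct = sum(predict(model, x) == 1 for x in pos)
    correct += sum(predict(model, x) == 0 for x in neg)
    return correct / (len(pos) + len(neg))


def cross_evaluate(positives, negatives_by_method):
    """Train on the fixed positives plus each negative set, then test against every negative set."""
    pos_train, pos_test = split(positives)
    splits = {name: split(rows) for name, rows in negatives_by_method.items()}
    results = {}
    for train_name, (neg_train, _) in splits.items():
        model = train(pos_train, neg_train)
        for test_name, (_, neg_test) in splits.items():
            # train_name == test_name is the intra-dataset case; otherwise cross-dataset.
            results[(train_name, test_name)] = accuracy(model, pos_test, neg_test)
    return results


rng = random.Random(0)
positives = make_points((0.0, 0.0), 200, rng)  # fixed positive set (synthetic)
negatives_by_method = {  # one negative set per hypothetical generation method
    "method_A": make_points((3.0, 3.0), 200, rng),
    "method_B": make_points((-3.0, 3.0), 200, rng),
}

results = cross_evaluate(positives, negatives_by_method)
for (train_name, test_name), acc in sorted(results.items()):
    print(f"train={train_name} test={test_name} accuracy={acc:.2f}")
```

The diagonal entries (same method for training and testing) correspond to intra-dataset performance, and the off-diagonal entries to cross-dataset performance. A large gap between the two would suggest that a model has learned properties of a particular negative-generation method rather than of true MTIs.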
Date made available: 2023
