A systematic evaluation of data processing and problem formulation of CRISPR off-target site prediction

Ofir Yaish, Maor Asif, Yaron Orenstein

Research output: Contribution to journalReview articlepeer-review

Abstract

CRISPR/Cas9 system is widely used in a broad range of gene-editing applications. While this editing technique is quite accurate in the target region, there may be many unplanned off-target sites (OTSs). Consequently, a plethora of computational methods have been developed to predict off-target cleavage sites given a guide RNA and a reference genome. However, these methods are based on small-scale datasets (only tens to hundreds of OTSs) produced by experimental techniques to detect OTSs with a low signal-to-noise ratio. Recently, CHANGE-seq, a new in vitro experimental technique to detect OTSs, was used to produce a dataset of unprecedented scale and quality (>200 000 OTS over 110 guide RNAs). In addition, the same study included in cellula GUIDE-seq experiments for 58 of the guide RNAs. Here, we fill the gap in previous computational methods by utilizing these data to systematically evaluate data processing and formulation of the CRISPR OTSs prediction problem. Our evaluations show that data transformation as a pre-processing phase is critical prior to model training. Moreover, we demonstrate the improvement gained by adding potential inactive OTSs to the training datasets. Furthermore, our results point to the importance of adding the number of mismatches between guide RNAs and their OTSs as a feature. Finally, we present predictive off-target in cellula models based on both in vitro and in cellula data and compare them to state-of-the-art methods in predicting true OTSs. Our conclusions will be instrumental in any future development of an off-target predictor based on high-throughput datasets.

Original languageEnglish
Article numberbbac157
JournalBriefings in Bioinformatics
Volume23
Issue number5
DOIs
StatePublished - 1 Sep 2022

Keywords

  • CHANGE-seq
  • CRISPR off-target
  • GUIDE-seq
  • machine learning
  • read count normalization

ASJC Scopus subject areas

  • Information Systems
  • Molecular Biology

Fingerprint

Dive into the research topics of 'A systematic evaluation of data processing and problem formulation of CRISPR off-target site prediction'. Together they form a unique fingerprint.

Cite this