G4detector: Convolutional Neural Network to Predict DNA G-quadruplexes

Mira Barshai, Alice Aubert, Yaron Orenstein

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

G-quadruplexes (G4s) are nucleic acid secondary structures that form within guanine-rich DNA or RNA sequences. G4 formation can affect chromatin architecture and gene regulation, and has been associated with genomic instability, genetic diseases and cancer progression. The data produced by the G4-seq experiment provide unprecedented detail on G4 formation across the genome. Still, running the experimental protocol on a whole genome is expensive and time-consuming. Thus, a computational method that predicts G4 formation in new DNA sequences or whole genomes is highly desirable. Here, we present G4detector, a new method to predict G4s from DNA sequences based on a convolutional neural network. In addition to sequence information, we improved prediction accuracy by incorporating RNA secondary structure information. To train and test G4detector, we compiled novel high-throughput benchmarks from the genomes of multiple species measured by the G4-seq protocol. We show that G4detector outperforms existing methods for the same task on all benchmark datasets, and that a model trained on human measurements generalizes to various non-human species. The code and benchmarks are publicly available at github.com/OrensteinLab/G4detector.
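The abstract describes the model only at a high level. As a rough illustration of how a convolutional network can score a DNA sequence, the sketch below one-hot encodes the sequence, slides a 1D convolutional filter over it with a ReLU, takes a global max over positions, and passes the pooled feature through a sigmoid output. This is a minimal sketch under stated assumptions, not G4detector's implementation: the function names `one_hot`, `conv1d_relu` and `predict_g4`, the hand-set "GGG" filter and all weights are hypothetical, and the RNA secondary-structure input used by G4detector is omitted.

```python
import math

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).
    Any other character (e.g. N) becomes an all-zero row."""
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in seq.upper()]


def conv1d_relu(x, kernel, bias):
    """'Valid' 1D convolution of an L x 4 one-hot matrix with one k x 4 filter,
    followed by ReLU. Returns L - k + 1 activations (empty if L < k)."""
    k = len(kernel)
    out = []
    for start in range(len(x) - k + 1):
        s = bias
        for i in range(k):
            for j in range(4):
                s += x[start + i][j] * kernel[i][j]
        out.append(max(0.0, s))
    return out


def predict_g4(seq, kernels, biases, dense_weights, dense_bias):
    """Hypothetical single-layer CNN: one global-max-pooled feature per filter,
    then a logistic (sigmoid) output giving a score in (0, 1)."""
    x = one_hot(seq)
    features = [max(conv1d_relu(x, k, b), default=0.0) for k, b in zip(kernels, biases)]
    logit = dense_bias + sum(w * f for w, f in zip(dense_weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


# Hand-set (not learned) filter that fires only on a run of three Gs:
# +1 for G and -1 for A/C/T at each of 3 positions, bias -2,
# so "GGG" gives 3 - 2 = 1 and any window with a non-G gives <= -1 -> 0.
GGG_FILTER = [[-1.0, -1.0, 1.0, -1.0] for _ in range(3)]
KERNELS, BIASES = [GGG_FILTER], [-2.0]

if __name__ == "__main__":
    print(round(predict_g4("TTAGGGTTAGGG", KERNELS, BIASES, [4.0], -2.0), 3))  # 0.881
    print(round(predict_g4("ATATATAT", KERNELS, BIASES, [4.0], -2.0), 3))      # 0.119
```

In a trained network the filters and dense weights would be learned from labeled G4-seq data rather than set by hand, and many filters would be used; the global max pooling makes each filter's feature independent of where in the sequence its motif occurs.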

Original language: English
Pages (from-to): 1946-1955
Number of pages: 10
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 19
Issue number: 4
Early online date: 19 Apr 2021
State: Published, 1 Jul 2022

Keywords

  • Bioinformatics
  • G-quadruplexes
  • convolutional neural networks
  • deep learning

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics

