Abstract
G-quadruplexes are nucleic acid secondary structures that form within guanine-rich DNA or RNA sequences. G-quadruplex formation can affect chromatin architecture and gene regulation and has been associated with genomic instability, genetic diseases and cancer progression. G-quadruplex formation in a DNA template can be assessed using polymerase stop assays, which measure polymerase stalling at G-quadruplex sites. An experimental technique called G4-seq was developed by combining features of the polymerase stop assay with Illumina next-generation sequencing. The experimental data produced by this technique provide unprecedented detail on where, and at what intensity, G-quadruplexes form in the human genome. Still, running the experimental protocol on a whole genome is an expensive and time-consuming process. Thus, it is highly desirable to have a computational method to predict G-quadruplex formation in new DNA sequences or whole genomes. Here, we present a new method, called G4detector, to predict G-quadruplexes from DNA sequences based on multi-kernel convolutional neural networks. To test G4detector, we compiled novel high-throughput in vitro and in vivo benchmarks. We show that G4detector outperforms extant methods for the same task on all of these benchmark datasets. We visualize the most important features of G4detector models and discover that G-quadruplex formation is highly dependent on the length of G-tracts, their spacing and the nucleotide composition between them. The code and benchmarks are publicly available at github.com/OrensteinLab/G4detector.
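As a rough illustration of what "multi-kernel convolution" over a DNA sequence means, the sketch below one-hot encodes a sequence, slides kernels of several widths along it, and keeps the strongest response per kernel. It is a minimal NumPy sketch under stated assumptions, not the G4detector architecture: the kernel widths (3, 5, 7), the number of kernels per width (8), the random untrained weights, and the function names are all hypothetical choices made for this example.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) matrix; non-ACGT bases (e.g. N) stay all-zero."""
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = ALPHABET.find(base)
        if j >= 0:
            mat[i, j] = 1.0
    return mat

def conv_max_pool(x, kernels):
    """Slide each (width, 4) kernel along x, apply ReLU, keep the maximum response per kernel."""
    width = kernels.shape[1]
    # windows has shape (positions, width, 4)
    windows = np.stack([x[i:i + width] for i in range(len(x) - width + 1)])
    # responses has shape (positions, n_kernels)
    responses = np.einsum("pwc,kwc->pk", windows, kernels)
    return np.maximum(responses, 0.0).max(axis=0)

def multi_kernel_features(seq, kernel_bank):
    """Concatenate the pooled responses of every kernel group (one group per width)."""
    x = one_hot(seq)
    return np.concatenate([conv_max_pool(x, k) for k in kernel_bank])

rng = np.random.default_rng(0)
# Hypothetical configuration: 8 random (untrained) kernels for each of three widths.
kernel_bank = [rng.normal(size=(8, w, 4)).astype(np.float32) for w in (3, 5, 7)]

# Example input with four G-tracts separated by short loops.
features = multi_kernel_features("GGGTTAGGGTTAGGGTTAGGG", kernel_bank)
print(features.shape)  # (24,)
```

In a trained model the kernel weights would be learned from labeled sequences, and the pooled feature vector would typically be passed to further layers that output a G-quadruplex score. Using several kernel widths lets the model respond to sequence patterns of different lengths, such as G-tracts and the loops between them.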
| Original language | English |
|---|---|
| Title of host publication | ACM-BCB 2019 - Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 357-365 |
| Number of pages | 9 |
| ISBN (Electronic) | 9781450366663 |
| State | Published - 4 Sep 2019 |
| Event | 10th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2019 - Niagara Falls, United States; Duration: 7 Sep 2019 → 10 Sep 2019 |
Publication series
| Name | ACM-BCB 2019 - Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics |
|---|---|
Conference
| Conference | 10th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2019 |
|---|---|
| Country/Territory | United States |
| City | Niagara Falls |
| Period | 7/09/19 → 10/09/19 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3: Good Health and Well-being
Keywords
- Convolutional neural networks
- G-quadruplex
ASJC Scopus subject areas
- Computer Science Applications
- Software
- Biomedical Engineering
- Health Informatics
Title
- Predicting G-quadruplexes from DNA sequences using multi-kernel convolutional neural networks