TY - JOUR
T1 - Constrained Gene Block Discovery and Its Application to Prokaryotic Genomes
AU - Engel, Jonathan
AU - Veksler-Lublinsky, Isana
AU - Ziv-Ukelson, Michal
N1 - Funding Information:
The authors thank David Bouhadana for his contribution during the early stages of this research. The authors also thank Dina Svetlitsky for sharing her biological knowledge and data. The research of J.E. and M.Z.-U. was partially funded by the Israel Science Foundation (Grant Nos. 179/14 and 939/18).
Publisher Copyright:
© Copyright 2019, Mary Ann Liebert, Inc., publishers.
PY - 2019/7/1
Y1 - 2019/7/1
AB - Recent advances in Next Generation Sequencing techniques, combined with global efforts to study infectious diseases, yield huge and rapidly growing databases of microbial genomes. These big new data statistically empower genomic-context-based approaches to functional analysis: the idea is that groups of genes that are clustered locally together across many genomes usually express protein products that interact in the same biological pathway (e.g., operons). The problem of finding such conserved "gene blocks" in given genomic data has been studied extensively. In this work, we propose a new gene block discovery problem variant: find conserved gene blocks abiding by a user specification of biological functional constraints. We take advantage of the biological constraints to efficiently prune the search space. This is achieved by modeling the new problem as a special constrained variant of the well-studied "Closed Frequent Itemset Mining" problem, generalized here to handle item duplications. We exemplify the application of the tool we developed for this problem with two different case studies related to microbial ATP (adenosine triphosphate)-binding cassette (ABC) transporters.
KW - ABC transporters
KW - conserved gene blocks
KW - gene block discovery
KW - gene teams
KW - itemset mining
UR - http://www.scopus.com/inward/record.url?scp=85070215560&partnerID=8YFLogxK
U2 - 10.1089/cmb.2019.0096
DO - 10.1089/cmb.2019.0096
M3 - Article
AN - SCOPUS:85070215560
SN - 1066-5277
VL - 26
SP - 745
EP - 766
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 7
ER -