TY - JOUR
T1 - A Biclique Approach to Reference-Anchored Gene Blocks and Its Applications to Genomic Islands
AU - Benshahar, Arnon
AU - Chalifa-Caspi, Vered
AU - Hermelin, Danny
AU - Ziv-Ukelson, Michal
N1 - Funding Information:
The research of D.H. and A.B. was partially supported by the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme (FP7/2007–2013) under REA grant agreement number 631163.11, and by the Israel Science Foundation (grant No. 551145/). The research of M.Z.-U. and A.B. was partially supported by the Israel Science Foundation (grant No. 179/14) and by the Frankel Center for Computer Science at Ben Gurion University.
Publisher Copyright:
© 2018, Mary Ann Liebert, Inc.
PY - 2018/2/1
Y1 - 2018/2/1
AB - We formalize a new problem variant in gene-block discovery, denoted Reference-Anchored Gene Blocks (RAGB). Given a query sequence Q of length n, representing the gene array of a DNA element, a window-size bound d on the length of a substring of interest in Q, and a set of target gene sequences , our objective is to identify gene blocks in that are centered in a subset q of co-localized genes from Q and that contain genomes from in which the corresponding orthologs of the genes from q are also co-localized. We cast RAGB as a variant of a (colored) biclique problem in bipartite graphs and analyze its parameterized complexity, as well as that of other related problems. We give an time algorithm for the uncolored variant of our biclique problem, where m is the number of areas of interest parsed from the target sequences, and n and d are as defined earlier. Our algorithm can be adapted to compute all maximal bicliques in the graph within the same time complexity, and to handle edge weights with a slight increase in its time complexity. For the colored version of the problem, our algorithm has a time complexity of . We implement the algorithm and exemplify its application to the data mining of proteobacterial gene blocks that are centered in predicted proteobacterial genomic islands, leading to the identification of putatively mobilized clusters of virulence, pathogenicity, and resistance genes.
KW - Bicliques
KW - Bipartite graphs
KW - Gene blocks
KW - Genomic islands
KW - Parameterized complexity
UR - http://www.scopus.com/inward/record.url?scp=85041741978&partnerID=8YFLogxK
U2 - 10.1089/cmb.2017.0108
DO - 10.1089/cmb.2017.0108
M3 - Article
AN - SCOPUS:85041741978
SN - 1066-5277
VL - 25
SP - 214
EP - 235
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 2
ER -