TY - GEN
T1 - A biclique approach to reference anchored gene blocks and its applications to pathogenicity islands
AU - Benshahar, Arnon
AU - Chalifa-Caspi, Vered
AU - Hermelin, Danny
AU - Ziv-Ukelson, Michal
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2016.
PY - 2016/1/1
Y1 - 2016/1/1
AB - We formalize a new problem variant in gene-block discovery, denoted Reference-Anchored Gene Blocks (RAGB). We are given a query sequence Q of length n, representing the gene array of a DNA element, a window size bound d on the length of a substring of interest in Q, and a set of target gene sequences T = {T1, …, Tc}. Our objective is to identify gene blocks in T that are centered on a subset q of co-localized genes from Q and that contain genomes from T in which the corresponding orthologs of the genes from q are also co-localized. We cast RAGB as a variant of a (colored) biclique problem in bipartite graphs and analyze its parameterized complexity, as well as that of other related problems. We give an O(nm + 2^d nm/lg m) time algorithm for the uncolored variant of our biclique problem, where m is the number of areas of interest parsed from the target sequences, and n and d are as defined above. Our algorithm can be adapted to compute all maximal bicliques in the graph within the same time complexity, and to handle edge weights with only an O(lg d) increase in its time complexity. For the colored version of the problem, our algorithm runs in O(2^d nm) time. We implement the algorithm and exemplify its application to LEE, a well-known pathogenicity island from the E. coli genome harboring virulence genes. Our code and supplementary materials, including omitted proofs and figures, are available at https://www.cs.bgu.ac.il/~negevcb/RAGB/.
UR - http://www.scopus.com/inward/record.url?scp=84984971063&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-43681-4_2
DO - 10.1007/978-3-319-43681-4_2
M3 - Conference contribution
AN - SCOPUS:84984971063
SN - 9783319436807
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 14
EP - 26
BT - Algorithms in Bioinformatics - 16th International Workshop, WABI 2016, Proceedings
A2 - Frith, Martin
A2 - Pedersen, Christian Nørgaard Storm
PB - Springer Verlag
T2 - 16th International Workshop on Algorithms in Bioinformatics, WABI 2016
Y2 - 22 August 2016 through 24 August 2016
ER -