TY - GEN
T1 - New Algorithms for Structure Informed Genome Rearrangement
AU - Ozery, Eden
AU - Zehavi, Meirav
AU - Ziv-Ukelson, Michal
N1 - Funding Information:
Supplementary Material: The code for our tool, the data used in the experiments, and the log file produced by the reported benchmark run can be found on GitHub. Software (Source Code and Data): https://github.com/edenozery/MEM-Rearrange. Funding: The research was supported by the Israel Science Foundation (ISF) grants no. 939/18 and 1176/18, and by the Frankel Center for Computer Science at Ben-Gurion University.
Publisher Copyright:
© Eden Ozery, Meirav Zehavi, and Michal Ziv-Ukelson.
PY - 2022/9/1
Y1 - 2022/9/1
N2 - We define two new computational problems in the domain of perfect genome rearrangements and propose three algorithms to solve them. The rearrangement scenarios modeled by the problems consider Reversal and Block Interchange operations, and a PQ-tree is used to guide the allowed operations and to compute their weights. In the first problem, Constrained TreeToString Divergence (CTTSD), we define the basic structure-informed, rearrangement-based divergence measure. Here, we assume that the gene orders belonging to the gene cluster from which the PQ-tree is constructed are permutations. The PQ-tree representing the gene cluster is ordered such that the sequence of gene IDs spelled by its leaves equals the reference gene order. Then, a structure-informed gene rearrangement measure is computed between the ordered PQ-tree and the target gene order. The second problem, TreeToString Divergence (TTSD), generalizes CTTSD: the gene orders are not necessarily permutations, and the structure-informed, rearrangement-based divergence measure is extended to also consider up to $d_S$ gene insertions and up to $d_T$ gene deletions when modeling the PQ-tree-informed divergence process from the reference order to the target order. The first algorithm solves CTTSD in $O(n\gamma^2 \cdot (m_p \cdot 1.381^\gamma + m_q))$ time and $O(n^2)$ space, where $\gamma$ is the maximum number of children of a node, $n$ is both the length of the string and the number of leaves in the tree, and $m_p$ and $m_q$ are the numbers of P-nodes and Q-nodes in the tree, respectively. If one of the penalties of CTTSD is 0, the algorithm runs in $O(nm\gamma^2)$ time and $O(n^2)$ space.
The second algorithm solves TTSD in $O(n^2 \gamma^2 d_T^2 d_S^2 m^2 (m_p \cdot 5^\gamma \gamma + m_q))$ time and $O(d_T d_S m (mn + 5^\gamma))$ space, where $\gamma$ is the maximum number of children of a node, $n$ is the length of the string, $m$ is the number of leaves in the tree, $m_p$ and $m_q$ are the numbers of P-nodes and Q-nodes in the tree, respectively, and at most $d_T$ deletions from the tree and $d_S$ deletions from the string are allowed. The third algorithm is intended to reduce the space complexity of the second algorithm. It solves a variant of the problem (where one of the penalties of TTSD is 0) in $O(n \gamma^2 d_T^2 d_S^2 m^2 (m_p \cdot 4^\gamma \gamma^2 n (d_T + d_S + m + n) + m_q))$ time and $O(\gamma^2 n m^2 d_T d_S (d_T + d_S + m + n))$ space. The algorithm is implemented as a software tool, MEM-Rearrange, and applied to the comparative and evolutionary analysis of 59 chromosomal gene clusters extracted from a dataset of 1,487 prokaryotic genomes.
KW - Breakpoint Distance
KW - Gene Cluster
KW - PQ-tree
UR - http://www.scopus.com/inward/record.url?scp=85137788689&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.WABI.2022.11
DO - 10.4230/LIPIcs.WABI.2022.11
M3 - Conference contribution
AN - SCOPUS:85137788689
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 22nd International Workshop on Algorithms in Bioinformatics, WABI 2022
A2 - Boucher, Christina
A2 - Rahmann, Sven
PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
T2 - 22nd International Workshop on Algorithms in Bioinformatics, WABI 2022
Y2 - 5 September 2022 through 7 September 2022
ER -