TY - JOUR
T1 - Matching algorithms for assigning orthologs after genome duplication events
AU - Fertin, Guillaume
AU - Hüffner, Falk
AU - Komusiewicz, Christian
AU - Sorge, Manuel
N1 - Funding Information:
MS was supported by the DFG, project DAPA (NI 369/12), the People Programme (Marie Curie Actions) of the European Union's Seventh Framework Programme (FP7/2007-2013) under REA grant agreement number 631163.11, and the Israel Science Foundation (grant no. 551145/14).
Publisher Copyright:
© 2018 Elsevier Ltd
PY - 2018/6/1
Y1 - 2018/6/1
AB - In this paper, we introduce and analyze two graph-based models for assigning orthologs in the presence of whole-genome duplications, using similarity information between pairs of genes. The common feature of our two models is that genes of the first genome may be assigned two orthologs from the second genome, which has undergone a whole-genome duplication. Additionally, our models incorporate the new notion of duplication bonus, a parameter that reflects how assigning two orthologs to a given gene should be rewarded or penalized. Our work is mainly focused on developing exact and reasonably time-consuming algorithms for these two models: we show that the first one is polynomial-time solvable, while the second is NP-hard. For the latter, we thus design two fixed-parameter algorithms, i.e. exact algorithms whose running times are exponential only with respect to a small and well-chosen input parameter. Finally, for both models, we evaluate our algorithms on pairs of plant genomes. Our experiments show that the NP-hard model yields a better cluster quality at the cost of lower coverage, due to the fact that our instances cannot be completely solved by our algorithms. However, our results are altogether encouraging and show that our methods yield biologically significant predictions of orthologs when the duplication bonus value is properly chosen.
KW - Comparative genomics
KW - Graph algorithms
KW - NP-hard problem
KW - Plant genomics
KW - Synteny blocks
UR - http://www.scopus.com/inward/record.url?scp=85045089946&partnerID=8YFLogxK
U2 - 10.1016/j.compbiolchem.2018.03.015
DO - 10.1016/j.compbiolchem.2018.03.015
M3 - Article
C2 - 29650458
AN - SCOPUS:85045089946
VL - 74
SP - 379
EP - 390
JO - Computational Biology and Chemistry
JF - Computational Biology and Chemistry
SN - 1476-9271
ER -