Abstract
An important goal in microbial computational genomics is to identify crucial events in the evolution of a gene that severely alter the duplication, loss, and mobilization patterns of the gene within the genomes in which it disseminates. In this article, we formalize this microbiological goal as a new pattern-matching problem in the domain of gene tree and species tree reconciliation, denoted "Reconciliation-Scenario Altering Mutation (RSAM) Discovery." We propose an O(m⋅n⋅k) time algorithm to solve this new problem, where m and n are the numbers of vertices of the input gene tree and species tree, respectively, and k is a user-specified parameter that upper-bounds the number of optimal solutions of interest. The algorithm first constructs a hypergraph representing the k highest-scoring reconciliation scenarios between the given gene tree and species tree. It then interrogates this hypergraph for subtrees matching a prespecified RSAM pattern. Our algorithm is optimal in the sense that the number of hypernodes in the hypergraph can be lower-bounded by Ω(m⋅n⋅k). We implement the new algorithm as a tool, called RSAM-finder, and demonstrate its application to the identification of RSAMs in toxins and drug-resistance elements across a data set spanning hundreds of species.
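The abstract describes a hypergraph that holds the k highest-scoring reconciliation scenarios. One common way such k-best structures are built in dynamic programming is to keep, for each (gene vertex, species vertex) subproblem, a list of at most k scores. That layout gives the m⋅n⋅k hypernode count mentioned above. Each parent list is then derived from its children's lists.

The Python sketch below illustrates only that generic k-best combination step. It is an assumption for illustration and not the authors' RSAM-finder implementation. The function names `top_k_sums` and `top_k_union` are hypothetical. Because it uses a heap, it does not attempt to match the paper's stated time bound.

```python
import heapq
from typing import List


def top_k_sums(left: List[float], right: List[float], k: int) -> List[float]:
    """Return the k largest values of left[i] + right[j], in descending order.

    Assumes both inputs are sorted in descending order, as they would be if
    each were the top-k score list of a child subproblem (hypothetical setup).
    """
    if not left or not right or k <= 0:
        return []
    # Max-heap via negated scores; start from the best index pair (0, 0).
    heap = [(-(left[0] + right[0]), 0, 0)]
    seen = {(0, 0)}
    result: List[float] = []
    while heap and len(result) < k:
        neg_score, i, j = heapq.heappop(heap)
        result.append(-neg_score)
        # Expand the two neighbours of (i, j) in the grid of index pairs.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(left[ni] + right[nj]), ni, nj))
    return result


def top_k_union(candidates: List[List[float]], k: int) -> List[float]:
    """Keep the k best scores across alternative events (for example
    speciation vs. duplication) that could explain the same subproblem."""
    return heapq.nlargest(k, (s for lst in candidates for s in lst))
```

For example, `top_k_sums([5.0, 3.0, 1.0], [4.0, 2.0], 3)` yields `[9.0, 7.0, 7.0]`. Keeping duplicates such as the two 7.0 entries matters, because distinct scenarios can tie in score.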
Original language | English |
---|---|
Pages (from-to) | 1561-1580 |
Number of pages | 20 |
Journal | Journal of Computational Biology |
Volume | 27 |
Issue number | 11 |
State | Published - 1 Nov 2020 |
Keywords
- phylogenetic trees
- algorithms
- dynamic programming
ASJC Scopus subject areas
- Modeling and Simulation
- Molecular Biology
- Genetics
- Computational Mathematics
- Computational Theory and Mathematics