Polytomy refinement for the correction of dubious duplications in gene trees.
Manuel Lafond, Cedric Chauve, Riccardo Dondi, Nadia El-Mabrouk
Author Information
Manuel Lafond: Department of Computer Science, Université de Montréal, Montréal, Quebec H3C 3J7, Canada, LaBRI, Université Bordeaux 1, Bordeaux, France, Department of Mathematics, Simon Fraser University, Burnaby (BC) V5A 1S6, Canada and Universitá degli Studi di Bergamo, Bergamo 24129 IT, Italy.
Cedric Chauve: Department of Computer Science, Université de Montréal, Montréal, Quebec H3C 3J7, Canada, LaBRI, Université Bordeaux 1, Bordeaux, France, Department of Mathematics, Simon Fraser University, Burnaby (BC) V5A 1S6, Canada and Universitá degli Studi di Bergamo, Bergamo 24129 IT, Italy Department of Computer Science, Université de Montréal, Montréal, Quebec H3C 3J7, Canada, LaBRI, Université Bordeaux 1, Bordeaux, France, Department of Mathematics, Simon Fraser University, Burnaby (BC) V5A 1S6, Canada and Universitá degli Studi di Bergamo, Bergamo 24129 IT, Italy.
Riccardo Dondi: Department of Computer Science, Université de Montréal, Montréal, Quebec H3C 3J7, Canada, LaBRI, Université Bordeaux 1, Bordeaux, France, Department of Mathematics, Simon Fraser University, Burnaby (BC) V5A 1S6, Canada and Universitá degli Studi di Bergamo, Bergamo 24129 IT, Italy.
Nadia El-Mabrouk: Department of Computer Science, Université de Montréal, Montréal, Quebec H3C 3J7, Canada, LaBRI, Université Bordeaux 1, Bordeaux, France, Department of Mathematics, Simon Fraser University, Burnaby (BC) V5A 1S6, Canada and Universitá degli Studi di Bergamo, Bergamo 24129 IT, Italy.
MOTIVATION: Large-scale methods for inferring gene trees are error-prone. Correcting gene trees for weakly supported features often results in non-binary trees, i.e. trees with polytomies, thus raising the natural question of refining such polytomies into binary trees. A feature pointing toward potential errors in gene trees are duplications that are not supported by the presence of multiple gene copies. RESULTS: We introduce the problem of refining polytomies in a gene tree while minimizing the number of created non-apparent duplications in the resulting tree. We show that this problem can be described as a graph-theoretical optimization problem. We provide a bounded heuristic with guaranteed optimality for well-characterized instances. We apply our algorithm to a set of ray-finned fish gene trees from the Ensembl database to illustrate its ability to correct dubious duplications. AVAILABILITY AND IMPLEMENTATION: The C++ source code for the algorithms and simulations described in the article are available at http://www-ens.iro.umontreal.ca/~lafonman/software.php. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.