Polytomy refinement for the correction of dubious duplications in gene trees.

Manuel Lafond, Cedric Chauve, Riccardo Dondi, Nadia El-Mabrouk
Author Information
  1. Manuel Lafond: Department of Computer Science, Université de Montréal, Montréal, Quebec H3C 3J7, Canada, LaBRI, Université Bordeaux 1, Bordeaux, France, Department of Mathematics, Simon Fraser University, Burnaby (BC) V5A 1S6, Canada and Universitá degli Studi di Bergamo, Bergamo 24129 IT, Italy.
  2. Cedric Chauve: Department of Computer Science, Université de Montréal, Montréal, Quebec H3C 3J7, Canada, LaBRI, Université Bordeaux 1, Bordeaux, France, Department of Mathematics, Simon Fraser University, Burnaby (BC) V5A 1S6, Canada and Universitá degli Studi di Bergamo, Bergamo 24129 IT, Italy Department of Computer Science, Université de Montréal, Montréal, Quebec H3C 3J7, Canada, LaBRI, Université Bordeaux 1, Bordeaux, France, Department of Mathematics, Simon Fraser University, Burnaby (BC) V5A 1S6, Canada and Universitá degli Studi di Bergamo, Bergamo 24129 IT, Italy.
  3. Riccardo Dondi: Department of Computer Science, Université de Montréal, Montréal, Quebec H3C 3J7, Canada, LaBRI, Université Bordeaux 1, Bordeaux, France, Department of Mathematics, Simon Fraser University, Burnaby (BC) V5A 1S6, Canada and Universitá degli Studi di Bergamo, Bergamo 24129 IT, Italy.
  4. Nadia El-Mabrouk: Department of Computer Science, Université de Montréal, Montréal, Quebec H3C 3J7, Canada, LaBRI, Université Bordeaux 1, Bordeaux, France, Department of Mathematics, Simon Fraser University, Burnaby (BC) V5A 1S6, Canada and Universitá degli Studi di Bergamo, Bergamo 24129 IT, Italy.

Abstract

MOTIVATION: Large-scale methods for inferring gene trees are error-prone. Correcting gene trees for weakly supported features often results in non-binary trees, i.e. trees with polytomies, thus raising the natural question of refining such polytomies into binary trees. A feature pointing toward potential errors in gene trees are duplications that are not supported by the presence of multiple gene copies.
RESULTS: We introduce the problem of refining polytomies in a gene tree while minimizing the number of created non-apparent duplications in the resulting tree. We show that this problem can be described as a graph-theoretical optimization problem. We provide a bounded heuristic with guaranteed optimality for well-characterized instances. We apply our algorithm to a set of ray-finned fish gene trees from the Ensembl database to illustrate its ability to correct dubious duplications.
AVAILABILITY AND IMPLEMENTATION: The C++ source code for the algorithms and simulations described in the article are available at http://www-ens.iro.umontreal.ca/~lafonman/software.php.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

References

  1. Mol Biol Evol. 2011 Jan;28(1):273-90 [PMID: 20660489]
  2. BMC Bioinformatics. 2010 Jun 09;11:312 [PMID: 20534164]
  3. Genome Res. 2009 Feb;19(2):327-35 [PMID: 19029536]
  4. Algorithms Mol Biol. 2012 Nov 20;7(1):31 [PMID: 23167951]
  5. Nucleic Acids Res. 2013 Jan;41(Database issue):D377-86 [PMID: 23193289]
  6. J Comput Biol. 2006 Mar;13(2):320-35 [PMID: 16597243]
  7. J Math Biol. 2013 Jan;66(1-2):399-420 [PMID: 22456957]
  8. Nucleic Acids Res. 2014 Jan;42(Database issue):D922-5 [PMID: 24194607]
  9. Nucleic Acids Res. 2012 Jan;40(Database issue):D84-90 [PMID: 22086963]
  10. BMC Bioinformatics. 2012 Jun 25;13 Suppl 10:S11 [PMID: 22759416]
  11. Bioinformatics. 2007 Jul 1;23(13):i549-58 [PMID: 17646342]
  12. Bioinformatics. 2007 Nov 1;23(21):2947-8 [PMID: 17846036]
  13. Proc Natl Acad Sci U S A. 2009 Apr 7;106(14):5714-9 [PMID: 19299507]
  14. PLoS Genet. 2007 Nov;3(11):e197 [PMID: 17997610]
  15. Syst Biol. 2013 Jan 1;62(1):110-20 [PMID: 22949484]
  16. J Comput Biol. 2000;7(3-4):429-47 [PMID: 11108472]
  17. Mol Biol Evol. 1987 Jul;4(4):406-25 [PMID: 3447015]
  18. Syst Biol. 2013 Nov;62(6):901-12 [PMID: 23925510]
  19. BMC Bioinformatics. 2012 Jun 25;13 Suppl 10:S14 [PMID: 22759419]
  20. Nucleic Acids Res. 2009 Jul;37(Web Server issue):W84-9 [PMID: 19435885]
  21. Syst Biol. 2003 Oct;52(5):696-704 [PMID: 14530136]
  22. J Comput Biol. 2008 Oct;15(8):981-1006 [PMID: 18808330]
  23. Bioinformatics. 2003 Aug 12;19(12):1572-4 [PMID: 12912839]
  24. Nucleic Acids Res. 2011 Jan;39(Database issue):D556-60 [PMID: 21075798]
  25. J Mol Evol. 2006 Aug;63(2):240-50 [PMID: 16830091]
  26. Algorithms Mol Biol. 2013 Apr 08;8(1):12 [PMID: 23566548]
  27. BMC Bioinformatics. 2013;14 Suppl 15:S5 [PMID: 24564227]
  28. Genome Res. 2013 Feb;23(2):323-30 [PMID: 23132911]
  29. BMC Evol Biol. 2006 Feb 11;6:15 [PMID: 16472400]

MeSH Term

Algorithms
Animals
Fishes
Genes
Phylogeny

Word Cloud

Created with Highcharts 10.0.0treesgeneduplicationspolytomiesproblemsupportedrefiningtreedescribeddubiousavailableMOTIVATION:Large-scalemethodsinferringerror-proneCorrectingweaklyfeaturesoftenresultsnon-binaryiethusraisingnaturalquestionbinaryfeaturepointingtowardpotentialerrorspresencemultiplecopiesRESULTS:introduceminimizingnumbercreatednon-apparentresultingshowcangraph-theoreticaloptimizationprovideboundedheuristicguaranteedoptimalitywell-characterizedinstancesapplyalgorithmsetray-finnedfishEnsembldatabaseillustrateabilitycorrectAVAILABILITYANDIMPLEMENTATION:C++sourcecodealgorithmssimulationsarticlehttp://www-ensiroumontrealca/~lafonman/softwarephpSUPPLEMENTARYINFORMATION:SupplementarydataBioinformaticsonlinePolytomyrefinementcorrection

Similar Articles

Cited By