Gene tree correction for reconciliation and species tree inference.

Krister M Swenson, Andrea Doroftei, Nadia El-Mabrouk
Author Information
  1. Krister M Swenson: Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, CP 6128 succ Centre-Ville, Montréal, H3C 3J7, Québec, Canada. swensonk@iro.umontreal.ca.

Abstract


BACKGROUND: Reconciliation is the commonly used method for inferring the evolutionary scenario for a gene family. It consists in "embedding" inferred gene trees into a known species tree, revealing the evolution of the gene family by duplications and losses. When a species tree is not known, a natural algorithmic problem is to infer a species tree from a set of gene trees, such that the corresponding reconciliation minimizes the number of duplications and/or losses. The main drawback of reconciliation is that the inferred evolutionary scenario is strongly dependent on the considered gene trees, as few misplaced leaves may lead to a completely different history, with significantly more duplications and losses.
RESULTS: In this paper, we take advantage of certain gene trees' properties in order to preprocess them for reconciliation or species tree inference. We flag certain duplication vertices of a gene tree, the "non-apparent duplication" (NAD) vertices, as resulting from the misplacement of leaves. In the case of species tree inference, we develop a polynomial-time heuristic for removing the minimum number of species leading to a set of gene trees that exhibit no NAD vertices with respect to at least one species tree. In the case of reconciliation, we consider the optimization problem of removing the minimum number of leaves or species leading to a tree without any NAD vertex. We develop a polynomial-time algorithm that is exact for two special classes of gene trees, and show a good performance on simulated data sets in the general case.

References

  1. Bioinformatics. 1998;14(9):819-20 [PMID: 9918954]
  2. Mol Phylogenet Evol. 1996 Oct;6(2):189-213 [PMID: 8899723]
  3. Genome Biol. 2007;8(7):R141 [PMID: 17634151]
  4. PLoS Genet. 2007 Nov;3(11):e197 [PMID: 17997610]
  5. J Comput Biol. 1997 Summer;4(2):177-87 [PMID: 9228616]
  6. IEEE/ACM Trans Comput Biol Bioinform. 2011 Mar-Apr;8(2):517-35 [PMID: 21233529]
  7. J Comput Biol. 2000;7(3-4):429-47 [PMID: 11108472]
  8. IEEE/ACM Trans Comput Biol Bioinform. 2006 Oct-Dec;3(4):323-33 [PMID: 17085842]
  9. J Comput Biol. 2008 Oct;15(8):1043-62 [PMID: 18781833]
  10. J Comput Biol. 2006 Mar;13(2):320-35 [PMID: 16597243]
  11. Nature. 2001 Feb 15;409(6822):847-9 [PMID: 11237007]

Word Cloud

Created with Highcharts 10.0.0treegenespeciestreesreconciliationduplicationslossesnumberleavesinferenceverticesNADcaseevolutionaryscenariofamilyinferredknownproblemsetcertaindeveloppolynomial-timeremovingminimumleadingBACKGROUND:Reconciliationcommonlyusedmethodinferringconsists"embedding"revealingevolutionnaturalalgorithmicinfercorrespondingminimizesand/ormaindrawbackstronglydependentconsideredmisplacedmayleadcompletelydifferenthistorysignificantlyRESULTS:papertakeadvantagetrees'propertiesorderpreprocessflagduplication"non-apparentduplication"resultingmisplacementheuristicexhibitrespectleastoneconsideroptimizationwithoutvertexalgorithmexacttwospecialclassesshowgoodperformancesimulateddatasetsgeneralGenecorrection

Similar Articles

Cited By