Efficient exploration of the space of reconciled gene trees.

Gergely J Szöllősi, Wojciech Rosikiewicz, Bastien Boussau, Eric Tannier, Vincent Daubin
Author Information
  1. Gergely J Szöllősi: Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne.

Abstract

Gene trees record the combination of gene-level events, such as duplication, transfer and loss (DTL), and species-level events, such as speciation and extinction. Gene tree-species tree reconciliation methods model these processes by drawing gene trees into the species tree using a series of gene and species-level events. The reconstruction of gene trees based on sequence alone almost always involves choosing between statistically equivalent or weakly distinguishable relationships that could be much better resolved based on a putative species tree. To exploit this potential for accurate reconstruction of gene trees, the space of reconciled gene trees must be explored according to a joint model of sequence evolution and gene tree-species tree reconciliation. Here we present amalgamated likelihood estimation (ALE), a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of gene trees. We implement the ALE approach in the context of a reconciliation model (Szöllősi et al. 2013), which allows for the DTL of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood among all such trees. We demonstrate using simulations that gene trees reconstructed using the joint likelihood are substantially more accurate than those reconstructed using sequence alone. Using realistic gene tree topologies, branch lengths, and alignment sizes, we demonstrate that ALE produces more accurate gene trees even if the model of sequence evolution is greatly simplified. Finally, examining 1099 gene families from 36 cyanobacterial genomes, we find that joint likelihood-based inference results in a striking reduction in apparent phylogenetic discord, with 24%, 59%, and 46% reductions in the mean numbers of duplications, transfers, and losses per gene family, respectively.
The open source implementation of ALE is available from https://github.com/ssolo/ALE.git.
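
The idea of amalgamating trees from clades observed in a sample can be illustrated with a small sketch. The Python below is not the ALE implementation and omits the reconciliation model entirely. It assumes rooted binary trees written as nested 2-tuples (a simplification chosen for brevity) and scores a tree by the product of conditional split frequencies estimated from the sample. All function names (`tabulate`, `conditional_split_probs`, `tree_probability`, `count_amalgamations`) are hypothetical and introduced only for this illustration.

```python
from collections import Counter, defaultdict
from functools import lru_cache


def tabulate(trees):
    """Count clades and their splits in rooted binary trees given as nested 2-tuples."""
    clade_counts, split_counts = Counter(), Counter()

    def visit(node):
        if isinstance(node, str):  # a leaf is a one-taxon clade
            return frozenset([node])
        left, right = (visit(child) for child in node)
        clade = left | right
        clade_counts[clade] += 1
        split_counts[clade, frozenset([left, right])] += 1
        return clade

    for tree in trees:
        visit(tree)
    return clade_counts, split_counts


def conditional_split_probs(trees):
    """Estimate P(split | clade) as the observed frequency of the split given its parent clade."""
    clade_counts, split_counts = tabulate(trees)
    probs = defaultdict(dict)
    for (clade, split), n in split_counts.items():
        probs[clade][split] = n / clade_counts[clade]
    return probs


def tree_probability(tree, probs):
    """Product of conditional split frequencies; 0.0 if any split was never sampled."""
    def visit(node):
        if isinstance(node, str):
            return frozenset([node]), 1.0
        (left, p_left), (right, p_right) = (visit(child) for child in node)
        clade = left | right
        p_split = probs.get(clade, {}).get(frozenset([left, right]), 0.0)
        return clade, p_split * p_left * p_right

    return visit(tree)[1]


def count_amalgamations(root_clade, probs):
    """Number of distinct topologies that can be assembled from the sampled splits."""
    @lru_cache(maxsize=None)
    def count(clade):
        if len(clade) == 1:
            return 1
        return sum(count(a) * count(b) for a, b in map(tuple, probs[clade]))

    return count(root_clade)


# Hypothetical sample of two gene trees on six taxa.
sample = [
    ((("A", "B"), "C"), (("D", "E"), "F")),
    (("A", ("B", "C")), ("D", ("E", "F"))),
]
probs = conditional_split_probs(sample)
# A topology absent from the sample, built only from sampled clades.
unseen = ((("A", "B"), "C"), ("D", ("E", "F")))
n_topologies = count_amalgamations(frozenset("ABCDEF"), probs)
p_unseen = tree_probability(unseen, probs)
```

In this toy sample, two observed trees give four amalgamable topologies, and the unseen combination receives probability 0.25. The recursion over clades is what keeps the exploration tractable: each clade is processed once, however many topologies contain it.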

References

  1. Syst Biol. 2007 Aug;56(4):564-77 [PMID: 17654362]
  2. BMC Evol Biol. 2008 Mar 04;8:77 [PMID: 18318893]
  3. Nature. 2011 Jan 6;469(7328):93-6 [PMID: 21170026]
  4. Proc Natl Acad Sci U S A. 2012 Oct 23;109(43):17513-8 [PMID: 23043116]
  5. Proc Natl Acad Sci U S A. 2007 Apr 3;104(14):5936-41 [PMID: 17392434]
  6. Syst Biol. 2013 May 1;62(3):386-97 [PMID: 23355531]
  7. Syst Biol. 2003 Oct;52(5):696-704 [PMID: 14530136]
  8. BMC Evol Biol. 2008 Sep 22;8:255 [PMID: 18808672]
  9. BMC Bioinformatics. 2006 Apr 04;7:188 [PMID: 16594991]
  10. Trends Ecol Evol. 2010 Apr;25(4):224-32 [PMID: 19880211]
  11. Mol Biol Evol. 2011 Nov;28(11):3019-32 [PMID: 21652613]
  12. Genome Res. 2013 Feb;23(2):323-30 [PMID: 23132911]
  13. ISME J. 2010 Jun;4(6):777-83 [PMID: 20200567]
  14. Syst Biol. 2013 Jul;62(4):501-11 [PMID: 23479066]
  15. Proc Natl Acad Sci U S A. 2009 Apr 7;106(14):5714-9 [PMID: 19299507]
  16. J Mol Evol. 1981;17(6):368-76 [PMID: 7288891]
  17. Genome Res. 2012 Apr;22(4):755-65 [PMID: 22271778]
  18. Bioinformatics. 2009 Sep 1;25(17):2286-8 [PMID: 19535536]
  19. Syst Biol. 2013 Jan 1;62(1):110-20 [PMID: 22949484]
  20. Syst Biol. 2012 Jan;61(1):1-11 [PMID: 21828081]
  21. Mol Biol Evol. 2008 Jul;25(7):1307-20 [PMID: 18367465]
  22. Syst Biol. 2007 Jun;56(3):504-14 [PMID: 17562474]
  23. Methods Mol Biol. 2012;856:29-51 [PMID: 22399454]
  24. BMC Bioinformatics. 2009 Jun 16;10 Suppl 6:S3 [PMID: 19534752]
  25. Nucleic Acids Res. 2004 Mar 19;32(5):1792-7 [PMID: 15034147]

MeSH Term

Classification
Computer Simulation
Cyanobacteria
Phylogeny
Reproducibility of Results
Sequence Analysis, DNA
