Joint amalgamation of most parsimonious reconciled gene trees.

Celine Scornavacca, Edwin Jacox, Gergely J Szöllősi
Author Information
  1. Celine Scornavacca: ISEM, UM2-CNRS-IRD, Place Eugène Bataillon 34095 Montpellier, France, Institut de Biologie Computationnelle (IBC), 95 rue de la Galéra, 34095 Montpellier, France and ELTE-MTA 'Lendület' Biophysics Research Group 1117 Bp., Pázmány P. stny. 1A., Budapest, Hungary.
  2. Edwin Jacox: ISEM, UM2-CNRS-IRD, Place Eugène Bataillon 34095 Montpellier, France, Institut de Biologie Computationnelle (IBC), 95 rue de la Galéra, 34095 Montpellier, France and ELTE-MTA 'Lendület' Biophysics Research Group 1117 Bp., Pázmány P. stny. 1A., Budapest, Hungary.
  3. Gergely J Szöllősi: ISEM, UM2-CNRS-IRD, Place Eugène Bataillon 34095 Montpellier, France, Institut de Biologie Computationnelle (IBC), 95 rue de la Galéra, 34095 Montpellier, France and ELTE-MTA 'Lendület' Biophysics Research Group 1117 Bp., Pázmány P. stny. 1A., Budapest, Hungary.

Abstract

MOTIVATION: Traditionally, gene phylogenies have been reconstructed solely on the basis of molecular sequences; this, however, often does not provide enough information to distinguish between statistically equivalent relationships. To address this problem, several recent methods have incorporated information on the species phylogeny in gene tree reconstruction, leading to dramatic improvements in accuracy. Probabilistic methods are able to estimate all model parameters but are computationally expensive, whereas parsimony methods, although generally more computationally efficient, require a prior estimate of parameters and of the statistical support.
RESULTS: Here, we present the Tree Estimation using Reconciliation (TERA) algorithm, a parsimony-based, species-tree-aware method for gene tree reconstruction based on a scoring scheme combining duplication, transfer and loss costs with an estimate of the sequence likelihood. TERA explores all reconciled gene trees that can be amalgamated from a sample of gene trees. Using a large-scale simulated dataset, we demonstrate that TERA achieves the same accuracy as the corresponding probabilistic method while being faster, and outperforms other parsimony-based methods in both accuracy and speed. Running TERA on a set of 1099 homologous gene families from complete cyanobacterial genomes, we find that incorporating knowledge of the species tree results in a two-thirds reduction in the number of apparent transfer events.
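
The idea of a joint score that combines event costs with a sequence likelihood estimate can be illustrated with a minimal sketch. This is not the TERA implementation: the linear form of the score, the cost values, the `weight` parameter, and the names `Candidate`, `joint_score` and `best_candidate` are all assumptions made for illustration, and the candidate list is enumerated by hand rather than explored through amalgamation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical reconciled gene tree, summarised by its event counts."""
    name: str
    duplications: int
    transfers: int
    losses: int
    log_likelihood: float  # illustrative estimate of the sequence log-likelihood


def joint_score(c, dup_cost=2.0, transfer_cost=3.0, loss_cost=1.0, weight=1.0):
    """Assumed joint score: weighted DTL cost minus weighted log-likelihood.

    Lower is better. The cost values and the linear combination are
    illustrative assumptions, not values taken from the paper.
    """
    dtl_cost = (dup_cost * c.duplications
                + transfer_cost * c.transfers
                + loss_cost * c.losses)
    return dtl_cost - weight * c.log_likelihood


def best_candidate(candidates, **costs):
    """Return the candidate with the lowest joint score."""
    return min(candidates, key=lambda c: joint_score(c, **costs))


# Two made-up candidates: A fits the sequences slightly better,
# B needs far fewer transfers and losses.
candidates = [
    Candidate("A", duplications=1, transfers=4, losses=3, log_likelihood=-10.0),
    Candidate("B", duplications=1, transfers=1, losses=2, log_likelihood=-12.0),
]

if __name__ == "__main__":
    for c in candidates:
        print(c.name, joint_score(c))
    print("best:", best_candidate(candidates).name)
```

With these made-up numbers, candidate B is preferred even though its likelihood estimate is lower, because the saving in transfer and loss costs outweighs the difference in sequence support. Raising `weight` shifts the balance back towards the sequence data.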

MeSH Term

Algorithms
Computer Simulation
Cyanobacteria
Evolution, Molecular
Gene Duplication
Genome, Bacterial
Multigene Family
Phylogeny
