Rooting Species Trees Using Gene Tree-Species Tree Reconciliation.

Brogan J Harris, Paul O Sheridan, Adrián A Davín, Cécile Gubry-Rangin, Gergely J Szöllősi, Tom A Williams
Author Information
  1. Brogan J Harris: School of Biological Sciences, University of Bristol, Bristol, UK.
  2. Paul O Sheridan: School of Biological Sciences, University of Bristol, Bristol, UK.
  3. Adrián A Davín: Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia.
  4. Cécile Gubry-Rangin: School of Biological Sciences, University of Aberdeen, Aberdeen, UK.
  5. Gergely J Szöllősi: Dept. of Biological Physics, Eötvös Loránd University, Budapest, Hungary.
  6. Tom A Williams: School of Biological Sciences, University of Bristol, Bristol, UK. tom.a.williams@bristol.ac.uk.

Abstract

Interpreting phylogenetic trees requires a root, which provides the direction of evolution and polarizes ancestor-descendant relationships. But inferring the root from genetic data is difficult, particularly when the closest available outgroup is only distantly related, a situation that is common for microbes. In this chapter, we present a workflow for estimating rooted species trees, and the evolutionary history of the gene families that evolve within them, using probabilistic gene tree-species tree reconciliation. We illustrate the pipeline using a small dataset of prokaryotic genomes, for which the example scripts can be run with modest computing resources. We describe the rooting method used in this work in the context of other rooting strategies and discuss some of the limitations and opportunities presented by probabilistic gene tree-species tree reconciliation methods.

MeSH Terms

Algorithms
Evolution, Molecular
Genome
Models, Genetic
Phylogeny
Prokaryotic Cells
