Terraces in species tree inference from gene trees.

Mursalin Habib, Kowshic Roy, Saem Hasan, Atif Hasan Rahman, Md Shamsuzzoha Bayzid
Author Information
  1. Mursalin Habib: Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh.
  2. Kowshic Roy: Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh.
  3. Saem Hasan: Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh.
  4. Atif Hasan Rahman: Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh.
  5. Md Shamsuzzoha Bayzid: Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh. shams_bayzid@cse.buet.ac.bd.

Abstract

A terrace in phylogenetic tree space is a region in which all trees contain the same set of subtrees because of particular patterns of missing data among the sampled taxa, so that every tree in the region receives an identical optimality score for a given data set. Terraces were first investigated in the context of phylogenetic tree estimation from sequence alignments using maximum likelihood (ML) and maximum parsimony (MP). The concept was later extended to species tree inference from a collection of gene trees, where a set of equally optimal species trees was referred to as a "pseudo" species tree terrace; that notion does not consider the topological proximity of the trees in terms of the subtrees induced by the patterns of missing data. In this study, we mathematically characterize species tree terraces and investigate the properties and conditions under which multiple species trees induce (display) an identical set of locus-specific subtrees owing to missing data. We report that species tree terraces are agnostic to gene tree heterogeneity. We therefore introduce and characterize a special type of gene tree topology-aware terrace, which we call a "peak terrace". We also investigate challenges and opportunities related to species tree terraces through extensive empirical studies on simulated and real biological data. We demonstrate the prevalence of species tree terraces and the resulting ambiguity for tree search algorithms. Our findings indicate that identifying terraces could potentially lead to advances that enhance the accuracy of summary methods and provide reasonably accurate branch support.
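
The idea that several species trees can display the same locus-specific subtrees can be illustrated with a minimal sketch. It is not the authors' formal characterization: it assumes rooted trees written as nested tuples, and the names `restrict` and `same_induced_subtrees`, the five taxa, and the per-locus taxon coverage are all hypothetical choices made for illustration.

```python
def restrict(tree, taxa):
    """Return the subtree of `tree` induced by `taxa` in a canonical form.

    Leaves are strings; internal nodes become frozensets of their children,
    so the order of children does not matter. Returns None if no leaf of
    `tree` is in `taxa`. (Illustrative helper, assumes rooted trees.)
    """
    if isinstance(tree, str):
        return tree if tree in taxa else None
    kept = [c for c in (restrict(child, taxa) for child in tree) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # suppress a node left with a single child
    return frozenset(kept)


def same_induced_subtrees(tree1, tree2, coverage):
    """True if both trees induce identical subtrees on every locus's taxon set."""
    return all(restrict(tree1, taxa) == restrict(tree2, taxa) for taxa in coverage)


# Two different hypothetical species trees on taxa A..E.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "B"), "D"), ("C", "E"))

# Hypothetical taxon coverage: each locus is missing two of the five taxa.
coverage = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "B", "E"}]

on_same_terrace = same_induced_subtrees(t1, t2, coverage)
# Adding a locus that samples C, D and E separates the two trees.
separated = not same_induced_subtrees(t1, t2, coverage + [{"C", "D", "E"}])
```

Under this coverage the two trees are indistinguishable from the sampled taxa alone, whatever the gene tree topologies are, which is one way to read the statement that species tree terraces are agnostic to gene tree heterogeneity.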

MeSH Term

Phylogeny
Models, Genetic
