Practical guidelines for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC).

Joëlle Barido-Sottani, Orlando Schwery, Rachel C M Warnock, Chi Zhang, April Marie Wright
Author Information
  1. Joëlle Barido-Sottani: Institut de Biologie de l'ENS (IBENS), École normale supérieure, CNRS, INSERM, Université PSL, Paris, Île-de-France, 75005, France.
  2. Orlando Schwery: Department of Biological Sciences, Southeastern Louisiana University, Hammond, Louisiana, 70402, USA.
  3. Rachel C M Warnock: GeoZentrum Nordbayern, Department of Geography and Geosciences, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Bavaria, 91054, Germany.
  4. Chi Zhang: Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, 100044, China.
  5. April Marie Wright: Department of Biological Sciences, Southeastern Louisiana University, Hammond, Louisiana, 70402, USA.

Abstract

Phylogenetic estimation is, and has always been, a complex endeavor. Estimating a phylogenetic tree involves evaluating many possible solutions and possible evolutionary histories that could explain a set of observed data, typically by using a model of evolution. Values for all model parameters need to be evaluated as well. Modern statistical methods involve not just the estimation of a tree, but also solutions to more complex models involving fossil record information and other data sources. Markov chain Monte Carlo (MCMC) is a leading method for approximating the posterior distribution of parameters in a mathematical model. It is deployed in all Bayesian phylogenetic tree estimation software. While many researchers use MCMC in phylogenetic analyses, interpreting results and diagnosing problems with MCMC remain vexing issues for many biologists. In this manuscript, we will offer an overview of how MCMC is used in Bayesian phylogenetic inference, with a particular emphasis on complex hierarchical models, such as the fossilized birth-death (FBD) model. We will discuss strategies to diagnose common MCMC problems and troubleshoot difficult analyses, in particular convergence issues. We will show how the study design, the choice of models and priors, and the technical features of the inference tools themselves can all be adjusted to obtain the best results. Finally, we will discuss the unique challenges created by the incorporation of fossil information in phylogenetic inference, and present tips to address them.
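
To make the idea of approximating a posterior with MCMC and then checking convergence concrete, the following is a minimal sketch and is not taken from the manuscript. It assumes a toy problem: a single branch length between two sequences under the Jukes-Cantor model, with invented data (60 differences in 500 sites) and an exponential prior with rate 10. It uses a plain random-walk Metropolis sampler; real phylogenetic software uses far more elaborate proposals over trees and parameters. All function names and settings are hypothetical. Two diagnostics are computed from several independent chains: the Gelman-Rubin potential scale reduction factor and a simple autocorrelation-based effective sample size (ESS).

```python
import math
import random


def log_posterior(t, n_sites=500, n_diffs=60, prior_rate=10.0):
    """Unnormalised log posterior of a branch length t (toy, invented data)."""
    if t <= 0.0:
        return -math.inf
    # Jukes-Cantor probability that a site differs after branch length t.
    p_diff = -0.75 * math.expm1(-4.0 * t / 3.0)
    if p_diff <= 0.0:
        return -math.inf
    log_lik = n_diffs * math.log(p_diff) + (n_sites - n_diffs) * math.log(1.0 - p_diff)
    log_prior = math.log(prior_rate) - prior_rate * t  # exponential prior (assumed)
    return log_lik + log_prior


def run_chain(n_iter, start, step=0.02, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    t, lp = start, log_posterior(start)
    samples = []
    for _ in range(n_iter):
        proposal = t + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            t, lp = proposal, lp_new
        samples.append(t)
    return samples


def effective_sample_size(x):
    """ESS from autocorrelations summed up to the first non-positive lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0.0:
        return 0.0
    tau = 1.0
    for lag in range(1, n):
        acf = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / (n * var)
        if acf <= 0.0:
            break
        tau += 2.0 * acf
    return n / tau


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(
        sum((v - mu) ** 2 for v in c) / (n - 1) for c, mu in zip(chains, means)
    ) / m
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)


# Four chains from deliberately dispersed starting values; the first 1000
# iterations of each chain are discarded as burn-in (an arbitrary choice).
starts = [0.01, 0.05, 0.5, 1.0]
chains = [run_chain(6000, s, seed=i)[1000:] for i, s in enumerate(starts)]
pooled = [v for c in chains for v in c]
print("posterior mean branch length:", sum(pooled) / len(pooled))
print("Gelman-Rubin statistic:", gelman_rubin(chains))
print("total ESS:", sum(effective_sample_size(c) for c in chains))
```

A Gelman-Rubin value close to 1 together with a large ESS suggests, but does not prove, that the chains are sampling the same distribution. Starting the chains far apart is a deliberate design choice: agreement between chains is only informative if they did not begin in the same place.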
