A topology-marginal composite likelihood via a generalized phylogenetic pruning algorithm.

Seong-Hwan Jun, Hassan Nasif, Chris Jennings-Shaffer, David H Rich, Anna Kooperberg, Mathieu Fourment, Cheng Zhang, Marc A Suchard, Frederick A Matsen
Author Information
  1. Seong-Hwan Jun: Department of Biostatistics and Computational Biology, University of Rochester, Rochester, USA.
  2. Hassan Nasif: Department of Statistics, University of Washington, Seattle, USA.
  3. Chris Jennings-Shaffer: Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
  4. David H Rich: Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
  5. Anna Kooperberg: Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
  6. Mathieu Fourment: Australian Institute for Microbiology and Infection, University of Technology Sydney, Ultimo, NSW, Australia.
  7. Cheng Zhang: School of Mathematical Sciences and Center for Statistical Science, Peking University, Beijing, China.
  8. Marc A Suchard: Department of Human Genetics, University of California, Los Angeles, USA.
  9. Frederick A Matsen: Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA. matsen@fredhutch.org.

Abstract

Bayesian phylogenetics is a computationally challenging inferential problem. Classical methods are based on random-walk Markov chain Monte Carlo (MCMC), where random proposals are made on the tree parameter and the continuous parameters simultaneously. Variational phylogenetics is a promising alternative to MCMC, in which one fits an approximating distribution to the unnormalized phylogenetic posterior. Previous work fit this variational approximation using stochastic gradient descent, which is the canonical way of fitting general variational approximations. However, phylogenetic trees are special structures, giving opportunities for efficient computation. In this paper we describe a new algorithm that directly generalizes the Felsenstein pruning algorithm (a.k.a. sum-product algorithm) to compute a composite-like likelihood by marginalizing out ancestral states and subtrees simultaneously. We show the utility of this algorithm by rapidly making point estimates for branch lengths of a multi-tree phylogenetic model. These estimates accord with a long MCMC run and with estimates obtained using a variational method, but are much faster to obtain. Thus, although generalized pruning does not lead to a variational algorithm as such, we believe that it will form a useful starting point for variational inference.

References

  1. Syst Biol. 2013 Jul;62(4):501-11 [PMID: 23479066]
  2. J Mol Evol. 1981;17(6):368-76 [PMID: 7288891]
  3. Genome Res. 1998 Mar;8(3):222-33 [PMID: 9521926]
  4. Syst Biol. 2012 Jan;61(1):1-11 [PMID: 21828081]
  5. Syst Biol. 2015 May;64(3):472-91 [PMID: 25631175]
  6. Bioinformatics. 2003 Aug 12;19(12):1572-4 [PMID: 12912839]
  7. Syst Biol. 2019 Nov 1;68(6):1052-1061 [PMID: 31034053]
  8. Genome Biol Evol. 2023 Jun 1;15(6): [PMID: 37265233]
  9. Mol Biol Evol. 2020 Oct 1;37(10):3047-3060 [PMID: 32458974]
  10. Syst Biol. 2008 Feb;57(1):86-103 [PMID: 18278678]
  11. Syst Biol. 2020 Mar 1;69(2):209-220 [PMID: 31504998]
  12. J Comput Biol. 2002;9(2):331-53 [PMID: 12015885]
  13. Syst Biol. 2011 Oct;60(5):685-99 [PMID: 21540409]
  14. Nature. 2017 Apr 20;544(7650):309-315 [PMID: 28405027]
  15. Syst Biol. 2006 Oct;55(5):756-68 [PMID: 17060197]
  16. Trends Genet. 2003 Jun;19(6):345-51 [PMID: 12801728]
  17. Syst Biol. 2005 Jun;54(3):401-18 [PMID: 16012107]
  18. Stat Appl Genet Mol Biol. 2012 Sep 25;11(4):Article 14 [PMID: 23023698]
  19. Mol Biol Evol. 2013 May;30(5):1188-95 [PMID: 23418397]

Grants

  1. R01 AI153044/NIAID NIH HHS
  2. R01 AI162611/NIAID NIH HHS
  3. R01 AI162611/NIH HHS

Word Cloud

Created with Highcharts 10.0.0algorithmvariationalphylogeneticMCMCpruningestimatesphylogeneticssimultaneouslyusingalikelihoodpointgeneralizedBayesiancomputationallychallenginginferentialproblemClassicalmethodsbasedrandom-walkMarkovchainMonteCarlorandomproposalsmadetreeparametercontinuousparametersVariationalpromisingalternativeonefitsapproximatingdistributionunnormalizedposteriorPreviousworkfitapproximationstochasticgradientdescentcanonicalwayfittinggeneralapproximationsHowevertreesspecialstructuresgivingopportunitiesefficientcomputationpaperdescribenewdirectlygeneralizesFelsensteinksum-productcomputecomposite-likemarginalizingancestralstatessubtreesshowutilityrapidlymakingbranchlengthsmulti-treemodelaccordlongrunobtainedmethodmuchfasterobtainThusalthoughleadbelievewillformusefulstartinginferencetopology-marginalcompositevia

Similar Articles

Cited By