Variational Supertrees for Bayesian Phylogenetics.

Michael D Karcher, Cheng Zhang, Frederic A Matsen
Author Information
  1. Michael D Karcher: Department of Math & CS, Muhlenberg College, 2400 W Chew St, Allentown, PA, 18104, USA. michaelkarcher@muhlenberg.edu.
  2. Cheng Zhang: School of Mathematical Sciences and Center for Statistical Science, Peking University, No. 5 Yiheyuan Road, Haidian District, Beijing, 100871, People's Republic of China.
  3. Frederic A Matsen: Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., Seattle, WA, 98109, USA.

Abstract

Bayesian phylogenetic inference is powerful but computationally intensive. Researchers may find themselves with two phylogenetic posteriors on overlapping data sets and wish to approximate a combined result without re-running potentially expensive Markov chains on the combined data set. This raises the question: given overlapping subsets of a set of taxa (e.g. species or virus samples), and given posterior distributions on phylogenetic tree topologies for each of these taxon subsets, how can we optimize a probability distribution on phylogenetic tree topologies for the entire taxon set? In this paper we develop a variational approach to this problem and demonstrate its effectiveness. Specifically, we develop an algorithm to find a suitable support for the variational tree topology distribution on the entire taxon set. We also develop a gradient-descent algorithm that minimizes the divergence between the restriction of the variational distribution to each taxon subset and the corresponding given per-subset distribution, with the aim of approximating the posterior distribution on the entire taxon set.
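The abstract leaves the variational family, the support-finding step, and the direction of the divergence unspecified. As an illustration only, the Python sketch below assumes rooted topologies encoded as sets of clades, a full-taxon support that has already been found, a plain categorical (softmax) distribution over that support in place of whatever parameterization the paper actually uses, and the divergence KL(p_i || q restricted to S_i). It relies on the fact that the clades of a restricted rooted tree T|S are exactly the nonempty intersections of the clades of T with S. All function names (`clades`, `restrict`, `pushforward`, `fit_supertree`, `kl_loss`) and the toy posteriors are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical sketch: rooted trees as nested tuples of leaf names,
# e.g. (("A", "B"), "C"); topologies as frozensets of non-singleton clades.

def clades(tree):
    """Return the topology of `tree` as a frozenset of its non-singleton clades."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return frozenset(found)

def restrict(topology, subset):
    """Restrict a topology to `subset`: intersect every clade with it and keep
    the intersections that still contain at least two taxa."""
    subset = frozenset(subset)
    return frozenset(c & subset for c in topology if len(c & subset) >= 2)

def softmax(theta):
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def pushforward(q, images):
    """Probability of each restricted topology: sum q over trees with that image."""
    out = defaultdict(float)
    for qk, image in zip(q, images):
        out[image] += qk
    return out

def fit_supertree(support, targets, lr=1.0, steps=2000):
    """Gradient descent on sum_i KL(p_i || q|S_i) with q = softmax(theta).

    support: full-taxon topologies (assumed to be given already).
    targets: (subset, p_i) pairs, p_i mapping restricted topology -> probability.
    """
    images = [[restrict(t, subset) for t in support] for subset, _ in targets]
    for imgs, (_, p) in zip(images, targets):
        if set(p) - set(imgs):
            raise ValueError("support cannot produce every topology in some p_i")
    theta = [0.0] * len(support)
    for _ in range(steps):
        q = softmax(theta)
        grad = [0.0] * len(support)
        for imgs, (_, p) in zip(images, targets):
            big_q = pushforward(q, imgs)
            mass = sum(p.values())
            for k, image in enumerate(imgs):
                # d/dtheta_k of -sum_t p(t) log Q(t) = q_k * (mass - p(t_k) / Q(t_k))
                grad[k] += q[k] * (mass - p.get(image, 0.0) / big_q[image])
        theta = [th - lr * g for th, g in zip(theta, grad)]
    return softmax(theta)

def kl_loss(q, support, targets):
    """Summed divergence; assumes the support covers every topology in each p_i."""
    total = 0.0
    for subset, p in targets:
        big_q = pushforward(q, [restrict(t, subset) for t in support])
        total += sum(pt * math.log(pt / big_q[t]) for t, pt in p.items() if pt > 0)
    return total

# Toy example: three full-taxon trees, two overlapping taxon subsets.
support = [clades(((("A", "B"), "C"), "D")),
           clades((("A", "B"), ("C", "D"))),
           clades(("A", (("B", "C"), "D")))]
p1 = {clades((("A", "B"), "C")): 0.7, clades((("B", "C"), "A")): 0.3}
p2 = {clades((("B", "C"), "D")): 0.6, clades((("C", "D"), "B")): 0.4}
targets = [({"A", "B", "C"}, p1), ({"B", "C", "D"}, p2)]

q = fit_supertree(support, targets)
print([round(x, 3) for x in q])  # [0.3, 0.4, 0.3]
```

In this toy example the two subset distributions are exactly compatible (q1 + q2 = 0.7, q3 = 0.3, q1 + q3 = 0.6), so the fitted q recovers (0.3, 0.4, 0.3) with near-zero loss; with conflicting per-subset posteriors the optimum is instead a compromise that minimizes the summed divergence. The support-finding step described in the abstract is not sketched here.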

Grants

  1. U54 GM111274/NIGMS NIH HHS
  2. S10 OD028685/Office of Research Infrastructure Programs, National Institutes of Health
  3. R01 AI162611/NIAID NIH HHS
  4. CISE-1564137/National Science Foundation
  5. CISE-1561334/National Science Foundation

MeSH Terms

Bayes Theorem
Phylogeny
Algorithms
Markov Chains
Mathematical Concepts
Models, Genetic
Computer Simulation
Probability
