Effective Online Bayesian Phylogenetics via Sequential Monte Carlo with Guided Proposals.

Mathieu Fourment, Brian C Claywell, Vu Dinh, Connor McCoy, Frederick A Matsen IV, Aaron E Darling
Author Information
  1. Mathieu Fourment: ithree institute, University of Technology Sydney, Ultimo, NSW 2007, Australia.
  2. Brian C Claywell: Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
  3. Vu Dinh: Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
  4. Connor McCoy: Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
  5. Frederick A Matsen IV: Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
  6. Aaron E Darling: ithree institute, University of Technology Sydney, Ultimo, NSW 2007, Australia.

Abstract

Modern infectious disease outbreak surveillance produces continuous streams of sequence data that require phylogenetic analysis as the data arrive. Current software packages for Bayesian phylogenetic inference cannot quickly incorporate new sequences as they become available, which makes them less useful for dynamically unfolding evolutionary stories. This limitation can be addressed by applying a class of Bayesian statistical inference algorithms called sequential Monte Carlo (SMC) to conduct online inference, in which new data are continuously incorporated to update the estimate of the posterior probability distribution. In this article, we describe and evaluate several online phylogenetic sequential Monte Carlo (OPSMC) algorithms. We show that proposing new phylogenies with a density similar to the Bayesian prior performs poorly, and we develop "guided" proposals that better match the proposal density to the posterior. Furthermore, we show that the simplest guided proposals can exhibit pathological behavior in some situations, leading to poor results, and that this can be resolved by heating the proposal density. The results demonstrate that, relative to the widely used MCMC-based algorithm implemented in MrBayes, OPSMC can significantly reduce the total time required to compute a series of phylogenetic posteriors as sequences arrive, without incurring a significant loss in accuracy.
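
The idea of a guided, heated proposal inside one SMC update can be illustrated with a toy sketch. This is not the OPSMC implementation described in the article: the discrete candidate set, the uniform prior over candidates, the function names, and the reading of "heating" as raising the likelihood to the power 1/temperature are all assumptions made for illustration only.

```python
import math
import random

def heated_proposal(log_liks, temperature):
    """Turn candidate log-likelihoods into proposal probabilities.

    Assumption: "heating" means proposing proportionally to
    likelihood ** (1 / temperature); temperature = 1 is the unheated
    guided proposal, and large temperatures approach a uniform proposal.
    """
    scaled = [ll / temperature for ll in log_liks]
    m = max(scaled)                      # subtract the max for numerical stability
    unnorm = [math.exp(s - m) for s in scaled]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def smc_update(particles, log_weights, candidates, log_lik, temperature, rng):
    """Extend every particle by one candidate and update its log-weight.

    Assumption: the prior over candidates is uniform, so the incremental
    importance weight is likelihood / proposal probability up to a constant.
    """
    new_particles, new_log_weights = [], []
    for state, lw in zip(particles, log_weights):
        lls = [log_lik(state, c) for c in candidates]
        probs = heated_proposal(lls, temperature)
        i = rng.choices(range(len(candidates)), weights=probs)[0]
        new_particles.append(state + [candidates[i]])
        new_log_weights.append(lw + lls[i] - math.log(probs[i]))
    return new_particles, new_log_weights

def effective_sample_size(log_weights):
    """Effective sample size of the normalized importance weights."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    return sum(w) ** 2 / sum(x * x for x in w)
```

In this toy setting, an unheated proposal (temperature 1) with a likelihood that does not depend on the particle state gives every particle the same weight, so the effective sample size equals the number of particles. Raising the temperature flattens the proposal toward uniform, trading weight uniformity for broader exploration of candidates.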

Associated Data

Dryad | 10.5061/dryad.n7n85

Grants

  1. U54 GM111274/NIGMS NIH HHS
  2. /Howard Hughes Medical Institute

MeSH Terms

Algorithms
Bayes Theorem
Classification
Internet
Models, Biological
Monte Carlo Method
Phylogeny
