Calculating Higher-Order Moments of Phylogenetic Stochastic Mapping Summaries in Linear Time.

Amrit Dhar, Vladimir N Minin
Author Information
  1. Amrit Dhar: 1 Department of Statistics, University of Washington , Seattle, Washington.
  2. Vladimir N Minin: 1 Department of Statistics, University of Washington , Seattle, Washington.

Abstract

Stochastic mapping is a simulation-based method for probabilistically mapping substitution histories onto phylogenies according to continuous-time Markov models of evolution. This technique can be used to infer properties of the evolutionary process on the phylogeny and, unlike parsimony-based mapping, conditions on the observed data to randomly draw substitution mappings that do not necessarily require the minimum number of events on a tree. Most stochastic mapping applications simulate substitution mappings only to estimate the mean and/or variance of two commonly used mapping summaries: the number of particular types of substitutions (labeled substitution counts) and the time spent in a particular group of states (labeled dwelling times) on the tree. Fast, simulation-free algorithms for calculating the mean of stochastic mapping summaries exist. Importantly, these algorithms scale linearly in the number of tips/leaves of the phylogenetic tree. However, to our knowledge, no such algorithm exists for calculating higher-order moments of stochastic mapping summaries. We present one such simulation-free dynamic programming algorithm that calculates prior and posterior mapping variances and scales linearly in the number of phylogeny tips. Our procedure suggests a general framework that can be used to efficiently compute higher-order moments of stochastic mapping summaries without simulations. We demonstrate the usefulness of our algorithm by extending previously developed statistical tests for rate variation across sites and for detecting evolutionarily conserved regions in genomic sequences.

Keywords

References

  1. Genome Res. 1998 Mar;8(3):222-33 [PMID: 9521926]
  2. PLoS Genet. 2006 Oct 13;2(10):e168 [PMID: 17040131]
  3. Math Biosci. 2001 Aug;172(2):115-28 [PMID: 11520502]
  4. Nature. 2007 Jun 14;447(7146):799-816 [PMID: 17571346]
  5. Bioinformatics. 2012 Dec 15;28(24):3248-56 [PMID: 23064000]
  6. J Mol Evol. 2007 Sep;65(3):340-8 [PMID: 17846819]
  7. J Math Biol. 2008 Mar;56(3):391-412 [PMID: 17874105]
  8. Syst Biol. 2002 Oct;51(5):729-39 [PMID: 12396587]
  9. Genetics. 1998 Mar;148(3):929-36 [PMID: 9539414]
  10. J Mol Biol. 2002 Apr 12;317(5):753-64 [PMID: 11955022]
  11. Proc Natl Acad Sci U S A. 2004 Aug 31;101(35):12957-62 [PMID: 15326304]
  12. Philos Trans R Soc Lond B Biol Sci. 2008 Dec 27;363(1512):3985-95 [PMID: 18852111]
  13. Genetics. 2007 Jan;175(1):255-66 [PMID: 17110496]
  14. PLoS Biol. 2006 May;4(5):e88 [PMID: 16683862]
  15. Genome Res. 2005 Aug;15(8):1034-50 [PMID: 16024819]
  16. Syst Biol. 2006 Oct;55(5):756-68 [PMID: 17060197]
  17. Trends Ecol Evol. 1996 Sep;11(9):367-72 [PMID: 21237881]
  18. Bioinformatics. 2005 Jun;21 Suppl 1:i126-35 [PMID: 15961449]
  19. Stat Appl Genet Mol Biol. 2012 Sep 25;11(4):Article 14 [PMID: 23023698]
  20. Syst Biol. 2003 Apr;52(2):131-58 [PMID: 12746144]
  21. Stat Appl Genet Mol Biol. 2005;4:Article18 [PMID: 16646835]
  22. Nature. 2003 May 15;423(6937):241-54 [PMID: 12748633]
  23. Genome Res. 2010 Jan;20(1):110-21 [PMID: 19858363]
  24. Bioinformatics. 2001 Aug;17(8):754-5 [PMID: 11524383]
  25. J Mol Evol. 1994 Sep;39(3):306-14 [PMID: 7932792]
  26. Mol Biol Evol. 2005 Sep;22(9):1919-28 [PMID: 15944445]
  27. J Mol Evol. 1981;17(6):368-76 [PMID: 7288891]
  28. Syst Biol. 2010 May;59(3):307-21 [PMID: 20525638]
  29. Nature. 2006 Sep 14;443(7108):167-72 [PMID: 16915236]

MeSH Term

Algorithms
Computational Biology
Computer Simulation
Evolution, Molecular
Markov Chains
Models, Genetic
Phylogeny

Word Cloud

Created with Highcharts 10.0.0mappingsubstitutionnumberstochasticusedtreesummariesalgorithmStochasticcanevolutionaryphylogenymappingsmeanparticularlabeledsimulation-freealgorithmscalculatinglinearlyhigher-ordermomentsdynamicprogrammingposteriorsimulation-basedmethodprobabilisticallyhistoriesontophylogeniesaccordingcontinuous-timeMarkovmodelsevolutiontechniqueinferpropertiesprocessunlikeparsimony-basedconditionsobserveddatarandomlydrawnecessarilyrequireminimumeventsapplicationssimulateestimateand/orvariancetwocommonlysummaries:typessubstitutionscountstimespentgroupstatesdwellingtimesFastexistImportantlyscaletips/leavesphylogeneticHoweverknowledgeexistspresentonecalculatespriorvariancesscalestipsproceduresuggestsgeneralframeworkefficientlycomputewithoutsimulationsdemonstrateusefulnessextendingpreviouslydevelopedstatisticaltestsratevariationacrosssitesdetectingevolutionarilyconservedregionsgenomicsequencesCalculatingHigher-OrderMomentsPhylogeneticMappingSummariesLinearTimeconservationpredictivediagnostics

Similar Articles

Cited By (1)