On the statistical interpretation of site-specific variables in phylogeny-based substitution models.

Nicolas Rodrigue
Author Information
  1. Nicolas Rodrigue: Eastern Cereal and Oilseed Research Centre, Agriculture and Agri-Food Canada, Ottawa, Ontario, Canada. nicolas.rodrigue@agr.gc.ca

Abstract

Phylogeny-based modeling of heterogeneity across the positions of multiple-sequence alignments has generally been approached from two main perspectives. The first treats site specificities as random variables drawn from a statistical law, and the likelihood function takes the form of an integral over this law. The second assigns distinct variables to each position, and, in a maximum-likelihood context, adjusts these variables, along with global parameters, to optimize a joint likelihood function. Here, it is emphasized that while the first approach directly enjoys the statistical guarantees of traditional likelihood theory, the latter does not, and should be approached with particular caution when the site-specific variables are high dimensional. Using a phylogeny-based mutation-selection framework, it is shown that the difference in interpretation of site-specific variables explains the incongruities in recent studies regarding distributions of selection coefficients.
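
The two perspectives can be written side by side. The following is an illustrative sketch, not notation taken from the article: D_i denotes the alignment column at site i of N sites, \theta the global parameters (tree and substitution process), x_i the site-specific variable, and g(x | \phi) the density of the statistical law with hyperparameters \phi. All of these symbols are assumptions introduced here, as is the usual assumption that sites are independent given the parameters.

```latex
% Random-variable approach: the site-specific variable is integrated out
% under the law g, so only \theta and \phi are estimated.
L_{\mathrm{RV}}(\theta, \phi)
  = \prod_{i=1}^{N} \int p(D_i \mid x, \theta)\, g(x \mid \phi)\, \mathrm{d}x

% Site-specific-parameter approach: each x_i is a free parameter,
% optimized jointly with \theta.
L_{\mathrm{joint}}(\theta, x_1, \dots, x_N)
  = \prod_{i=1}^{N} p(D_i \mid x_i, \theta)
```

Under this sketch, the number of free parameters in the first form stays fixed as N grows, whereas in the second it grows with N, which is why the standard large-sample guarantees of likelihood theory do not carry over directly.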


MeSH Terms

Animals
Bayes Theorem
DNA, Mitochondrial
Data Interpretation, Statistical
Evolution, Molecular
Genetic Variation
Likelihood Functions
Mammals
Markov Chains
Models, Statistical
Orthomyxoviridae
Phylogeny
Sequence Alignment
Viral Proteins

Chemicals

DNA, Mitochondrial
PB2 protein, influenza virus
Viral Proteins
