Inference of reticulate evolutionary histories by maximum likelihood: the performance of information criteria.

Hyun Jung Park, Luay Nakhleh
Author Information
  1. Hyun Jung Park: Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA. hjpark@bcm.edu

Abstract

BACKGROUND: Maximum likelihood has been widely used for over three decades to infer phylogenetic trees from molecular data. When reticulate evolutionary events occur, several genomic regions may have conflicting evolutionary histories, and a phylogenetic network may provide a more adequate model for representing the evolutionary history of the genomes or species. A maximum likelihood (ML) model has been proposed for this case; it accounts for both mutation within a genomic region and reticulation across the regions. However, neither the performance of this model in inferring reticulate evolution nor the properties that affect this performance have been studied.
RESULTS: In this paper, we study the effect of the evolutionary diameter and height of a reticulation event on its identifiability under ML. We find that both, particularly the diameter, have a significant effect. Further, we find that the number of genes (which can be generalized to the concept of "non-recombining genomic regions") that are transferred across a reticulation edge affects its detectability. Finally, a fundamental challenge with phylogenetic networks is that they allow an arbitrary level of complexity, giving rise to the model selection problem. We investigate the performance of two information criteria, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), for addressing this problem. We find that BIC performs well in general at controlling model complexity and preventing ML from grossly overestimating the number of reticulation events.
CONCLUSION: Our results demonstrate that BIC provides a good framework for inferring reticulate evolutionary histories. Nevertheless, the results call for caution when interpreting the accuracy of the inference, particularly for data sets with specific evolutionary features.
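
The model selection step described in the abstract can be sketched with the standard textbook definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of free parameters, n is the sample size, and L is the maximized likelihood. The sketch below is illustrative only and is not the authors' implementation. The log-likelihoods, the parameter counts, the assumption of two extra parameters per reticulation, and the use of the number of genes as the sample size n are all hypothetical choices made for this example.

```python
import math


def aic(log_likelihood, num_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood, num_params, num_samples):
    """Bayesian Information Criterion: k ln(n) - 2 ln L (lower is better)."""
    return num_params * math.log(num_samples) - 2 * log_likelihood


def select_model(candidates, num_samples, criterion="bic"):
    """Return the label of the candidate with the lowest criterion score.

    candidates: list of (label, log_likelihood, num_params) tuples.
    """
    def score(candidate):
        _, log_likelihood, num_params = candidate
        if criterion == "bic":
            return bic(log_likelihood, num_params, num_samples)
        return aic(log_likelihood, num_params)

    return min(candidates, key=score)[0]


# HYPOTHETICAL values, invented for illustration (not results from the paper).
# Each added reticulation is assumed to add two free parameters.
candidates = [
    ("0 reticulations", -1050.0, 10),
    ("1 reticulation", -1020.0, 12),
    ("2 reticulations", -1017.0, 14),
]
num_genes = 100  # assumed to play the role of the sample size n

for name in ("aic", "bic"):
    print(name.upper(), "selects:", select_model(candidates, num_genes, name))
```

With these made-up numbers, the small likelihood gain from a second reticulation is enough to satisfy AIC but not BIC, because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 whenever n is greater than about 7.4. This is the mechanism by which BIC can keep ML from adding reticulation events that the data only weakly support.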

Grants

  1. R01LM009494/NLM NIH HHS

MeSH Term

Bayes Theorem
Evolution, Molecular
Genes
Likelihood Functions
Models, Genetic
Phylogeny
