Inference of species phylogenies from bi-allelic markers using pseudo-likelihood.

Jiafan Zhu, Luay Nakhleh
Author Information
  1. Jiafan Zhu: Department of Computer Science, Rice University, Houston, TX, USA.
  2. Luay Nakhleh: Department of Computer Science, Rice University, Houston, TX, USA.

Abstract

Motivation: Phylogenetic networks represent reticulate evolutionary histories. Statistical methods for their inference under the multispecies coalescent have recently been developed. A particularly powerful approach uses data that consist of bi-allelic markers (e.g. single nucleotide polymorphism data) and allows for exact likelihood computations of phylogenetic networks while numerically integrating over all possible gene trees per marker. Although the approach estimates the network and its parameters with good accuracy, likelihood computations remain a major computational bottleneck and limit the method's applicability.
Results: In this article, we first demonstrate why likelihood computations for networks take orders of magnitude more time than for trees. We then propose an approach for inferring phylogenetic networks from bi-allelic markers based on pseudo-likelihood. We demonstrate the scalability and accuracy of phylogenetic network inference via pseudo-likelihood computations on simulated data. Furthermore, we demonstrate aspects of the method's robustness to violations of the underlying assumptions of the employed statistical model. Finally, we demonstrate the application of the method to biological data. The proposed method allows for analyzing larger datasets in terms of the numbers of taxa and reticulation events. While pseudo-likelihood has been proposed before for data consisting of gene trees, the work here uses sequence data directly, offering several advantages as we discuss.
Availability and implementation: The methods have been implemented in PhyloNet (http://bioinfocs.rice.edu/phylonet).

MeSH Terms

Alleles
Computational Biology
Evolution, Molecular
Models, Genetic
Phylogeny
Probability
Software

