Variational inference using approximate likelihood under the coalescent with recombination.

Xinhao Liu, Huw A Ogilvie, Luay Nakhleh
Author Information
  1. Xinhao Liu: Department of Computer Science, Rice University, Houston, Texas 77005, USA.
  2. Huw A Ogilvie: Department of Computer Science, Rice University, Houston, Texas 77005, USA.
  3. Luay Nakhleh: Department of Computer Science, Rice University, Houston, Texas 77005, USA.

Abstract

Coalescent methods are proven and powerful tools for population genetics, phylogenetics, epidemiology, and other fields. A promising avenue for the analysis of large genomic alignments, which are increasingly common, is coalescent hidden Markov model (coalHMM) methods, but these methods have lacked general usability and flexibility. We introduce a novel method for automatically learning a coalHMM and inferring the posterior distributions of evolutionary parameters using black-box variational inference, with the transition rates between local genealogies derived empirically by simulation. This derivation enables our method to work directly with three or four taxa and through a divide-and-conquer approach with more taxa. Using a simulated data set resembling a human-chimp-gorilla scenario, we show that the accuracy of our method is comparable to or better than that of previous coalHMM methods. Both species divergence times and population sizes were accurately inferred. The method also infers local genealogies, and we report on their accuracy. Furthermore, we discuss a potential direction for scaling the method to larger data sets through a divide-and-conquer approach. This accuracy means our method is useful now, and by deriving transition rates by simulation, it is flexible enough to enable future implementations of various population models.
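
The idea of deriving transition rates between local genealogies empirically by simulation can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `empirical_transition_matrix`, the integer encoding of genealogies as hidden states, the pseudocount, and the toy paths are all assumptions made for illustration. In practice the state paths would come from a coalescent-with-recombination simulator run under the current parameter values; here two short hand-written paths stand in for that output.

```python
import numpy as np

def empirical_transition_matrix(state_paths, n_states, pseudocount=1.0):
    """Estimate HMM transition probabilities from simulated state paths.

    state_paths: iterable of integer sequences; each entry is the
                 (hypothetical) local-genealogy state at one site.
    pseudocount: added to every cell so that states never visited in
                 the simulation still get a valid probability row.
    """
    counts = np.full((n_states, n_states), pseudocount, dtype=float)
    for path in state_paths:
        path = np.asarray(path, dtype=int)
        # Count each adjacent pair (state at site i, state at site i+1).
        np.add.at(counts, (path[:-1], path[1:]), 1.0)
    # Normalize each row so that it sums to 1.
    return counts / counts.sum(axis=1, keepdims=True)

# Toy stand-in for simulator output: two paths over 4 genealogy states.
toy_paths = [[0, 0, 1, 1, 0], [2, 2, 2, 3]]
T = empirical_transition_matrix(toy_paths, n_states=4)
print(T)
```

Each row of `T` is a probability distribution over the next genealogy given the current one. Re-estimating such a matrix by simulation, rather than deriving it analytically, is the property the abstract credits for the method's flexibility across population models.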

MeSH Terms

Animals
Computer Simulation
Genetics, Population
Humans
Models, Genetic
Population Density
Recombination, Genetic
