Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks.

Don Klinkenberg, Jantien A Backer, Xavier Didelot, Caroline Colijn, Jacco Wallinga
Author Information
  1. Don Klinkenberg: Department of Epidemiology and Surveillance, National Institute for Public Health and the Environment, Bilthoven, The Netherlands.
  2. Jantien A Backer: Department of Epidemiology and Surveillance, National Institute for Public Health and the Environment, Bilthoven, The Netherlands.
  3. Xavier Didelot: Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom.
  4. Caroline Colijn: Department of Mathematics, Imperial College London, London, United Kingdom.
  5. Jacco Wallinga: Department of Epidemiology and Surveillance, National Institute for Public Health and the Environment, Bilthoven, The Netherlands.

Abstract

Whole-genome sequencing of pathogens from host samples is becoming increasingly routine during infectious disease outbreaks. These data provide information on possible transmission events, which can be used for further epidemiologic analyses such as identifying risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics, and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications either make simplifying assumptions that often break the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees from sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC, for which we have designed novel proposal steps that efficiently traverse the posterior distribution while taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package, phybreak. The method performs well in tests on both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times lend greater confidence to the inferred transmission trees.
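
The idea of sampling transmission trees from a posterior distribution with MCMC and summarising them as a consensus tree can be illustrated with a toy sketch. The code below is not the phybreak algorithm and does not use its API. It is a minimal Metropolis-Hastings sampler written in Python under strong simplifying assumptions that are not in the abstract: infection times are treated as known and distinct, the generation interval is gamma distributed, and the SNP distance between a host and its infector is Poisson distributed with mean proportional to the time between their infections. All function names, parameter values, and data are hypothetical.

```python
import math
import random


def log_gamma_pdf(x, shape, scale):
    """Log density of a gamma distribution (toy generation-interval model)."""
    if x <= 0:
        return float("-inf")
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def log_poisson_pmf(k, lam):
    """Log probability of k events under a Poisson distribution with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def log_posterior(infector, times, snps, shape=2.0, scale=1.5, mu=0.5):
    """Unnormalised log posterior of a transmission tree (flat prior on trees).

    infector[i] is the host that infected host i, or None for the index case.
    shape, scale and mu are illustrative values, not estimates from any dataset.
    """
    total = 0.0
    for i, j in enumerate(infector):
        if j is None:
            continue
        dt = times[i] - times[j]
        if dt <= 0:
            return float("-inf")  # an infector must be infected first
        total += log_gamma_pdf(dt, shape, scale)
        total += log_poisson_pmf(snps[i][j], mu * dt)
    return total


def sample_infectors(times, snps, n_iter=5000, seed=1):
    """Metropolis-Hastings over infector assignments; returns visit counts."""
    rng = random.Random(seed)
    n = len(times)
    index_case = min(range(n), key=times.__getitem__)
    others = [i for i in range(n) if i != index_case]
    # Start from a star-shaped tree: everyone infected by the index case.
    infector = [None if i == index_case else index_case for i in range(n)]
    current = log_posterior(infector, times, snps)
    counts = [dict() for _ in range(n)]
    for _ in range(n_iter):
        # Proposal: pick a host and draw a new infector uniformly among
        # hosts infected earlier. The draw does not depend on the current
        # infector, so the Hastings ratio reduces to the posterior ratio.
        i = rng.choice(others)
        candidates = [j for j in range(n) if times[j] < times[i]]
        proposal = list(infector)
        proposal[i] = rng.choice(candidates)
        new = log_posterior(proposal, times, snps)
        if rng.random() < math.exp(min(0.0, new - current)):
            infector, current = proposal, new
        for h in others:
            counts[h][infector[h]] = counts[h].get(infector[h], 0) + 1
    return counts


def consensus_tree(counts):
    """Most frequently sampled infector per host (None for the index case)."""
    return [max(c, key=c.get) if c else None for c in counts]


# Hypothetical outbreak of four hosts: infection times and pairwise SNP distances.
times = [0.0, 2.0, 4.0, 5.0]
snps = [
    [0, 1, 6, 7],
    [1, 0, 1, 2],
    [6, 1, 0, 1],
    [7, 2, 1, 0],
]
counts = sample_infectors(times, snps)
print(consensus_tree(counts))
```

The per-host visit counts play the role of posterior support for each candidate infector, and the consensus simply takes the best-supported infector for every host. A full method of the kind the abstract describes would also have to sample infection times, within-host phylogenies, and model parameters jointly, which this sketch deliberately leaves out.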

Grants

  1. MR/N010760/1/Medical Research Council

MeSH Term

Algorithms
Bacteria
Bacterial Infections
Computational Biology
Disease Transmission, Infectious
Genome, Bacterial
Genome, Viral
Humans
Phylogeny
Polymorphism, Single Nucleotide
Virus Diseases
Viruses
