Cladograms with Path to Event (ClaPTE): a novel algorithm to detect associations between genotypes or phenotypes using phylogenies.

Samuel K Handelman, Jacob M Aaronson, Michal Seweryn, Igor Voronkin, Jesse J Kwiek, Wolfgang Sadee, Joseph S Verducci, Daniel A Janies
Author Information
  1. Samuel K Handelman: Department of Pharmacology, Ohio State University College of Medicine, 5072 Graves Hall, 333 West 10th Avenue, Columbus, OH 43210, United States; Mathematical Biosciences Institute, The Ohio State University, Jennings Hall 3rd Floor, 1735 Neil Avenue, Columbus, OH 43210, United States. Electronic address: handelman.9@osu.edu.
  2. Jacob M Aaronson: Department of Biomedical Informatics, Ohio State University College of Medicine, 3190 Graves Hall, 333 West 10th Avenue, Columbus, OH 43210, United States.
  3. Michal Seweryn: Mathematical Biosciences Institute, The Ohio State University, Jennings Hall 3rd Floor, 1735 Neil Avenue, Columbus, OH 43210, United States.
  4. Igor Voronkin: Department of Biomedical Informatics, Ohio State University College of Medicine, 3190 Graves Hall, 333 West 10th Avenue, Columbus, OH 43210, United States.
  5. Jesse J Kwiek: Department of Microbial Infection & Immunity and Department of Microbiology, The Ohio State University, 788 Biomedical Research Tower, 460 West 12th Avenue, Columbus, OH 43210, United States.
  6. Wolfgang Sadee: Department of Pharmacology, Ohio State University College of Medicine, 5072 Graves Hall, 333 West 10th Avenue, Columbus, OH 43210, United States.
  7. Joseph S Verducci: Department of Statistics, The Ohio State University, 404 Cockins Hall, 1958 Neil Avenue, Columbus, OH 43210-1247, United States.
  8. Daniel A Janies: Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223-0001, United States.

Abstract

BACKGROUND: Associations between genotype and phenotype provide insight into the evolution of pathogenesis, drug resistance, and the spread of pathogens between hosts. However, common ancestry can lead to apparent associations between biologically unrelated features. The novel method Cladograms with Path to Event (ClaPTE) detects associations between character-pairs (either a pair of mutations or a mutation paired with a phenotype) while adjusting for common ancestry, using phylogenetic trees.
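ClaPTE and the methods it is compared against all aim to adjust for common ancestry. For orientation only, here is a minimal Python sketch of the textbook continuous-trait form of one comparator, Felsenstein's (1985) phylogenetically independent contrasts. Each internal node of a bifurcating tree yields one standardized contrast, so closely related taxa are not counted as independent data points. The `Node` class and the function names are illustrative. This abstract does not describe how the authors coded discrete characters for their independent-contrasts comparison.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    length: float = 0.0                          # branch length leading to this node
    children: list[Node] = field(default_factory=list)
    traits: tuple[float, float] | None = None    # (x, y) character values, tips only


def independent_contrasts(node: Node):
    """Post-order pass. Returns the node's estimated (x, y), its adjusted
    branch length, and the standardized contrasts found in its subtree."""
    if not node.children:
        if node.traits is None:
            raise ValueError("every tip needs trait values")
        return node.traits, node.length, []
    if len(node.children) != 2:
        raise ValueError("this sketch assumes a strictly bifurcating tree")
    (xi, vi, ci), (xj, vj, cj) = (independent_contrasts(c) for c in node.children)
    total = vi + vj
    # Contrast: difference between the two daughters divided by its expected SD.
    contrast = tuple((a - b) / math.sqrt(total) for a, b in zip(xi, xj))
    # Ancestral estimate: each daughter is weighted by the other daughter's branch.
    estimate = tuple((a * vj + b * vi) / total for a, b in zip(xi, xj))
    # Lengthen this node's branch to account for uncertainty in the estimate.
    adjusted = node.length + vi * vj / total
    return estimate, adjusted, ci + cj + [contrast]


def contrast_correlation(root: Node) -> float:
    """Correlation of x- and y-contrasts, computed through the origin."""
    _, _, cs = independent_contrasts(root)
    sxy = sum(cx * cy for cx, cy in cs)
    sxx = sum(cx * cx for cx, _ in cs)
    syy = sum(cy * cy for _, cy in cs)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical 4-tip tree ((A,B),(C,D)) with unit branch lengths.
    def tip(x: float, y: float) -> Node:
        return Node(length=1.0, traits=(x, y))

    root = Node(children=[Node(length=1.0, children=[tip(0, 0), tip(1, 1)]),
                          Node(length=1.0, children=[tip(2, 2), tip(3, 3)])])
    print(contrast_correlation(root))
```

A tree with n tips yields n - 1 contrasts. Any association test built on them therefore has the degrees of freedom implied by the tree's branching, rather than one per taxon.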
METHODS: ClaPTE tests whether the two characters in a pair change close together on the phylogenetic tree, as expected for an associated character-pair. ClaPTE is compared with three existing methods for detecting character-pair associations adjusted for common ancestry: independent contrasts, a mixed model, and a likelihood ratio test. The comparisons use simulations on gene trees for HIV Env, the HIV promoter, and bacterial DnaJ and GuaB, together with case studies of Oseltamivir resistance in Influenza and of DnaJ and GuaB. The simulated data include both true-positive (associated) and true-negative (not associated) character-pairs. These are used to assess type I error control (the frequency of small p-values among true negatives) and type II error control (sensitivity to true positives).
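The abstract does not define ClaPTE's test statistic, so the following sketch is not ClaPTE. It is a generic, hypothetical illustration of the idea described above: measure how close the changes in one character lie to the changes in the other along the tree, then compare that distance with a permutation null. The sketch makes several assumptions, which are also labeled in the code. Changes are already mapped to branches. Each branch is identified by its lower node. Distance is the tree-path length between those nodes. The null scatters the second character's changes uniformly over branches. All function names are illustrative.

```python
from __future__ import annotations

import random


def node_depths(parent: dict[str, str | None], length: dict[str, float]) -> dict[str, float]:
    """Distance from the root to every node (the root's parent is None)."""
    depth: dict[str, float] = {}

    def visit(n: str) -> float:
        if n not in depth:
            p = parent[n]
            depth[n] = 0.0 if p is None else visit(p) + length[n]
        return depth[n]

    for n in parent:
        visit(n)
    return depth


def path_distance(u: str, v: str, parent: dict[str, str | None], depth: dict[str, float]) -> float:
    """Tree-path length between nodes u and v via their most recent common ancestor."""
    ancestors_of_u = set()
    x = u
    while x is not None:
        ancestors_of_u.add(x)
        x = parent[x]
    y = v
    while y not in ancestors_of_u:
        y = parent[y]
    return depth[u] + depth[v] - 2 * depth[y]


def proximity(changes_a, changes_b, parent, depth) -> float:
    """Mean distance from each change in A to the nearest change in B (smaller = closer)."""
    return sum(
        min(path_distance(a, b, parent, depth) for b in changes_b) for a in changes_a
    ) / len(changes_a)


def co_change_test(changes_a, changes_b, parent, length, n_perm=999, seed=0):
    """Permutation p-value for 'B changes closer to A than expected by chance'.

    Assumed null model: B's changes fall on branches drawn uniformly at random
    without replacement, ignoring branch length. Each change is placed at the
    lower node of its branch.
    """
    depth = node_depths(parent, length)
    branches = [n for n, p in parent.items() if p is not None]  # one branch per non-root node
    observed = proximity(changes_a, changes_b, parent, depth)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        null_b = rng.sample(branches, len(changes_b))
        if proximity(changes_a, null_b, parent, depth) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical balanced 8-tip tree; every branch has length 1.
    parent = {"R": None, "X": "R", "Y": "R", "X1": "X", "X2": "X", "Y1": "Y", "Y2": "Y",
              "t1": "X1", "t2": "X1", "t3": "X2", "t4": "X2",
              "t5": "Y1", "t6": "Y1", "t7": "Y2", "t8": "Y2"}
    length = {n: 1.0 for n, p in parent.items() if p is not None}
    print(co_change_test(["t1", "t3"], ["t2", "t4"], parent, length))  # sister-branch changes
    print(co_change_test(["t1", "t3"], ["t7", "t8"], parent, length))  # changes in the other clade
```

In the example tree, B changes on the sister branches of A's changes give a small p-value. B changes confined to the other clade sit at the maximum possible distance and give p = 1. The null could be refined by weighting branch placement by length or by B's estimated rate of change. Uniform placement is used here only to keep the sketch short.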
RESULTS AND CONCLUSIONS: ClaPTE has competitive sensitivity and better type I error control than existing methods. In the Influenza/Oseltamivir case study, ClaPTE reports no new permissive mutations. However, it detects associations between amino acid positions that are adjacent in the primary sequence, which other methods miss. In the DnaJ and GuaB case study, ClaPTE reports associations more frequently between positions within the same protein family than between positions in different families, unlike the other methods. In both case studies, the results from ClaPTE are biologically plausible.
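The error-control comparison rests on simple tallies over simulation output. The following minimal sketch assumes a nominal threshold of alpha = 0.05; the abstract does not state the threshold used. The empirical type I error rate is the fraction of true-negative p-values at or below alpha. Sensitivity is the same fraction among true positives. An exact binomial tail flags a type I rate that is improbably high for a calibrated test. The function names are hypothetical.

```python
from __future__ import annotations

import math


def empirical_rates(p_null: list[float], p_alt: list[float], alpha: float = 0.05) -> tuple[float, float]:
    """Return (empirical type I error rate, sensitivity) at threshold alpha.

    p_null: p-values from simulated true-negative character-pairs.
    p_alt:  p-values from simulated true-positive character-pairs.
    """
    type_one = sum(p <= alpha for p in p_null) / len(p_null)
    sensitivity = sum(p <= alpha for p in p_alt) / len(p_alt)  # equals 1 - type II error rate
    return type_one, sensitivity


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def type_one_inflated(p_null: list[float], alpha: float = 0.05, level: float = 0.05) -> bool:
    """True if the false-positive count is improbably high for an exactly calibrated test."""
    k = sum(p <= alpha for p in p_null)
    return binom_upper_tail(k, len(p_null), alpha) < level


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    p_null = [0.01, 0.2, 0.5, 0.8, 0.04, 0.9, 0.3, 0.6, 0.7, 0.1]
    p_alt = [0.001, 0.03, 0.2, 0.04]
    print(empirical_rates(p_null, p_alt), type_one_inflated(p_null))
```

For a well-controlled test, the empirical type I rate should stay near alpha at every threshold. Checking several values (for example 0.01, 0.05, and 0.1) is therefore more informative than a single cut-off.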

Grants

  1. R00 HD056586/NICHD NIH HHS
  2. U01 GM092655/NIGMS NIH HHS
  3. R00HD05686/NICHD NIH HHS
  4. U01092655/PHS HHS

MeSH Terms

Algorithms
Computational Biology
Evolution, Molecular
Genotype
Influenza A Virus, H1N1 Subtype
Models, Genetic
Phenotype
Phylogeny
Proteins

Chemicals

Proteins
