Phylogenetic Gaussian process model for the inference of functionally important regions in protein tertiary structures.

Yi-Fei Huang, G Brian Golding
Author Information
  1. Yi-Fei Huang: Department of Biology, McMaster University, Hamilton, Ontario, Canada.
  2. G Brian Golding: Department of Biology, McMaster University, Hamilton, Ontario, Canada.

Abstract

A critical question in biology is the identification of functionally important amino acid sites in proteins. Because functionally important sites are under stronger purifying selection, site-specific substitution rates tend to be lower than usual at these sites. A large number of phylogenetic models have been developed to estimate site-specific substitution rates in proteins, and extraordinarily low substitution rates have been used as evidence of function. Most existing tools, e.g., Rate4Site, assume that site-specific substitution rates are independent across sites. However, site-specific substitution rates may be strongly correlated in the protein tertiary structure, since functionally important sites tend to cluster together to form functional patches. We have developed a new model, GP4Rate, which combines a Gaussian process model with the standard phylogenetic model to identify slowly evolved regions in protein tertiary structures. GP4Rate uses the Gaussian process to define a nonparametric prior distribution over site-specific substitution rates, which naturally captures the spatial correlation of substitution rates. Simulations suggest that GP4Rate can potentially estimate site-specific substitution rates with much higher accuracy than Rate4Site and tends to report slowly evolved regions rather than individual sites. In addition, GP4Rate can estimate the strength of the spatial correlation of substitution rates from the data. By applying GP4Rate to a set of mammalian B7-1 genes, we found a highly conserved region which coincides with experimental evidence. GP4Rate may be a useful tool for the in silico prediction of functionally important regions in proteins with known structures.
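
The idea of a Gaussian process prior that ties substitution rates to positions in the tertiary structure can be illustrated with a minimal sketch. The abstract does not state which covariance function GP4Rate uses or how the Gaussian process is linked to the rates, so the squared-exponential kernel, the log link, the mean-one normalisation, and every function and parameter name below are assumptions made purely for illustration, not the authors' implementation.

```python
import numpy as np


def squared_exponential_kernel(coords, length_scale, variance=1.0):
    """Covariance between sites as a decreasing function of 3D distance.

    coords: (n_sites, 3) array of hypothetical residue coordinates.
    length_scale: assumed parameter controlling how far correlation extends.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)


def sample_site_rates(coords, length_scale, variance=1.0, seed=0):
    """Draw one set of positive site rates from a GP prior on log-rates."""
    rng = np.random.default_rng(seed)
    n_sites = coords.shape[0]
    cov = squared_exponential_kernel(coords, length_scale, variance)
    cov = cov + 1e-6 * np.eye(n_sites)  # small jitter for numerical stability
    chol = np.linalg.cholesky(cov)
    log_rates = chol @ rng.standard_normal(n_sites)
    rates = np.exp(log_rates)  # assumed log link keeps rates positive
    return rates / rates.mean()  # assumed normalisation to a mean rate of 1


# Four hypothetical sites: three close together, one far away.
coords = np.array([
    [0.0, 0.0, 0.0],
    [3.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [30.0, 0.0, 0.0],
])
cov = squared_exponential_kernel(coords, length_scale=5.0)
rates = sample_site_rates(coords, length_scale=5.0)
```

Under this sketch, sites that are close in space receive a high prior covariance, so their sampled rates tend to rise and fall together, whereas a distant site is nearly independent. A larger `length_scale` would correspond to a stronger spatial correlation, which is the kind of quantity the abstract says can be estimated from the data.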

MeSH Terms

Algorithms
B7-1 Antigen
Computational Biology
Computer Simulation
Humans
Normal Distribution
Phylogeny
Protein Structure, Tertiary
Proteins
ROC Curve
Reproducibility of Results
Software

Chemicals

B7-1 Antigen
Proteins
