Optimized atomic statistical potentials: assessment of protein interfaces and loops.

Guang Qiang Dong, Hao Fan, Dina Schneidman-Duhovny, Ben Webb, Andrej Sali
Author Information
  1. Guang Qiang Dong: Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry and California Institute for Quantitative Biosciences (QB3), University of California, San Francisco, CA 94158, USA.

Abstract

MOTIVATION: Statistical potentials have been widely used for modeling whole proteins and their parts (e.g. sidechains and loops) as well as interactions between proteins, nucleic acids and small molecules. Here, we formulate the statistical potentials entirely within a statistical framework, avoiding questionable statistical mechanical assumptions and approximations, including a definition of the reference state.
RESULTS: We derive a general Bayesian framework for inferring statistically optimized atomic potentials (SOAP) in which the reference state is replaced with data-driven 'recovery' functions. Moreover, we restrain the relative orientation between two covalent bonds instead of a simple distance between two atoms, in an effort to capture orientation-dependent interactions such as hydrogen bonds. To demonstrate this general approach, we computed statistical potentials for protein-protein docking (SOAP-PP) and loop modeling (SOAP-Loop). For docking, a near-native model is within the top 10 scoring models in 40% of the PatchDock benchmark cases, compared with 23% and 27% for the state-of-the-art ZDOCK and FireDock scoring functions, respectively. Similarly, for modeling 12-residue loops in the PLOP benchmark, the average main-chain root mean square deviation (RMSD) of the best scored conformations by SOAP-Loop is 1.5 Å, close to the average RMSD of the best sampled conformations (1.2 Å) and significantly better than the values for conformations selected by the Rosetta (2.1 Å), DFIRE (2.3 Å), DOPE (2.5 Å) and PLOP (3.0 Å) scoring functions. Our Bayesian framework may also result in more accurate statistical potentials for additional modeling applications, thus affording better leverage of the experimentally determined protein structures.
AVAILABILITY AND IMPLEMENTATION: SOAP-PP and SOAP-Loop are available as part of MODELLER (http://salilab.org/modeller).
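
The two evaluation measures quoted in the results (the fraction of docking cases with a near-native model among the top 10 scoring models, and the average RMSD of the best scored versus best sampled loop conformations) can be illustrated with a minimal Python sketch. This is not the authors' code and does not use the MODELLER API: the function names, the data layout, the toy numbers, and the convention that a lower score is better are all assumptions made for illustration.

```python
from statistics import mean


def top_n_success_rate(cases, n=10):
    """Fraction of cases with a near-native model among the n best-scoring models.

    cases: list of benchmark cases; each case is a list of
    (score, is_near_native) tuples. Lower score = better (assumption).
    """
    hits = 0
    for models in cases:
        best = sorted(models, key=lambda m: m[0])[:n]
        if any(is_near_native for _, is_near_native in best):
            hits += 1
    return hits / len(cases)


def mean_rmsd_of_best_scored(cases):
    """Average RMSD of the single best-scoring conformation per case.

    cases: list of cases; each case is a list of (score, rmsd) tuples.
    """
    return mean(min(models, key=lambda m: m[0])[1] for models in cases)


def mean_rmsd_of_best_sampled(cases):
    """Average of the lowest RMSD present in each case (the sampling limit)."""
    return mean(min(rmsd for _, rmsd in models) for models in cases)


# Hypothetical toy data, not taken from the paper.
docking_cases = [
    [(-5.0, True), (-3.0, False)],
    [(-4.0, False), (-1.0, True)],
]
loop_cases = [
    [(-2.0, 1.0), (-1.0, 0.5)],
    [(-3.0, 2.0), (-4.0, 3.0)],
]

print(top_n_success_rate(docking_cases, n=1))
print(mean_rmsd_of_best_scored(loop_cases))
print(mean_rmsd_of_best_sampled(loop_cases))
```

The gap between the best scored and best sampled averages separates scoring error from sampling error: a scoring function cannot select a conformation more accurate than the best one sampled.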

Grants

  1. R01 GM083960/NIGMS NIH HHS
  2. R01GM054762/NIGMS NIH HHS
  3. GM071790/NIGMS NIH HHS
  4. GM093342/NIGMS NIH HHS

MeSH Terms

Bayes Theorem
Computational Biology
Hydrogen Bonding
Models, Statistical
Molecular Docking Simulation
Protein Conformation
Protein Interaction Domains and Motifs
Proteins
Software

Chemicals

Proteins
