Pareto Optimization of Combinatorial Mutagenesis Libraries.

Deeptak Verma, Gevorg Grigoryan, Chris Bailey-Kellogg

Abstract

To increase the hit rate of discovering diverse, beneficial protein variants via high-throughput screening, we have developed a computational method that optimizes combinatorial mutagenesis libraries for overall enrichment in two distinct properties of interest. Given scoring functions for evaluating individual variants, POCoM (Pareto Optimal Combinatorial Mutagenesis) scores entire libraries in terms of averages over their constituent members, and designs optimal libraries as sets of mutations whose combinations make the best trade-offs between average scores. This represents the first general-purpose method to directly design combinatorial libraries for multiple objectives characterizing their constituent members. Although it rigorously maps out the Pareto frontier, it is also fast even for very large libraries (e.g., designing 30-mutation, billion-member libraries in only hours). Here we instantiate POCoM with scores based on a target protein's structure and its homologs' sequences, enabling the design of libraries containing variants that balance these two important yet quite different types of information. We demonstrate POCoM's generality and power in case study applications to green fluorescent protein, cytochrome P450, and β-lactamase. Analysis of the POCoM library designs provides insights into the trade-offs between structure- and sequence-based scores, as well as the impacts of experimental constraints on library designs. POCoM libraries incorporate mutations that have previously been found favorable experimentally, while diversifying the contexts in which these mutations are situated and maintaining overall variant quality.
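The two ideas named in the abstract, scoring a library by the average over its members and keeping only libraries that make the best trade-offs between two average scores, can be illustrated with a small sketch. This is not the POCoM algorithm, which the abstract does not describe. It is a brute-force toy that enumerates every variant and would not scale to billion-member libraries. The function names, the toy scoring functions, and the convention that lower scores are better are all assumptions made for illustration.

```python
from itertools import product


def library_average(choices, score):
    """Average a per-variant score over every member of a combinatorial library.

    choices: list with one entry per position, each a list of allowed residues.
    score:   hypothetical function mapping a variant (tuple of residues) to a float.
    """
    variants = list(product(*choices))  # brute-force enumeration (toy sizes only)
    return sum(score(v) for v in variants) / len(variants)


def pareto_front(points):
    """Return the non-dominated (x, y) points, assuming lower is better in both."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front


# Toy stand-ins for a structure-based and a sequence-based score (assumptions).
def toy_structure_score(variant):
    return float(variant.count("G"))


def toy_sequence_score(variant):
    return float(variant.count("K"))


# Four candidate libraries over two positions.
candidate_libraries = [
    [["A", "G"], ["K"]],
    [["A"], ["K", "R"]],
    [["A", "G"], ["K", "R"]],
    [["G"], ["R"]],
]

library_scores = [
    (library_average(lib, toy_structure_score),
     library_average(lib, toy_sequence_score))
    for lib in candidate_libraries
]
frontier = pareto_front(library_scores)
print(frontier)
```

In this toy, the second and fourth libraries survive on the frontier: one has the better average toy structure score and the other the better average toy sequence score, and neither is better on both.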

Grants

  1. R01 GM098977/NIGMS NIH HHS

MeSH Terms

Algorithms
Computational Biology
Cytochrome P-450 Enzyme System
Gene Library
Green Fluorescent Proteins
Models, Molecular
Mutagenesis
Mutation
Oligonucleotides
Programming Languages
Protein Engineering
Proteins
Software
beta-Lactamases

Chemicals

Oligonucleotides
Proteins
Green Fluorescent Proteins
Cytochrome P-450 Enzyme System
beta-Lactamases
