A new parameter-rich structure-aware mechanistic model for amino acid substitution during evolution.

Peter B Chi, Dohyup Kim, Jason K Lai, Nadia Bykova, Claudia C Weber, Jan Kubelka, David A Liberles
Author Information
  1. Peter B Chi: Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122.
  2. Dohyup Kim: Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, 82071.
  3. Jason K Lai: Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, 82071.
  4. Nadia Bykova: Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, 82071.
  5. Claudia C Weber: Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122.
  6. Jan Kubelka: Department of Chemistry, University of Wyoming, Laramie, Wyoming, 82071.
  7. David A Liberles: Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122.

Abstract

Improvements in the description of amino acid substitution are required to develop better pseudo-energy-based protein structure-aware models for use in phylogenetic studies. These models are used to characterize the probabilities of amino acid substitution and enable better simulation of protein sequences over a phylogeny. A better characterization of amino acid substitution probabilities in turn enables numerous downstream applications, such as detecting positive selection, ancestral sequence reconstruction, and evolutionarily motivated protein engineering. Many existing Markov models for amino acid substitution in molecular evolution disregard molecular structure and poorly describe the substitution process over longer evolutionary periods. Here, we present a new model with a site-specific parameterization of pseudo-energy terms in a coarse-grained force field. It describes local heterogeneity in physical constraints on amino acid substitution better than a previous pseudo-energy-based model, at minimal cost in runtime. The importance of each weight term parameterization in characterizing underlying features of the site, including contact number, solvent accessibility, and secondary structural elements, was evaluated, returning both expected and biologically reasonable relationships between model parameters. As a result, the proposed amino acid substitutions that the model accepts more closely resemble the site-specific frequencies observed in gene family alignments. The modular site-specific pseudo-energy function is available for download at https://liberles.cst.temple.edu/Software/CASS/index.html.

Grants

  1. P20 GM103432/NIGMS NIH HHS

MeSH Terms

Algorithms
Amino Acid Sequence
Amino Acid Substitution
Animals
Evolution, Molecular
Humans
Models, Genetic
Protein Conformation
Proteins
Thermodynamics
src Homology Domains

Chemicals

Proteins
