Support vector training of protein alignment models.

Chun-Nam John Yu, Thorsten Joachims, Ron Elber, Jaroslaw Pillardy
Author Information
  1. Chun-Nam John Yu: Department of Computer Science, Cornell University, Ithaca, New York, USA. cnyu@cs.cornell.edu

Abstract

Sequence-to-structure alignment is an important step in homology modeling of protein structures. Incorporating features such as secondary structure, solvent accessibility, or evolutionary information improves sequence-to-structure alignment accuracy, but conventional generative estimation techniques for alignment models impose independence assumptions that make these features difficult to include in a principled way. In this paper, we overcome this problem using a Support Vector Machine (SVM) method that provides a well-founded way of estimating complex alignment models with hundreds of thousands of parameters. Furthermore, we show that the method can be trained using a variety of loss functions. In a rigorous empirical evaluation, the SVM algorithm outperforms SSALN, a highly accurate generative alignment model that incorporates structural information. The alignment model learned by the SVM aligns 50% of the residues correctly and aligns over 70% of the residues within a shift of four positions.
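The two accuracy figures quoted in the abstract (residues aligned exactly, and residues aligned within a shift of four positions) can be illustrated with a minimal sketch. The definition used here is an assumption, not necessarily the paper's exact metric: an alignment is represented as a mapping from query residue index to target residue index, a residue counts as correct if the predicted target position is within `max_shift` of the reference position, and residues that the prediction leaves unaligned count as incorrect. The function name `shift_accuracy` and the toy alignments are hypothetical.

```python
from typing import Dict


def shift_accuracy(reference: Dict[int, int],
                   predicted: Dict[int, int],
                   max_shift: int = 0) -> float:
    """Fraction of reference-aligned residues that the predicted alignment
    places within `max_shift` positions of the reference target position.

    Assumed definition for illustration only; the paper may normalize or
    treat gaps differently.
    """
    if not reference:
        raise ValueError("reference alignment has no aligned residue pairs")
    hits = 0
    for query_pos, ref_target in reference.items():
        # None means the prediction aligned this residue to a gap.
        pred_target = predicted.get(query_pos)
        if pred_target is not None and abs(pred_target - ref_target) <= max_shift:
            hits += 1
    return hits / len(reference)


# Toy data (hypothetical): query index -> target index.
reference = {0: 0, 1: 1, 2: 2, 3: 5}
predicted = {0: 0, 1: 3, 2: 2}  # residue 3 is left unaligned

exact = shift_accuracy(reference, predicted, max_shift=0)
within_four = shift_accuracy(reference, predicted, max_shift=4)
print(exact, within_four)
```

With `max_shift=0` only exact matches count; raising it to 4 also credits residue 1, which is off by two positions, while the unaligned residue 3 stays incorrect under either setting.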

Grants

  1. R01 GM067823-06/NIGMS NIH HHS
  2. GM67823/NIGMS NIH HHS
  3. R01 GM059796/NIGMS NIH HHS
  4. IS10RR020889/NCRR NIH HHS
  5. R01 GM067823/NIGMS NIH HHS

MeSH Terms

Algorithms
Amino Acid Sequence
Artificial Intelligence
Computational Biology
Databases, Protein
Pattern Recognition, Automated
Protein Conformation
Proteins
Sequence Alignment
Sequence Analysis, Protein

Chemicals

Proteins
