Bayesian nonparametrics in protein remote homology search.

Mindaugas Margelevičius
Author Information
  1. Mindaugas Margelevičius: Institute of Biotechnology, Vilnius University, Vilnius 10257, Lithuania.

Abstract

MOTIVATION: The wide application of three-dimensional protein structure modeling in biomedical research motivates the development of protein sequence alignment tools that combine high alignment accuracy with sensitivity to remotely homologous proteins. In this paper, we aim to improve the quality of alignments between sequence profiles, which are encoded multiple sequence alignments. To this end, we model profile contexts, which are fixed-length profile fragments.
RESULTS: We develop a hierarchical Dirichlet process mixture model that describes the distribution of profile contexts and is able to capture dependencies between amino acids in each context position. The model represents an attempt at modeling profile fragments at several hierarchical levels, within the profile and among profiles. Even modeling unit-length contexts leads to greater improvements than those previously obtained by processing contexts of length 13. We develop a new profile comparison method, called COMER, that integrates the model. A benchmark against three other profile-to-profile comparison methods shows an increase in both sensitivity and alignment quality.
AVAILABILITY AND IMPLEMENTATION: COMER is open-source software licensed under the GNU GPLv3, available at https://sourceforge.net/projects/comer
CONTACT: mindaugas.margelevicius@bti.vu.lt
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
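
ILLUSTRATION: The RESULTS paragraph refers to a Dirichlet process mixture. As a rough aid to intuition only, the sketch below samples cluster labels from a Chinese restaurant process, a standard way to view a plain (non-hierarchical) Dirichlet process prior in which the number of clusters is not fixed in advance. It is not COMER's model or code: the function name crp_assignments, the concentration parameter alpha, and the idea of treating each item as one profile context are assumptions made for this example, and the hierarchical structure and the likelihood over profile contexts are omitted.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Hypothetical illustration: each item stands in for one profile context.
    alpha (> 0) is the concentration parameter; larger values favor more clusters.
    """
    rng = random.Random(seed)
    labels = []   # labels[i] = cluster index assigned to item i
    counts = []   # counts[k] = number of items currently in cluster k
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # open a new cluster
        else:
            counts[k] += 1        # join an existing cluster
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_items=20, alpha=1.0)
    print("labels:", labels)
    print("cluster sizes:", counts)
```

In a full mixture model, the prior weights above would be multiplied by the likelihood of each item under each cluster before sampling; that step is left out here because the abstract does not specify it.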

MeSH Terms

Algorithms
Amino Acid Sequence
Bayes Theorem
Models, Molecular
Proteins
Sequence Alignment
Sequence Analysis, Protein
Sequence Homology, Amino Acid
Software

Chemicals

Proteins

