A multi-objective optimization approach accurately resolves protein domain architectures.

J S Bernardes, F R J Vieira, G Zaverucha, A Carbone
Author Information
  1. J S Bernardes: Sorbonne Universités, UPMC Univ-Paris 6, CNRS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, 15 rue de l'Ecole de Médecine, 75006 Paris.
  2. F R J Vieira: CNRS, UMR 7606, Laboratoire d'Informatique de Paris 6, 75005 Paris, France and COPPE-UFRJ, Programa de Engenharia de Sistemas e Computação, Rio de Janeiro, Brazil.
  3. G Zaverucha: COPPE-UFRJ, Programa de Engenharia de Sistemas e Computação, Rio de Janeiro, Brazil.
  4. A Carbone: Sorbonne Universités, UPMC Univ-Paris 6, CNRS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, 15 rue de l'Ecole de Médecine, 75006 Paris, Institut Universitaire de France, 75005 Paris.

Abstract

MOTIVATION: Given a protein sequence and a number of potential domains matching it, what are the domain content and the most likely domain architecture for the sequence? This problem is of fundamental importance in protein annotation and constitutes one of the main steps of all predictive annotation strategies. However, when several potential domains conflict because their boundaries overlap, finding a solution can become difficult. An accurate prediction of the domain architecture of a multi-domain protein provides important information for function prediction, comparative genomics and molecular evolution.
RESULTS: We developed DAMA (Domain Annotation by a Multi-objective Approach), a novel approach that identifies architectures through a multi-objective optimization algorithm combining scores of domain matches, previously observed multi-domain co-occurrence and domain overlap. DAMA has been validated on a known benchmark dataset based on CATH structural domain assignments and on the set of Plasmodium falciparum proteins. On both datasets, it outperforms all the existing tools it was compared with.
AVAILABILITY AND IMPLEMENTATION: DAMA software is implemented in C++ and the source code can be found at http://www.lcqb.upmc.fr/DAMA.
CONTACT: juliana.silva_bernardes@upmc.fr or alessandra.carbone@lip6.fr
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
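
ILLUSTRATION: The abstract names three criteria (match scores, known co-occurrence and overlap) but does not describe how DAMA combines them. The sketch below is therefore not DAMA's algorithm. It is a minimal toy, assuming a brute-force enumeration of candidate architectures and a plain Pareto filter, that shows what treating the three criteria as separate objectives can look like. The names `Hit`, `objectives`, `pareto_architectures` and `known_pairs`, as well as the example hits and scores, are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class Hit(NamedTuple):
    """A candidate domain match on the query sequence (hypothetical type)."""
    domain: str
    start: int    # first matched residue, inclusive
    end: int      # last matched residue, inclusive
    score: float  # match score, higher is better


def overlap(a: Hit, b: Hit) -> int:
    """Number of residues covered by both hits."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def objectives(arch, known_pairs):
    """Return (total score, known co-occurring pairs, -total overlap).

    All three values are to be maximised, so overlap is negated.
    """
    pairs = list(combinations(arch, 2))
    total_score = sum(h.score for h in arch)
    cooccurrence = sum(
        1 for a, b in pairs if frozenset((a.domain, b.domain)) in known_pairs
    )
    total_overlap = sum(overlap(a, b) for a, b in pairs)
    return (total_score, cooccurrence, -total_overlap)


def dominates(u, v):
    """True if u is at least as good as v everywhere and differs from v."""
    return all(x >= y for x, y in zip(u, v)) and u != v


def pareto_architectures(hits, known_pairs):
    """Enumerate every non-empty subset of hits and keep the non-dominated ones.

    Exponential in len(hits): acceptable for a toy, not for real data.
    """
    candidates = [
        (arch, objectives(arch, known_pairs))
        for k in range(1, len(hits) + 1)
        for arch in combinations(hits, k)
    ]
    return [
        (arch, obj)
        for arch, obj in candidates
        if not any(dominates(other, obj) for _, other in candidates)
    ]


# Made-up example: three hits, B and C compete for the same region.
hits = [
    Hit("A", 1, 100, 50.0),
    Hit("B", 90, 200, 40.0),
    Hit("C", 95, 210, 45.0),
]
known_pairs = {frozenset(("A", "B"))}  # assumed previously observed together

front = pareto_architectures(hits, known_pairs)
for arch, obj in front:
    print([h.domain for h in arch], obj)
```

A Pareto filter returns a set of trade-offs rather than a single answer, so a further rule is needed to select one architecture; how DAMA makes that choice is not stated in the abstract.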

MeSH Terms

Algorithms
Genomics
Molecular Sequence Annotation
Plasmodium falciparum
Protein Structure, Tertiary
Protozoan Proteins
Sequence Analysis, Protein
Software

Chemicals

Protozoan Proteins
