Improved inference of tandem domain duplications.

Chaitanya Aluru, Mona Singh
Author Information
  1. Chaitanya Aluru: Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA.
  2. Mona Singh: Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA.

Abstract

MOTIVATION: Protein domain duplications are a major contributor to the functional diversification of protein families. These duplications can occur one at a time through single domain duplications, or as tandem duplications where several consecutive domains are duplicated together as part of a single evolutionary event. Existing methods for inferring domain-level evolutionary events are based on reconciling domain trees with gene trees. While some formulations consider multiple domain duplications, they do not explicitly model tandem duplications; this leads to inaccurate inference of which domains duplicated together over the course of evolution.
RESULTS: Here, we introduce a reconciliation-based framework that considers the relative positions of domains within extant sequences. We use this information to uncover tandem domain duplications within the evolutionary history of these genes. We devise an integer linear programming approach that solves our problem exactly, and a heuristic approach that works well in practice. We perform extensive simulation studies to demonstrate that our approaches can accurately uncover single and tandem domain duplications, and additionally test our approach on a well-studied orthogroup where lineage-specific domain expansions exhibit varying and complex domain duplication patterns.
AVAILABILITY AND IMPLEMENTATION: Code is available on GitHub at https://github.com/Singh-Lab/TandemDuplications.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
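
ILLUSTRATION: The reconciliation of a domain tree with a gene tree mentioned under MOTIVATION can be sketched with the classical lowest-common-ancestor (LCA) mapping. This is a minimal sketch of that standard baseline, not the tandem-aware framework or the integer linear program introduced in the article. The nested-tuple tree encoding, the function names `leaf_set`, `lca`, and `reconcile`, and the `gene_domainIndex` leaf labels are all assumptions made for the example.

```python
def leaf_set(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def lca(gene_tree, genes):
    """Return the smallest subtree of gene_tree containing all of `genes`."""
    node = gene_tree
    while not isinstance(node, str):
        for child in node:
            if genes <= leaf_set(child):
                node = child  # descend: one child already covers every gene
                break
        else:
            return node  # no single child covers them, so this node is the LCA
    return node


def reconcile(domain_tree, gene_tree, gene_of):
    """LCA-reconcile a domain tree with a gene tree.

    Returns (clade the domain node maps to, number of duplication nodes).
    A domain node is called a duplication when it maps to the same gene
    tree clade as one of its children. Each duplication is counted
    separately; tandem events are NOT modelled here.
    """
    if isinstance(domain_tree, str):
        return frozenset([gene_of(domain_tree)]), 0
    results = [reconcile(child, gene_tree, gene_of) for child in domain_tree]
    genes = frozenset().union(*(clade for clade, _ in results))
    mapped = leaf_set(lca(gene_tree, genes))
    dups = sum(count for _, count in results)
    if any(clade == mapped for clade, _ in results):
        dups += 1
    return mapped, dups


# Hypothetical data: three genes A, B, C, each carrying two domain copies.
gene_tree = (("A", "B"), "C")
domain_tree = ((("A_1", "B_1"), "C_1"), (("A_2", "B_2"), "C_2"))
gene_of = lambda label: label.split("_")[0]

mapped, n_dups = reconcile(domain_tree, gene_tree, gene_of)
print(n_dups)  # prints 1: a single duplication at the root of the domain tree
```

Because this baseline scores every duplication node independently, it cannot tell whether several adjacent domains were copied together in one event, which is the gap that the position-aware formulation in RESULTS is meant to address.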

Grants

  1. R01 GM076275/NIGMS NIH HHS
  2. ABI-1458457/National Science Foundation

MeSH Terms

Algorithms
Evolution, Molecular
Gene Duplication
Humans
Phylogeny
Programming, Linear
Protein Domains
