GATC: a genetic algorithm for gene tree construction under the Duplication-Transfer-Loss model of evolution.

Emmanuel Noutahi, Nadia El-Mabrouk
Author Information
  1. Emmanuel Noutahi: Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Montréal, Canada. fmr.noutahi@umontreal.ca.
  2. Nadia El-Mabrouk: Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Montréal, Canada.

Abstract

BACKGROUND: Several methods have been developed for the accurate reconstruction of gene trees. Some of them use reconciliation with a species tree to correct, a posteriori, errors in gene trees inferred from multiple sequence alignments. Unfortunately the best fit to sequence information can be lost during this process.
RESULTS: We describe GATC, a new algorithm for reconstructing a binary gene tree with branch length. GATC returns optimal solutions according to a measure combining both tree likelihood (according to sequence evolution) and a reconciliation score under the Duplication-Transfer-Loss (DTL) model. It can either be used to construct a gene tree from scratch or to correct trees infered by existing reconstruction method, making it highly flexible to various input data types. The method is based on a genetic algorithm acting on a population of trees at each step. It substantially increases the efficiency of the phylogeny space exploration, reducing the risk of falling into local minima, at a reasonable computational time. We have applied GATC to a dataset of simulated cyanobacterial phylogenies, as well as to an empirical dataset of three reference gene families, and showed that it is able to improve gene tree reconstructions compared with current state-of-the-art algorithms.
CONCLUSION: The proposed algorithm is able to accurately reconstruct gene trees and is highly suitable for the construction of reference trees. Our results also highlight the efficiency of multi-objective optimization algorithms for the gene tree reconstruction problem. GATC is available on Github at: https://github.com/UdeM-LBIT/GATC .

Keywords

References

  1. Bioinformatics. 2006 Nov 1;22(21):2688-90 [PMID: 16928733]
  2. Syst Biol. 2002 Jun;51(3):492-508 [PMID: 12079646]
  3. Bioinformatics. 2003 Aug 12;19(12):1572-4 [PMID: 12912839]
  4. Syst Biol. 2013 Nov;62(6):901-12 [PMID: 23925510]
  5. Bioinformatics. 2012 Jun 15;28(12):i283-91 [PMID: 22689773]
  6. Comput Appl Biosci. 1992 Jun;8(3):275-82 [PMID: 1633570]
  7. Mol Biol Evol. 2004 Jun;21(6):1095-109 [PMID: 15014145]
  8. Syst Biol. 2014 May;63(3):409-20 [PMID: 24562812]
  9. J Comput Biol. 1997 Summer;4(2):177-87 [PMID: 9228616]
  10. Algorithms Mol Biol. 2013 Apr 08;8(1):12 [PMID: 23566548]
  11. IEEE/ACM Trans Comput Biol Bioinform. 2011 Mar-Apr;8(2):517-35 [PMID: 21233529]
  12. J Mol Evol. 2001 Oct-Nov;53(4-5):477-84 [PMID: 11675608]
  13. Pac Symp Biocomput. 1996;:512-23 [PMID: 9390255]
  14. PLoS One. 2016 Aug 11;11(8):e0159559 [PMID: 27513924]
  15. Syst Biol. 2013 Jan 1;62(1):110-20 [PMID: 22949484]
  16. Brief Bioinform. 2011 Sep;12(5):423-35 [PMID: 21737420]
  17. Nucleic Acids Res. 2004 Mar 19;32(5):1792-7 [PMID: 15034147]
  18. BMC Bioinformatics. 2009 Jun 16;10 Suppl 6:S3 [PMID: 19534752]
  19. BMC Bioinformatics. 2010 Jun 09;11:312 [PMID: 20534164]
  20. J Comput Biol. 2011 Jan;18(1):59-65 [PMID: 20715926]
  21. Genome Biol Evol. 2015 Jul 01;7(7):1988-99 [PMID: 26133389]
  22. J Comput Biol. 2000;7(3-4):429-47 [PMID: 11108472]
  23. Bioinformatics. 2016 Jul 1;32(13):2056-8 [PMID: 27153713]
  24. Syst Biol. 2003 Oct;52(5):696-704 [PMID: 14530136]
  25. Bioinformatics. 2015 Apr 15;31(8):1211-8 [PMID: 25481006]
  26. Proc Natl Acad Sci U S A. 2002 Aug 6;99(16):10516-21 [PMID: 12142465]
  27. Bioinformatics. 2005 Feb 15;21(4):456-63 [PMID: 15608047]
  28. J Mol Evol. 1984;20(1):86-93 [PMID: 6429346]
  29. Mol Biol Evol. 2011 Jan;28(1):273-90 [PMID: 20660489]
  30. J Mol Evol. 1994 Sep;39(3):306-14 [PMID: 7932792]
  31. Bioinformatics. 2009 Sep 1;25(17):2286-8 [PMID: 19535536]
  32. J Mol Evol. 1981;17(6):368-76 [PMID: 7288891]
  33. Nucleic Acids Res. 2014 Jan;42(Database issue):D922-5 [PMID: 24194607]
  34. Syst Biol. 2015 Jan;64(1):e42-62 [PMID: 25070970]
  35. Genome Res. 2013 Feb;23(2):323-30 [PMID: 23132911]
  36. Mol Biol Evol. 1998 Mar;15(3):277-83 [PMID: 9501494]
  37. IEEE/ACM Trans Comput Biol Bioinform. 2012 Sep-Oct;9(5):1515-28 [PMID: 22641711]

MeSH Term

Algorithms
Cyanobacteria
Evolution, Molecular
Gene Duplication
Genes, Bacterial
Genomics
Internet
Models, Genetic
Multigene Family
Phylogeny

Word Cloud

Created with Highcharts 10.0.0genetreetreesalgorithmGATCreconstructionsequencereconciliationcorrectcanaccordingevolutionDuplication-Transfer-LossmodelmethodhighlygeneticefficiencydatasetreferenceablealgorithmsconstructionGeneBACKGROUND:SeveralmethodsdevelopedaccurateusespeciesposteriorierrorsinferredmultiplealignmentsUnfortunatelybestfitinformationlostprocessRESULTS:describenewreconstructingbinarybranchlengthreturnsoptimalsolutionsmeasurecombininglikelihoodscoreDTLeitherusedconstructscratchinferedexistingmakingflexiblevariousinputdatatypesbasedactingpopulationstepsubstantiallyincreasesphylogenyspaceexplorationreducingriskfallinglocalminimareasonablecomputationaltimeappliedsimulatedcyanobacterialphylogenieswellempiricalthreefamiliesshowedimprovereconstructionscomparedcurrentstate-of-the-artCONCLUSION:proposedaccuratelyreconstructsuitableresultsalsohighlightmulti-objectiveoptimizationproblemavailableGithubat:https://githubcom/UdeM-LBIT/GATCGATC:duplicationGeneticHorizontaltransferPhylogenyReconciliation

Similar Articles

Cited By