Learning Retention Mechanisms and Evolutionary Parameters of Duplicate Genes from Their Expression Data.

Michael DeGiorgio, Raquel Assis
Author Information
  1. Michael DeGiorgio: Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431.
  2. Raquel Assis: Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431.

Abstract

Learning about the roles that duplicate genes play in the origins of novel phenotypes requires an understanding of how their functions evolve. A previous method for achieving this goal, CDROM, employs gene expression distances as proxies for functional divergence and then classifies the evolutionary mechanisms retaining duplicate genes from comparisons of these distances in a decision tree framework. However, CDROM does not account for stochastic shifts in gene expression or leverage advances in contemporary statistical learning for performing classification, nor is it capable of predicting the parameters driving duplicate gene evolution. Thus, here we develop CLOUD, a multi-layer neural network built on a model of gene expression evolution that can both classify duplicate gene retention mechanisms and predict their underlying evolutionary parameters. We show that the CLOUD classifier is substantially more powerful and accurate than CDROM, and that CLOUD also yields accurate parameter predictions, enabling a better understanding of the specific forces driving the evolution and long-term retention of duplicate genes. Further, application of the CLOUD classifier and predictor to empirical data from Drosophila recapitulates many previous findings about gene duplication in this lineage, showing that new functions often emerge rapidly and asymmetrically in younger duplicate gene copies, and that functional divergence is driven by strong natural selection. Hence, CLOUD represents a major advancement in classifying retention mechanisms and predicting evolutionary parameters of duplicate genes, thereby highlighting the utility of incorporating sophisticated statistical learning techniques to address long-standing questions about evolution after gene duplication.
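
The distance-based decision tree attributed to CDROM above can be illustrated with a minimal sketch. The abstract does not spell out the rules, so the sketch assumes a commonly described scheme: Euclidean distances from the parent copy, the child copy, and their combined expression to a single-copy ancestral proxy are each compared against one cutoff. The function names, the use of raw (rather than relative) expression values, and the toy numbers are all hypothetical and are not taken from the CDROM or CLOUD software.

```python
import math


def expression_distance(x, y):
    """Euclidean distance between two expression profiles (one value per tissue)."""
    return math.dist(x, y)


def classify_retention(parent, child, ancestral, cutoff):
    """Assign a retention mechanism from expression distances (assumed rules).

    parent, child, ancestral: equal-length sequences of expression values.
    cutoff: hypothetical threshold separating conserved from diverged expression.
    """
    e_pa = expression_distance(parent, ancestral)
    e_ca = expression_distance(child, ancestral)
    # Combined expression of both copies, compared against the ancestral proxy.
    combined = [p + c for p, c in zip(parent, child)]
    e_pca = expression_distance(combined, ancestral)

    parent_diverged = e_pa > cutoff
    child_diverged = e_ca > cutoff

    if not parent_diverged and not child_diverged:
        return "conservation"
    if parent_diverged and not child_diverged:
        return "neofunctionalization (parent)"
    if child_diverged and not parent_diverged:
        return "neofunctionalization (child)"
    # Both copies diverged: the combined profile decides between the last two classes.
    if e_pca <= cutoff:
        return "subfunctionalization"
    return "specialization"


if __name__ == "__main__":
    ancestral = [4.0, 4.0]  # toy two-tissue profile
    print(classify_retention([4.0, 0.0], [0.0, 4.0], ancestral, cutoff=1.0))
    print(classify_retention([4.0, 4.0], [0.0, 9.0], ancestral, cutoff=1.0))
```

A hard cutoff of this kind treats any distance above the threshold as divergence, which is why such a scheme cannot distinguish stochastic shifts in expression from sustained functional change, the limitation the abstract raises.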

Grants

  1. R35 GM128590/NIGMS NIH HHS
  2. R35 GM142438/NIGMS NIH HHS

MeSH Terms

Animals
Drosophila
Evolution, Molecular
Gene Duplication
Gene Expression
Models, Genetic
Neural Networks, Computer
Software
