Integrated synteny- and similarity-based inference on the polyploidization-fractionation cycle.

Yue Zhang, Zhe Yu, Chunfang Zheng, David Sankoff
Author Information
  1. Yue Zhang: Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada K1N 6N5.
  2. Zhe Yu: Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada K1N 6N5.
  3. Chunfang Zheng: Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada K1N 6N5.
  4. David Sankoff: Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada K1N 6N5.

Abstract

Whole-genome doubling, tripling or replicating to a greater degree, due to fixation of polyploidization events, is attested in almost all lineages of the flowering plants, recurring in the ancestry of some plants two, three or more times in retracing their history to the earliest angiosperm. This major mechanism in plant genome evolution, which generally appears as instantaneous on the evolutionary time scale, sets in operation a compensatory process called fractionation, the loss of duplicate genes, initially rapid, but continuing at a diminishing rate over millions and tens of millions of years. We study this process by statistically comparing the distribution of duplicate gene pairs as a function of their time of creation through polyploidization, as measured by sequence similarity. The stochastic model that accounts for this distribution, though exceedingly simple, still has too many parameters to be estimated based only on the similarity distribution, while the computational procedures for compiling the distribution from annotated genomic data are heavily biased against earlier polyploidization events (syntenic 'crumble'). Other parameters, such as the size of the initial gene complement and the ploidy of the various events giving rise to duplicate gene pairs, are even more inaccessible to estimation. Here, we show how the frequency of genes, identified via their embedding in stretches of duplicate pairs, together with previously established constraints among some parameters, adds enormously to the range of successive polyploidization events that can be analysed. This also allows us to estimate the initial gene complement and to correct for the bias due to crumble. We explore the applicability of our methodology to four flowering plant genomes covering a range of different polyploidization histories.
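
To make the polyploidization-fractionation cycle concrete, the following is a minimal toy simulation, not the model or estimation procedure of the paper. It assumes that each event multiplies every gene by a given ploidy, that one copy always survives, and that each extra copy is retained independently with a fixed probability. It also applies all loss at the moment of the event, whereas the abstract describes loss continuing at a diminishing rate. The names `simulate_cycle`, `pair_counts_by_event`, `ploidy` and `retention` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def simulate_cycle(initial_genes, events, seed=0):
    """Toy polyploidization-fractionation cycle (illustrative assumption only).

    initial_genes: size of the initial gene complement.
    events: list of (ploidy, retention) tuples, oldest event first.
    Returns a dict mapping each initial gene to the copy paths that survive;
    a path records which copy was followed at each event.
    """
    rng = random.Random(seed)
    families = {gene: [()] for gene in range(initial_genes)}
    for ploidy, retention in events:
        for gene, paths in families.items():
            survivors = []
            for path in paths:
                # Assumption: copy 0 is never lost, so no family goes extinct.
                survivors.append(path + (0,))
                # Each extra copy survives fractionation with prob. `retention`.
                for copy in range(1, ploidy):
                    if rng.random() < retention:
                        survivors.append(path + (copy,))
            families[gene] = survivors
    return families


def pair_counts_by_event(families, n_events):
    """Count surviving duplicate pairs by the event that created them.

    A pair is attributed to the first event at which its two paths diverge.
    """
    counts = Counter()
    for paths in families.values():
        for a, b in combinations(paths, 2):
            for event, (x, y) in enumerate(zip(a, b)):
                if x != y:
                    counts[event] += 1
                    break
    return [counts[event] for event in range(n_events)]


if __name__ == "__main__":
    # Two doublings with hypothetical retention probabilities.
    fams = simulate_cycle(1000, [(2, 0.3), (2, 0.5)], seed=1)
    print(sum(len(p) for p in fams.values()), pair_counts_by_event(fams, 2))
```

In this sketch the per-event pair counts play the role of the duplicate-pair distribution discussed above, with the event index standing in for sequence similarity. The count of families with a single surviving path is the kind of additional quantity that could help constrain the initial gene complement, though how the paper does this is not shown here.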
