A chromosome-level, haplotype-phased Vanilla planifolia genome highlights the challenge of partial endoreplication for accurate whole-genome assembly.

Quentin Piet, Gaetan Droc, William Marande, Gautier Sarah, Stéphanie Bocs, Christophe Klopp, Mickael Bourge, Sonja Siljak-Yakovlev, Olivier Bouchez, Céline Lopez-Roques, Sandra Lepers-Andrzejewski, Laurent Bourgois, Joseph Zucca, Michel Dron, Pascale Besse, Michel Grisoni, Cyril Jourda, Carine Charron
Author Information
  1. Quentin Piet: CIRAD, UMR PVBMT, 97410 Saint-Pierre, La Réunion, France.
  2. Gaetan Droc: CIRAD, UMR AGAP Institut, 34398 Montpellier, France; UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France; French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, 34398 Montpellier, France. Electronic address: gaetan.droc@cirad.fr.
  3. William Marande: INRAE, CNRGV, Genotoul, 31326 Castanet-Tolosan, France.
  4. Gautier Sarah: French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, 34398 Montpellier, France; AGAP, Univ. Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France.
  5. Stéphanie Bocs: CIRAD, UMR AGAP Institut, 34398 Montpellier, France; UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France; French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, 34398 Montpellier, France.
  6. Christophe Klopp: Plateforme Bioinformatique, Genotoul, BioinfoMics, UR875 Biométrie et Intelligence Artificielle, INRAE, Castanet-Tolosan, France.
  7. Mickael Bourge: Cytometry Facility, Imagerie-Gif, Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France.
  8. Sonja Siljak-Yakovlev: Université Paris-Saclay, CNRS, AgroParisTech, Ecologie Systématique Evolution (ESE), 91190 Gif-sur-Yvette, France.
  9. Olivier Bouchez: INRAE, GeT-PlaGe, Genotoul, 31326 Castanet-Tolosan, France.
  10. Céline Lopez-Roques: INRAE, GeT-PlaGe, Genotoul, 31326 Castanet-Tolosan, France.
  11. Sandra Lepers-Andrzejewski: Etablissement Vanille de Tahiti, Uturoa, French Polynesia, France.
  12. Laurent Bourgois: Eurovanille, Rue de Maresquel, 62870 Gouy Saint André, France.
  13. Joseph Zucca: Département Biotechnologie, V. Mane Fils, 06620 Le Bar Sur Loup, France.
  14. Michel Dron: Université Paris-Saclay, CNRS, INRAE, Univ. Evry, Institute of Plant Sciences Paris-Saclay (IPS2), 91405 Orsay, France.
  15. Pascale Besse: Université de la Réunion, UMR PVBMT, Saint-Pierre, La Réunion, France.
  16. Michel Grisoni: CIRAD, UMR PVBMT, 501 Tamatave, Madagascar. Electronic address: michel.grisoni@cirad.fr.
  17. Cyril Jourda: CIRAD, UMR PVBMT, 97410 Saint-Pierre, La Réunion, France. Electronic address: cyril.jourda@cirad.fr.
  18. Carine Charron: CIRAD, UMR PVBMT, 97410 Saint-Pierre, La Réunion, France.

Abstract

Vanilla planifolia, the species cultivated to produce one of the world's most popular flavors, is highly prone to partial genome endoreplication, which leads to strongly unbalanced DNA content among cells. We report here the first molecular evidence of partial endoreplication at the chromosome scale, obtained through the assembly and annotation of an accurate haplotype-phased genome of V. planifolia. Cytogenetic data demonstrated that the diploid genome size is 4.09 Gb, with 16 chromosome pairs, although aneuploid cells are frequently observed. Using PacBio HiFi sequencing and optical mapping, we assembled and phased a diploid genome of 3.4 Gb with a scaffold N50 of 1.2 Mb and 59,128 predicted protein-coding genes. The atypical k-mer frequencies and the uneven sequencing depth we observed were consistent with our expectation of unbalanced genome representation. Sixty-seven percent of the genes were located in only 30% of the genome, putatively linking gene-rich regions to the endoreplication phenomenon. By contrast, low-coverage (non-endoreplicated) regions were rich in repeated elements but still contained 33% of the annotated genes. Furthermore, the assembly showed distinct haplotype-specific patterns of sequencing depth variation, suggesting complex molecular regulation of endoreplication along the chromosomes. This high-quality, anchored assembly represents 83% of the estimated V. planifolia genome and is a significant step toward the elucidation of this complex genome. To support post-genomics efforts, we developed the Vanilla Genome Hub, a user-friendly integrated web portal that provides centralized access to high-throughput genomic and other omics data and interoperable use of bioinformatics tools.
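
The proportions quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the authors' pipeline: the function and variable names are hypothetical, and it assumes that the 33% of genes in low-coverage regions occupy the remaining 70% of the genome, which the abstract implies but does not state outright.

```python
# Figures quoted in the abstract.
ESTIMATED_GENOME_GB = 4.09    # diploid genome size from cytogenetic data
ASSEMBLED_GENOME_GB = 3.4     # phased diploid assembly size
GENE_FRACTION_RICH = 0.67     # share of genes in the gene-rich regions
GENOME_FRACTION_RICH = 0.30   # share of the genome those regions occupy


def assembly_completeness(assembled_gb: float, estimated_gb: float) -> float:
    """Fraction of the estimated genome covered by the assembly."""
    return assembled_gb / estimated_gb


def gene_density_ratio(gene_fraction: float, genome_fraction: float) -> float:
    """Relative gene density of the gene-rich regions versus the rest.

    Assumption (hypothetical simplification): every gene and every base
    outside the gene-rich regions belongs to the low-coverage remainder.
    """
    dense = gene_fraction / genome_fraction
    sparse = (1.0 - gene_fraction) / (1.0 - genome_fraction)
    return dense / sparse


completeness = assembly_completeness(ASSEMBLED_GENOME_GB, ESTIMATED_GENOME_GB)
ratio = gene_density_ratio(GENE_FRACTION_RICH, GENOME_FRACTION_RICH)

print(f"Assembly covers {completeness:.1%} of the estimated genome")
print(f"Gene-rich regions are about {ratio:.1f}x denser in genes than the rest")
```

Under these assumptions, the completeness figure agrees with the 83% reported, and the density ratio gives a rough sense of how strongly genes are concentrated in the putatively endoreplicated fraction of the genome.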

MeSH Term

Chromosomes
Endoreduplication
Genome Size
Haplotypes
Vanilla