The cacao Criollo genome v2.0: an improved version of the genome for genetic and functional genomic studies.

X Argout, G Martin, G Droc, O Fouet, K Labadie, E Rivals, J M Aury, C Lanaud
Author Information
  1. X Argout: CIRAD, UMR AGAP, F-34398, Montpellier, France. xavier.argout@cirad.fr.
  2. G Martin: CIRAD, UMR AGAP, F-34398, Montpellier, France.
  3. G Droc: CIRAD, UMR AGAP, F-34398, Montpellier, France.
  4. O Fouet: CIRAD, UMR AGAP, F-34398, Montpellier, France.
  5. K Labadie: Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG) Genoscope, F-92057, Evry, France.
  6. E Rivals: Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), CNRS et Université de Montpellier, 34095, Cedex 5, Montpellier, France.
  7. J M Aury: Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG) Genoscope, F-92057, Evry, France.
  8. C Lanaud: CIRAD, UMR AGAP, F-34398, Montpellier, France.

Abstract

BACKGROUND: Theobroma cacao L., native to the Amazon basin of South America, is an economically important fruit tree crop in tropical countries as the source of chocolate. The first draft genome of the species, from a Criollo cultivar, was published in 2011. Although this draft is a useful resource, it can be improved by identifying misassemblies, reducing the number of scaffolds and gaps, and anchoring unanchored sequences to the 10 chromosomes.
METHODS: We used an NGS-based approach to substantially improve the assembly of the Belizean Criollo B97-61/B2 genome. We combined four Illumina large-insert mate-pair libraries with 52x coverage of Pacific Biosciences long reads to correct misassembled regions and reduce the number of scaffolds. We then used genotyping-by-sequencing (GBS) to increase the proportion of the assembly anchored to chromosomes.
RESULTS: The number of scaffolds decreased from 4,792 in assembly V1 to 554 in V2, while the scaffold N50 increased from 0.47 Mb in V1 to 6.5 Mb in V2. In total, 96.7% of the assembly was anchored to the 10 chromosomes, compared with 66.8% in the previous version. Unknown sites (Ns) were reduced from 10.8% to 5.7%. We also updated the functional annotations and produced a new RefSeq structural annotation based on RNA-seq evidence.
CONCLUSION: Version 2 of the Theobroma cacao Criollo genome will be a valuable resource for investigating complex traits at the genomic level and for future comparative genomics and genetics studies in cacao. New functional tools and annotations are available on the Cocoa Genome Hub (http://cocoa-genome-hub.southgreen.fr).
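The improvements reported in the abstract are expressed through three standard assembly statistics: scaffold N50, the share of unknown bases (Ns), and the proportion of the assembly anchored to chromosomes. As an illustration only, the Python sketch below shows how these statistics are commonly computed from scaffold lengths and sequences. It is not the authors' pipeline; the function names and the toy scaffold names are hypothetical, and N50 conventions can differ slightly between tools.

```python
from typing import Dict, Iterable, Set


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the largest length L such that scaffolds of
    length >= L together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # reached half of the total assembly size
            return length
    return 0  # empty input: no scaffolds


def n_fraction(sequences: Iterable[str]) -> float:
    """Fraction of assembled positions that are unknown bases (N or n)."""
    total = 0
    unknown = 0
    for seq in sequences:
        total += len(seq)
        unknown += seq.upper().count("N")
    return unknown / total if total else 0.0


def anchored_fraction(lengths: Dict[str, int], anchored: Set[str]) -> float:
    """Fraction of assembled bases lying in scaffolds assigned to a chromosome.

    `anchored` is a hypothetical set of scaffold names placed on a
    chromosome, e.g. by a genetic map built from GBS markers."""
    total = sum(lengths.values())
    placed = sum(length for name, length in lengths.items() if name in anchored)
    return placed / total if total else 0.0


if __name__ == "__main__":
    # Toy data, not taken from the cacao assembly.
    toy = {"scaf1": 50, "scaf2": 30, "scaf3": 20}
    print(n50(toy.values()))                             # 50
    print(anchored_fraction(toy, {"scaf1", "scaf2"}))    # 0.8
    print(n_fraction(["ACGTNN", "nnAC"]))                # 0.4
```

Sorting scaffolds from longest to shortest and stopping once half of the bases are covered is why merging many short scaffolds into fewer long ones, as described for V2, raises the N50.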

MeSH Terms

Cacao
Chromosomes, Plant
Genome, Plant
Genomics
High-Throughput Nucleotide Sequencing
Molecular Sequence Annotation