The cacao Criollo genome v2.0: an improved version of the genome for genetic and functional genomic studies.

X Argout, G Martin, G Droc, O Fouet, K Labadie, E Rivals, J M Aury, C Lanaud
Author Information
  1. X Argout: CIRAD, UMR AGAP, F-34398, Montpellier, France. xavier.argout@cirad.fr.
  2. G Martin: CIRAD, UMR AGAP, F-34398, Montpellier, France.
  3. G Droc: CIRAD, UMR AGAP, F-34398, Montpellier, France.
  4. O Fouet: CIRAD, UMR AGAP, F-34398, Montpellier, France.
  5. K Labadie: Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG) Genoscope, F-92057, Evry, France.
  6. E Rivals: Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), CNRS et Université de Montpellier, 34095, Cedex 5, Montpellier, France.
  7. J M Aury: Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG) Genoscope, F-92057, Evry, France.
  8. C Lanaud: CIRAD, UMR AGAP, F-34398, Montpellier, France.

Abstract

BACKGROUND: Theobroma cacao L., native to the Amazonian basin of South America, is an economically important fruit tree crop for tropical countries as a source of chocolate. The first draft genome of the species, from a Criollo cultivar, was published in 2011. Although it is a useful resource, several improvements are possible, including identifying misassemblies, reducing the number of scaffolds and gaps, and anchoring unanchored sequences to the 10 chromosomes.
METHODS: We used an NGS-based approach to significantly improve the assembly of the Belizean Criollo B97-61/B2 genome. We combined four Illumina large-insert mate-pair libraries with 52x coverage of Pacific Biosciences long reads to correct misassembled regions and reduce the number of scaffolds. We then used genotyping-by-sequencing (GBS) methods to increase the proportion of the assembly anchored to chromosomes.
RESULTS: The number of scaffolds decreased from 4,792 in assembly V1 to 554 in V2, while the scaffold N50 size increased from 0.47 Mb in V1 to 6.5 Mb in V2. A total of 96.7% of the assembly was anchored to the 10 chromosomes, compared to 66.8% in the previous version. Unknown sites (Ns) were reduced from 10.8% to 5.7%. In addition, we updated the functional annotations and performed a new RefSeq structural annotation based on RNA-seq evidence.
CONCLUSION: The Theobroma cacao Criollo genome version 2 will be a valuable resource for the investigation of complex traits at the genomic level and for future comparative genomics and genetics studies in the cacao tree. New functional tools and annotations are available on the Cocoa Genome Hub (http://cocoa-genome-hub.southgreen.fr).
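
For readers unfamiliar with the scaffold N50 statistic reported in the results, the following is a minimal sketch of how it is conventionally computed: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name and the toy scaffold lengths are illustrative assumptions and are not taken from the cacao assemblies or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    account for at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy data (hypothetical, arbitrary units): a fragmented assembly and
# the same sequence after some scaffolds have been joined.
fragmented = [2, 3, 4, 5, 6]
merged = [2 + 3, 4, 5 + 6]

print(n50(fragmented))
print(n50(merged))
```

Joining scaffolds leaves the total length unchanged but raises N50, which is why the statistic tends to move in the opposite direction to the scaffold count when an assembly becomes more contiguous.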

MeSH Term

Cacao
Chromosomes, Plant
Genome, Plant
Genomics
High-Throughput Nucleotide Sequencing
Molecular Sequence Annotation