The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color.

Juan C Motamayor, Keithanne Mockaitis, Jeremy Schmutz, Niina Haiminen, Donald Livingstone, Omar Cornejo, Seth D Findley, Ping Zheng, Filippo Utro, Stefan Royaert, Christopher Saski, Jerry Jenkins, Ram Podicheti, Meixia Zhao, Brian E Scheffler, Joseph C Stack, Frank A Feltus, Guiliana M Mustiga, Freddy Amores, Wilbert Phillips, Jean Philippe Marelli, Gregory D May, Howard Shapiro, Jianxin Ma, Carlos D Bustamante, Raymond J Schnell, Dorrie Main, Don Gilbert, Laxmi Parida, David N Kuhn

Abstract

BACKGROUND: Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders.
RESULTS: We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than that of a sequenced Criollo cultivar and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 of the Criollo assembly, and includes 111 Mbp more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while the Criollo assembly has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We show that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. A single SNP in TcMYB113, located within the target site of a trans-acting siRNA that is highly conserved in dicots, appears to affect transcript levels of this gene and thereby pod color variation.
CONCLUSIONS: We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.
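
NOTE ON N50: The contig and scaffold N50 values quoted in the RESULTS follow the usual assembly-statistics definition: the length L such that sequences of length L or longer contain at least half of the total assembled bases. The sketch below is only an illustration of that definition under stated assumptions; the function name `n50` and the toy lengths are hypothetical and are not taken from the Matina 1-6 assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (hypothetical helper).

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (illustrative only, not real data).
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

With these toy lengths the total is 290, so the running sum first reaches half of the total (145) at the second-longest sequence. A larger N50, as reported for version 1.1 relative to the Criollo assembly, indicates that more of the assembly sits in long contiguous pieces.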

Associated Data

BioProject | PRJNA51633

MeSH Terms

Cacao
Chromosome Mapping
Chromosomes, Plant
Color
Fruit
Gene Expression Regulation, Plant
Genes, Plant
Genome Size
Genome, Plant
High-Throughput Nucleotide Sequencing
Quantitative Trait Loci
Quantitative Trait, Heritable
RNA, Small Interfering
Transcription Factors
Transcription, Genetic

Chemicals

RNA, Small Interfering
Transcription Factors