The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color.

Juan C Motamayor, Keithanne Mockaitis, Jeremy Schmutz, Niina Haiminen, Donald Livingstone, Omar Cornejo, Seth D Findley, Ping Zheng, Filippo Utro, Stefan Royaert, Christopher Saski, Jerry Jenkins, Ram Podicheti, Meixia Zhao, Brian E Scheffler, Joseph C Stack, Frank A Feltus, Guiliana M Mustiga, Freddy Amores, Wilbert Phillips, Jean Philippe Marelli, Gregory D May, Howard Shapiro, Jianxin Ma, Carlos D Bustamante, Raymond J Schnell, Dorrie Main, Don Gilbert, Laxmi Parida, David N Kuhn

Abstract

BACKGROUND: Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders.
RESULTS: We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The Matina 1-6 genome is 445 Mbp, significantly larger than that of a sequenced Criollo cultivar and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Compared with the Criollo assembly, version 1.1 has 10 times the scaffold N50 and 4 times the contig N50, and includes 111 Mbp more anchored sequence. Gap sequence makes up 4.4% of the version 1.1 assembly, compared with 10.9% for Criollo. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. Our results indicate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. A single SNP in TcMYB113, located within the target site of a trans-acting siRNA that is highly conserved in dicots, appears to affect transcript levels of this gene and, in turn, pod color variation.
CONCLUSIONS: We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.
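The abstract reports contig and scaffold N50 values and a gap percentage without defining them. The following is a minimal sketch of how such assembly statistics are commonly computed; it is not the authors' pipeline. It assumes the usual N50 definition (the largest length L such that sequences of length L or longer contain at least half of all bases) and treats every run of `N` characters as a gap when splitting scaffolds into contigs, whereas real assembly pipelines may apply a minimum gap length. The function names `n50`, `split_into_contigs` and `gap_fraction` are hypothetical.

```python
import re
from typing import Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that sequences of length >= L hold at least
    half of all bases (the usual N50 definition); 0 for empty input."""
    ordered = sorted((n for n in lengths if n > 0), reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    return 0


def split_into_contigs(scaffold: str) -> List[str]:
    """Split a scaffold into contigs at every run of N (assumed gap symbol)."""
    return [piece for piece in re.split(r"[Nn]+", scaffold) if piece]


def gap_fraction(scaffolds: Iterable[str]) -> float:
    """Fraction of all scaffold bases that are gap characters (N or n)."""
    total = gaps = 0
    for seq in scaffolds:
        total += len(seq)
        gaps += seq.upper().count("N")
    return gaps / total if total else 0.0


# Toy example: three short scaffolds, one containing a 4-base gap.
scaffolds = ["ACGTACGTNNNNACGT", "ACGTAC", "ACG"]
contigs = [c for s in scaffolds for c in split_into_contigs(s)]
print(n50(len(s) for s in scaffolds))    # 16
print(n50(len(c) for c in contigs))      # 6
print(f"{gap_fraction(scaffolds):.1%}")  # 16.0%
```

Because contig N50 is computed after removing gaps, it is always at most the scaffold N50, which is why an assembly can show a very large scaffold N50 (34.4 Mbp here) alongside a much smaller contig N50 (84.4 kbp). Applying the same functions to a real assembly would only require reading the scaffold sequences from a FASTA file first.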

Associated Data

BioProject: PRJNA51633

MeSH Terms

Cacao
Chromosome Mapping
Chromosomes, Plant
Color
Fruit
Gene Expression Regulation, Plant
Genes, Plant
Genome Size
Genome, Plant
High-Throughput Nucleotide Sequencing
Quantitative Trait Loci
Quantitative Trait, Heritable
RNA, Small Interfering
Transcription Factors
Transcription, Genetic

Chemicals

RNA, Small Interfering
Transcription Factors