A single polyploidization event at the origin of the tetraploid genome of Coffea arabica is responsible for the extremely low genetic variation in wild and cultivated germplasm.

Simone Scalabrin, Lucile Toniutti, Gabriele Di Gaspero, Davide Scaglione, Gabriele Magris, Michele Vidotto, Sara Pinosio, Federica Cattonaro, Federica Magni, Irena Jurman, Mario Cerutti, Furio Suggi Liverani, Luciano Navarini, Lorenzo Del Terra, Gloria Pellegrino, Manuela Rosanna Ruosi, Nicola Vitulo, Giorgio Valle, Alberto Pallavicini, Giorgio Graziosi, Patricia E Klein, Nolan Bentley, Seth Murray, William Solano, Amin Al Hakimi, Timothy Schilling, Christophe Montagnon, Michele Morgante, Benoit Bertrand
Author Information
  1. Simone Scalabrin: IGA Technology Services S.r.l., via Jacopo Linussio 51, I-33100, Udine, Italy.
  2. Lucile Toniutti: World Coffee Research, 5 avenue du grand chêne, 34270, Saint-Mathieu-de-Tréviers, France. lucile@worldcoffeeresearch.org.
  3. Gabriele Di Gaspero: Istituto di Genomica Applicata, via Jacopo Linussio 51, I-33100, Udine, Italy.
  4. Davide Scaglione: IGA Technology Services S.r.l., via Jacopo Linussio 51, I-33100, Udine, Italy.
  5. Gabriele Magris: Istituto di Genomica Applicata, via Jacopo Linussio 51, I-33100, Udine, Italy.
  6. Michele Vidotto: IGA Technology Services S.r.l., via Jacopo Linussio 51, I-33100, Udine, Italy.
  7. Sara Pinosio: Istituto di Genomica Applicata, via Jacopo Linussio 51, I-33100, Udine, Italy.
  8. Federica Cattonaro: IGA Technology Services S.r.l., via Jacopo Linussio 51, I-33100, Udine, Italy.
  9. Federica Magni: IGA Technology Services S.r.l., via Jacopo Linussio 51, I-33100, Udine, Italy.
  10. Irena Jurman: Istituto di Genomica Applicata, via Jacopo Linussio 51, I-33100, Udine, Italy.
  11. Mario Cerutti: Luigi Lavazza S.p.A., Innovation Center, I-10156, Torino, Italy.
  12. Furio Suggi Liverani: Illycaffè S.p.A., Research & Innovation, via Flavia 110, I-34147, Trieste, Italy.
  13. Luciano Navarini: Illycaffè S.p.A., Research & Innovation, via Flavia 110, I-34147, Trieste, Italy.
  14. Lorenzo Del Terra: Illycaffè S.p.A., Research & Innovation, via Flavia 110, I-34147, Trieste, Italy.
  15. Gloria Pellegrino: Luigi Lavazza S.p.A., Innovation Center, I-10156, Torino, Italy.
  16. Manuela Rosanna Ruosi: Luigi Lavazza S.p.A., Innovation Center, I-10156, Torino, Italy.
  17. Nicola Vitulo: Department of Biotechnology, University of Verona, Verona, Italy.
  18. Giorgio Valle: CRIBI, Università degli Studi di Padova, viale G. Colombo 3, I-35121, Padova, Italy.
  19. Alberto Pallavicini: Department of Life Sciences, University of Trieste, I-34148, Trieste, Italy.
  20. Giorgio Graziosi: Department of Life Sciences, University of Trieste, I-34148, Trieste, Italy.
  21. Patricia E Klein: Department of Horticultural Sciences, Texas A&M University, College Station, TX, USA.
  22. Nolan Bentley: Department of Horticultural Sciences, Texas A&M University, College Station, TX, USA.
  23. Seth Murray: Department of Soil and Crop Sciences, Texas A&M University, College Station, TX, USA.
  24. William Solano: CATIE, Turrialba, Costa Rica.
  25. Amin Al Hakimi: Faculty of Agriculture, Sana'a University, Sana'a, Yemen.
  26. Timothy Schilling: World Coffee Research, 5 avenue du grand chêne, 34270, Saint-Mathieu-de-Tréviers, France.
  27. Christophe Montagnon: World Coffee Research, 5 avenue du grand chêne, 34270, Saint-Mathieu-de-Tréviers, France.
  28. Michele Morgante: Istituto di Genomica Applicata, via Jacopo Linussio 51, I-33100, Udine, Italy.
  29. Benoit Bertrand: CIRAD, IPME, 34 398, Montpellier, France.

Abstract

The genome of the allotetraploid species Coffea arabica L. was sequenced to independently assemble its two component subgenomes (putatively derived from C. canephora and C. eugenioides) and to perform a genome-wide analysis of genetic diversity in cultivated coffee germplasm and in wild populations growing in the species' center of origin. We assembled a total length of 1.536 Gb, of which 444 Mb and 527 Mb were assigned to the canephora and eugenioides subgenomes, respectively. We also predicted 46,562 gene models, of which 21,254 and 22,888 were assigned to the canephora and eugenioides subgenomes, respectively. Through genome-wide SNP genotyping of 736 C. arabica accessions, we analyzed the genetic diversity in the species and its relationship with geographic distribution and historical records. We observed a weak population structure due to low-frequency derived alleles and highly negative values of Tajima's D. These patterns suggest a recent and severe bottleneck, most likely resulting from a single polyploidization event, that affected not only the cultivated germplasm but the entire species. This conclusion is strongly supported by forward simulations of mutation accumulation. However, principal component analysis (PCA) revealed a cline of genetic diversity reflecting a west-to-east geographical distribution, from the center of origin in East Africa to the Arabian Peninsula. The extremely low level of variation in the species is a consequence of the polyploidization event. It makes exploiting within-species diversity for breeding less promising than in most crop species and underscores the need to introgress new variability from the diploid progenitors.
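
The bottleneck inference above rests on Tajima's D, which contrasts two estimators of the population mutation parameter θ. The first is nucleotide diversity π, the mean number of pairwise differences. The second is Watterson's θ_W = S/a₁, computed from the number of segregating sites S. An excess of low-frequency derived alleles inflates S relative to π, which pushes D below zero. The sketch below computes D from per-site derived-allele counts using the standard constants from Tajima (1989). It is an illustration under simplifying assumptions: biallelic sites, the same number n of sampled sequences at every site, no missing data, and no correction for ploidy or SNP ascertainment bias. It is not the authors' analysis pipeline. The function name `tajimas_d` and the toy inputs are hypothetical.

```python
import math
from typing import Sequence


def tajimas_d(n: int, derived_counts: Sequence[int]) -> float:
    """Tajima's D from per-site derived-allele counts (illustrative sketch).

    n: number of sampled sequences at every site (assumed constant, no missing data).
    derived_counts: derived-allele count k at each biallelic site, 0 <= k <= n.
    """
    if n < 4:
        # For n < 4 the variance constants vanish and D is undefined.
        raise ValueError("Tajima's D needs at least 4 sequences")
    # Keep only segregating sites; monomorphic sites carry no information.
    seg = [k for k in derived_counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        raise ValueError("no segregating sites")

    # Nucleotide diversity pi: each site contributes k(n - k) / C(n, 2)
    # expected differences between two randomly drawn sequences.
    pairs = n * (n - 1) / 2
    pi = sum(k * (n - k) for k in seg) / pairs

    # Harmonic sums and variance constants (Tajima 1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy contrast: 10 sequences, 20 SNPs each.
rare_excess = tajimas_d(10, [1] * 20)  # all singletons, as after a bottleneck and regrowth
intermediate = tajimas_d(10, [5] * 20)  # all at 50% frequency
print(f"singleton-rich: {rare_excess:.2f}, intermediate: {intermediate:.2f}")
```

With only singleton derived alleles, π falls well below θ_W and D is strongly negative. With all variants at intermediate frequency, the sign flips. Real SNP-array or GBS data would additionally need handling of missing genotypes and ascertainment bias before D values could be compared across populations.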

MeSH Terms

Coffea
Costa Rica
Crops, Agricultural
Genome Size
Genome, Plant
Polymorphism, Single Nucleotide
Tetraploidy
Whole Genome Sequencing
Yemen