A single polyploidization event at the origin of the tetraploid genome of Coffea arabica is responsible for the extremely low genetic variation in wild and cultivated germplasm.
Simone Scalabrin, Lucile Toniutti, Gabriele Di Gaspero, Davide Scaglione, Gabriele Magris, Michele Vidotto, Sara Pinosio, Federica Cattonaro, Federica Magni, Irena Jurman, Mario Cerutti, Furio Suggi Liverani, Luciano Navarini, Lorenzo Del Terra, Gloria Pellegrino, Manuela Rosanna Ruosi, Nicola Vitulo, Giorgio Valle, Alberto Pallavicini, Giorgio Graziosi, Patricia E Klein, Nolan Bentley, Seth Murray, William Solano, Amin Al Hakimi, Timothy Schilling, Christophe Montagnon, Michele Morgante, Benoit Bertrand
Author Information
Simone Scalabrin: IGA Technology Services S.r.l., via Jacopo Linussio 51, I-33100, Udine, Italy.
Lucile Toniutti: World Coffee Research, 5 avenue du grand chêne, 34270, Saint-Mathieu-de-Tréviers, France. lucile@worldcoffeeresearch.org.
Gabriele Di Gaspero: Istituto di Genomica Applicata, via Jacopo Linussio 51, I-33100, Udine, Italy.
Davide Scaglione: IGA Technology Services S.r.l., via Jacopo Linussio 51, I-33100, Udine, Italy.
Gabriele Magris: Istituto di Genomica Applicata, via Jacopo Linussio 51, I-33100, Udine, Italy.
Michele Vidotto: IGA Technology Services S.r.l., via Jacopo Linussio 51, I-33100, Udine, Italy.
Sara Pinosio: Istituto di Genomica Applicata, via Jacopo Linussio 51, I-33100, Udine, Italy.
Federica Cattonaro: IGA Technology Services S.r.l., via Jacopo Linussio 51, I-33100, Udine, Italy.
Federica Magni: IGA Technology Services S.r.l., via Jacopo Linussio 51, I-33100, Udine, Italy.
Irena Jurman: Istituto di Genomica Applicata, via Jacopo Linussio 51, I-33100, Udine, Italy.
Mario Cerutti: Luigi Lavazza S.p.A., Innovation Center, I-10156, Torino, Italy.
Furio Suggi Liverani: Illycaffè S.p.A., Research & Innovation, via Flavia 110, I-34147, Trieste, Italy.
Luciano Navarini: Illycaffè S.p.A., Research & Innovation, via Flavia 110, I-34147, Trieste, Italy.
Lorenzo Del Terra: Illycaffè S.p.A., Research & Innovation, via Flavia 110, I-34147, Trieste, Italy.
Gloria Pellegrino: Luigi Lavazza S.p.A., Innovation Center, I-10156, Torino, Italy.
Manuela Rosanna Ruosi: Luigi Lavazza S.p.A., Innovation Center, I-10156, Torino, Italy.
Nicola Vitulo: Department of Biotechnology, University of Verona, Verona, Italy.
Giorgio Valle: CRIBI, Università degli Studi di Padova, viale G. Colombo 3, I-35121, Padova, Italy.
Alberto Pallavicini: Department of Life Sciences, University of Trieste, I-34148, Trieste, Italy.
Giorgio Graziosi: Department of Life Sciences, University of Trieste, I-34148, Trieste, Italy.
Patricia E Klein: Department of Horticultural Sciences, Texas A&M University, College Station, TX, USA.
Nolan Bentley: Department of Horticultural Sciences, Texas A&M University, College Station, TX, USA.
Seth Murray: Department of Soil and Crop Sciences, Texas A&M University, College Station, TX, USA.
William Solano: CATIE, Turrialba, Costa Rica.
Amin Al Hakimi: Faculty of Agriculture, Sana'a University, Sana'a, Yemen.
Timothy Schilling: World Coffee Research, 5 avenue du grand chêne, 34270, Saint-Mathieu-de-Tréviers, France.
Christophe Montagnon: World Coffee Research, 5 avenue du grand chêne, 34270, Saint-Mathieu-de-Tréviers, France.
Michele Morgante: Istituto di Genomica Applicata, via Jacopo Linussio 51, I-33100, Udine, Italy.
Benoit Bertrand: CIRAD, IPME, 34398, Montpellier, France.
The genome of the allotetraploid species Coffea arabica L. was sequenced to assemble independently the two component subgenomes (putatively deriving from C. canephora and C. eugenioides) and to perform a genome-wide analysis of the genetic diversity in cultivated coffee germplasm and in wild populations growing in the center of origin of the species. We assembled a total length of 1.536 Gb, of which 444 Mb and 527 Mb were assigned to the canephora and eugenioides subgenomes, respectively, and predicted 46,562 gene models, of which 21,254 and 22,888 were assigned to the canephora and eugenioides subgenomes, respectively. Through genome-wide SNP genotyping of 736 C. arabica accessions, we analyzed the genetic diversity in the species and its relationship with geographic distribution and historical records. We observed a weak population structure due to low-frequency derived alleles and highly negative values of Tajima's D, suggesting a recent and severe bottleneck, most likely resulting from a single polyploidization event, that affected not only the cultivated germplasm but the entire species. This conclusion is strongly supported by forward simulations of mutation accumulation. Nevertheless, PCA revealed a cline of genetic diversity reflecting a west-to-east geographical distribution from the center of origin in East Africa to the Arabian Peninsula. The extremely low levels of variation observed in the species, as a consequence of the polyploidization event, make the exploitation of diversity within the species for breeding purposes less interesting than in most crop species and stress the need for introgression of new variability from the diploid progenitors.
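The link the abstract draws between low-frequency derived alleles and negative Tajima's D can be illustrated with a minimal sketch of the standard estimator (Tajima, 1989). This is not the authors' pipeline: the function name `tajimas_d` and its input format (one derived-allele count per segregating site, for `n` sampled chromosomes) are assumptions made for illustration only.

```python
import math


def tajimas_d(derived_counts, n):
    """Tajima's D from per-site derived-allele counts (hypothetical helper).

    derived_counts: one integer per site, the number of sampled chromosomes
                    carrying the derived allele (assumed input format).
    n:              number of sampled chromosomes; the variance term is zero
                    for n < 4, so smaller samples are rejected.
    """
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sampled chromosomes")

    # Keep only segregating sites (derived allele neither absent nor fixed).
    counts = [k for k in derived_counts if 0 < k < n]
    s = len(counts)
    if s == 0:
        return float("nan")  # undefined without segregating sites

    # pi: mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in counts)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D compares pi with Watterson's estimator S / a1, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Excess of singletons (rare derived alleles): D is negative.
    print(tajimas_d([1] * 20, n=10))
    # Excess of intermediate-frequency alleles: D is positive.
    print(tajimas_d([5] * 20, n=10))
```

An excess of rare variants lowers the pairwise diversity relative to the number of segregating sites, which pushes D below zero; this is the signature the abstract interprets as a recent, severe bottleneck.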