The haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar reveal novel pan-genome and allele-specific transcriptome features.
Weihong Qi, Yi-Wen Lim, Andrea Patrignani, Pascal Schläpfer, Anna Bratus-Neuenschwander, Simon Grüter, Christelle Chanez, Nathalie Rodde, Elisa Prat, Sonia Vautrin, Margaux-Alison Fustier, Diogo Pratas, Ralph Schlapbach, Wilhelm Gruissem
Author Information
Weihong Qi: Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland.
Yi-Wen Lim: Department of Biology, Institute of Molecular Plant Biology, ETH Zurich, Universitätstrasse 2, 8092, Zurich, Switzerland.
Andrea Patrignani: Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland.
Pascal Schläpfer: Department of Biology, Institute of Molecular Plant Biology, ETH Zurich, Universitätstrasse 2, 8092, Zurich, Switzerland.
Anna Bratus-Neuenschwander: Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland.
Simon Grüter: Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland.
Christelle Chanez: Department of Biology, Institute of Molecular Plant Biology, ETH Zurich, Universitätstrasse 2, 8092, Zurich, Switzerland.
Nathalie Rodde: INRAE, CNRGV French Plant Genomic Resource Center, F-31320, Castanet Tolosan, France.
Elisa Prat: INRAE, CNRGV French Plant Genomic Resource Center, F-31320, Castanet Tolosan, France.
Sonia Vautrin: INRAE, CNRGV French Plant Genomic Resource Center, F-31320, Castanet Tolosan, France.
Margaux-Alison Fustier: INRAE, CNRGV French Plant Genomic Resource Center, F-31320, Castanet Tolosan, France.
Diogo Pratas: Department of Electronics, Telecommunications and Informatics and Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal.
Ralph Schlapbach: Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland.
Wilhelm Gruissem: Department of Biology, Institute of Molecular Plant Biology, ETH Zurich, Universitätstrasse 2, 8092, Zurich, Switzerland.
BACKGROUND: Cassava (Manihot esculenta) is an important clonally propagated food crop in tropical and subtropical regions worldwide. Genetic gain by molecular breeding has been limited, partially because cassava is a highly heterozygous crop with a repetitive and difficult-to-assemble genome. FINDINGS: Here we demonstrate that Pacific Biosciences high-fidelity (HiFi) sequencing reads, in combination with the assembler hifiasm, produced genome assemblies at near-complete haplotype resolution with higher continuity and accuracy than conventional long sequencing reads. We present 2 chromosome-scale haploid genomes phased with Hi-C technology for the diploid African cassava variety TME204. With consensus accuracy >QV46, contig N50 >18 Mb, BUSCO completeness of 99%, and 35k phased gene loci, it is the most accurate, continuous, complete, and haplotype-resolved cassava genome assembly so far. Ab initio gene prediction with RNA-seq data and Iso-Seq transcripts identified abundant novel gene loci, enriched for functions related to chromatin organization, meristem development, and cell responses. During tissue development, differentially expressed transcripts of different haplotype origins were enriched for different functions. In each tissue, 20-30% of transcripts showed allele-specific expression (ASE) differences. ASE bias was often tissue-specific and inconsistent across different tissues. Direction-shifting was observed in <2% of the ASE transcripts. Despite high gene synteny, the HiFi genome assembly revealed extensive chromosome rearrangements and abundant intra-genomic and inter-genomic divergent sequences, with large structural variations mostly related to LTR retrotransposons. We use the reference-quality assemblies to build a cassava pan-genome and demonstrate its importance in representing the genetic diversity of cassava for downstream reference-guided omics analysis and breeding.
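Two of the assembly metrics quoted above have standard definitions that can be illustrated in a few lines. Contig N50 is the length L such that contigs of length L or longer together cover at least half of the total assembly length, and a consensus QV is the Phred-scaled per-base error rate, QV = -10 log10(error rate), so QV46 corresponds to roughly one error per 40 kb. The sketch below only illustrates these textbook definitions; it is not the evaluation pipeline used for TME204, and the function names `n50` and `phred_qv` are hypothetical.

```python
import math


def n50(contig_lengths):
    """Hypothetical helper: return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("contig_lengths must not be empty")


def phred_qv(error_rate):
    """Hypothetical helper: convert a per-base error rate (> 0) to a Phred QV."""
    if error_rate <= 0:
        raise ValueError("error_rate must be positive")
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    print(n50([10, 8, 5, 3, 2]))       # total 28, half 14: 10 + 8 >= 14, so N50 is 8
    print(round(phred_qv(1e-4), 1))    # one error per 10 kb is QV40
    print(round(10 ** 4.6))            # bases per error at QV46, about 40 kb
```

Sorting once and accumulating from the longest contig keeps the N50 calculation linear after the sort; real assembly evaluators additionally handle gaps, minimum contig lengths, and k-mer based QV estimation, which this sketch deliberately omits.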
CONCLUSIONS: The phased and annotated chromosome pairs allow a systematic view of the heterozygous diploid genome organization in cassava with improved accuracy, completeness, and haplotype resolution. They will be a valuable resource for cassava breeding and research. Our study may also provide insights into developing cost-effective and efficient strategies for resolving complex genomes with high resolution, accuracy, and continuity.