The haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar reveal novel pan-genome and allele-specific transcriptome features.

Weihong Qi, Yi-Wen Lim, Andrea Patrignani, Pascal Schläpfer, Anna Bratus-Neuenschwander, Simon Grüter, Christelle Chanez, Nathalie Rodde, Elisa Prat, Sonia Vautrin, Margaux-Alison Fustier, Diogo Pratas, Ralph Schlapbach, Wilhelm Gruissem
Author Information
  1. Weihong Qi: Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland.
  2. Yi-Wen Lim: Department of Biology, Institute of Molecular Plant Biology, ETH Zurich, Universitätstrasse 2, 8092, Zurich, Switzerland.
  3. Andrea Patrignani: Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland.
  4. Pascal Schläpfer: Department of Biology, Institute of Molecular Plant Biology, ETH Zurich, Universitätstrasse 2, 8092, Zurich, Switzerland.
  5. Anna Bratus-Neuenschwander: Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland.
  6. Simon Grüter: Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland.
  7. Christelle Chanez: Department of Biology, Institute of Molecular Plant Biology, ETH Zurich, Universitätstrasse 2, 8092, Zurich, Switzerland.
  8. Nathalie Rodde: INRAE, CNRGV French Plant Genomic Resource Center, F-31320, Castanet Tolosan, France.
  9. Elisa Prat: INRAE, CNRGV French Plant Genomic Resource Center, F-31320, Castanet Tolosan, France.
  10. Sonia Vautrin: INRAE, CNRGV French Plant Genomic Resource Center, F-31320, Castanet Tolosan, France.
  11. Margaux-Alison Fustier: INRAE, CNRGV French Plant Genomic Resource Center, F-31320, Castanet Tolosan, France.
  12. Diogo Pratas: Department of Electronics, Telecommunications and Informatics and Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal.
  13. Ralph Schlapbach: Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland.
  14. Wilhelm Gruissem: Department of Biology, Institute of Molecular Plant Biology, ETH Zurich, Universitätstrasse 2, 8092, Zurich, Switzerland.

Abstract

BACKGROUND: Cassava (Manihot esculenta) is an important clonally propagated food crop in tropical and subtropical regions worldwide. Genetic gain by molecular breeding has been limited, partially because cassava is a highly heterozygous crop with a repetitive and difficult-to-assemble genome.
FINDINGS: Here we demonstrate that Pacific Biosciences high-fidelity (HiFi) sequencing reads, in combination with the assembler hifiasm, produced genome assemblies at near-complete haplotype resolution with higher continuity and accuracy compared to conventional long sequencing reads. We present 2 chromosome-scale haploid genomes phased with Hi-C technology for the diploid African cassava variety TME204. With consensus accuracy >QV46, contig N50 >18 Mb, BUSCO completeness of 99%, and 35k phased gene loci, it is the most accurate, continuous, complete, and haplotype-resolved cassava genome assembly so far. Ab initio gene prediction with RNA-seq data and Iso-Seq transcripts identified abundant novel gene loci, with enriched functionality related to chromatin organization, meristem development, and cell responses. During tissue development, differentially expressed transcripts of different haplotype origins were enriched for different functionality. In each tissue, 20-30% of transcripts showed allele-specific expression (ASE) differences. ASE bias was often tissue specific and inconsistent across different tissues. Direction-shifting was observed in <2% of the ASE transcripts. Despite high gene synteny, the HiFi genome assembly revealed extensive chromosome rearrangements and abundant intra-genomic and inter-genomic divergent sequences, with large structural variations mostly related to LTR retrotransposons. We use the reference-quality assemblies to build a cassava pan-genome and demonstrate its importance in representing the genetic diversity of cassava for downstream reference-guided omics analysis and breeding.
CONCLUSIONS: The phased and annotated chromosome pairs allow a systematic view of the heterozygous diploid genome organization in cassava with improved accuracy, completeness, and haplotype resolution. They will be a valuable resource for cassava breeding and research. Our study may also provide insights into developing cost-effective and efficient strategies for resolving complex genomes with high resolution, accuracy, and continuity.
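
The two assembly metrics quoted in the findings, consensus QV and contig N50, can be illustrated with a minimal sketch. It assumes that QV is the usual Phred-scaled quality value (QV = -10 log10 of the per-base error probability) and that N50 follows the standard definition; the function names and the toy contig lengths are hypothetical and are not taken from the study.

```python
def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value into a per-base error probability."""
    return 10 ** (-qv / 10)


def contig_n50(lengths):
    """Return the contig N50.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


if __name__ == "__main__":
    # Under the Phred assumption, QV46 corresponds to roughly
    # one consensus error per 40,000 bases.
    print(f"error rate at QV46: {qv_to_error_rate(46):.2e}")

    # Hypothetical contig lengths in Mb, for illustration only.
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print(f"toy N50: {contig_n50(toy_contigs)} Mb")
```

Under these assumptions, a higher QV means exponentially fewer consensus errors, while a larger N50 means that half of the assembly sits in longer contigs.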

MeSH Term

Alleles
Chromosomes
Diploidy
Haplotypes
Manihot
Plant Breeding
Sequence Analysis, DNA
Transcriptome
