Long-read assembly of the Brassica napus reference genome Darmor-bzh.

Mathieu Rousseau-Gueutin, Caroline Belser, Corinne Da Silva, Gautier Richard, Benjamin Istace, Corinne Cruaud, Cyril Falentin, Franz Boideau, Julien Boutte, Regine Delourme, Gwenaëlle Deniot, Stefan Engelen, Julie Ferreira de Carvalho, Arnaud Lemainque, Loeiz Maillet, Jérôme Morice, Patrick Wincker, France Denoeud, Anne-Marie Chèvre, Jean-Marc Aury
Author Information
  1. Mathieu Rousseau-Gueutin: IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.
  2. Caroline Belser: Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.
  3. Corinne Da Silva: Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.
  4. Gautier Richard: IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.
  5. Benjamin Istace: Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.
  6. Corinne Cruaud: Genoscope, Institut François Jacob, Commissariat à l'Energie Atomique (CEA), Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.
  7. Cyril Falentin: IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.
  8. Franz Boideau: IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.
  9. Julien Boutte: IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.
  10. Regine Delourme: IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.
  11. Gwenaëlle Deniot: IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.
  12. Stefan Engelen: Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.
  13. Julie Ferreira de Carvalho: IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.
  14. Arnaud Lemainque: Genoscope, Institut François Jacob, Commissariat à l'Energie Atomique (CEA), Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.
  15. Loeiz Maillet: IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.
  16. Jérôme Morice: IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.
  17. Patrick Wincker: Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.
  18. France Denoeud: Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.
  19. Anne-Marie Chèvre: IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France.
  20. Jean-Marc Aury: Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France.

Abstract

BACKGROUND: The combination of long reads and long-range information to produce genome assemblies is now accepted as a common standard. This strategy not only allows access to the gene catalogue of a given species but also reveals the architecture and organization of chromosomes, including complex regions such as telomeres and centromeres. The Brassica genus is not exempt, and many assemblies based on long reads are now available. The reference genome for Brassica napus, Darmor-bzh, which was published in 2014, was produced using short reads and its contiguity was extremely low compared with current assemblies of the Brassica genus.
FINDINGS: Herein, we report a new long-read assembly of the Brassica napus Darmor-bzh genome, generated by combining long-read sequencing data with optical and genetic maps. Using the PromethION device and 6 flowcells, we generated ∼16 million long reads representing 93× coverage and, more importantly, 6× coverage from reads longer than 100 kb. This ultralong-read dataset allowed us to generate one of the most contiguous and complete assemblies of a Brassica genome to date (contig N50 > 10 Mb). In addition, we exploited the full capabilities of nanopore technology to detect modified bases and to sequence transcriptomic data using direct RNA sequencing, in order to annotate the genome with a focus on resistance genes.
CONCLUSION: Using these cutting-edge technologies, and in particular by relying on all the advantages of the nanopore technology, we provide the most contiguous Brassica napus assembly, a resource that will be valuable to the Brassica community for crop improvement and will facilitate the rapid selection of agronomically important traits.
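
The abstract summarizes the dataset and assembly with two standard metrics: sequencing coverage (93× overall, 6× from reads longer than 100 kb) and contig N50 (> 10 Mb). As an illustration only, the sketch below shows how such figures are conventionally computed. The function names and the toy numbers are assumptions made for this example; they are not the authors' pipeline or data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-empty input


def read_depth(read_lengths: Iterable[int], genome_size: int,
               min_length: int = 0) -> float:
    """Return mean coverage: bases in reads >= min_length / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    kept_bases = sum(r for r in read_lengths if r >= min_length)
    return kept_bases / genome_size


# Toy values (hypothetical, chosen only to make the arithmetic easy to follow).
toy_contigs = [2, 3, 4, 5, 6]
toy_reads = [100, 200, 700]
toy_genome_size = 500

print(n50(toy_contigs))                                      # N50 of the toy contigs
print(read_depth(toy_reads, toy_genome_size))                # coverage from all reads
print(read_depth(toy_reads, toy_genome_size, min_length=200))  # coverage from long reads only
```

Under this definition, a statement such as "6× with reads longer than 100 kb" corresponds to calling `read_depth` with `min_length=100_000` on the real read lengths and the estimated genome size.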

MeSH Terms

Brassica napus
Genome
High-Throughput Nucleotide Sequencing
Nanopores
Phenotype
