Improvement of the banana "Musa acuminata" reference sequence using NGS data and semi-automated bioinformatics methods.

Guillaume Martin, Franc-Christophe Baurens, Gaëtan Droc, Mathieu Rouard, Alberto Cenci, Andrzej Kilian, Alex Hastie, Jaroslav Doležel, Jean-Marc Aury, Adriana Alberti, Françoise Carreel, Angélique D'Hont
Author Information
  1. Guillaume Martin: CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France.
  2. Franc-Christophe Baurens: CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France.
  3. Gaëtan Droc: CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France.
  4. Mathieu Rouard: Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, Cedex 5, France.
  5. Alberto Cenci: Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, Cedex 5, France.
  6. Andrzej Kilian: Diversity Arrays Technology, Yarralumla, Australian Capital Territory, 2600, Australia.
  7. Alex Hastie: BioNano Genomics, 9640 Towne Centre Drive, San Diego, CA, 92121, USA.
  8. Jaroslav Doležel: Institute of Experimental Botany, Centre of the Region Hana for Biotechnological and Agricultural Research, Šlechtitelů 31, CZ-78371, Olomouc, Czech Republic.
  9. Jean-Marc Aury: Commissariat à l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Cremieux, BP5706, 91057, Evry, France.
  10. Adriana Alberti: Commissariat à l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Cremieux, BP5706, 91057, Evry, France.
  11. Françoise Carreel: CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France.
  12. Angélique D'Hont: CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France. dhont@cirad.fr.

Abstract

BACKGROUND: Recent advances in genomics indicate that most of the genome sequence, together with its long-range interactions, is functionally significant. Because a detailed examination of genome organization and function requires a very high-quality genome sequence, the objective of this study was to improve the reference genome assembly of banana (Musa acuminata).
RESULTS: We developed a modular bioinformatics pipeline for improving genome sequence assemblies that can handle various types of data. The pipeline comprises several semi-automated tools. Unlike classical automated tools, which rely on global parameters, these semi-automated tools offer an expert mode in which the user decides on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long-insert-size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super-scaffolds. Finally, a consensus gene annotation was projected onto the new assembly from two pre-existing annotations. This approach reduced the total number of Musa scaffolds from 7513 to 1532 (i.e. by 80%), and the N50 increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). In total, 89.5% of the assembly was anchored to the 11 Musa chromosomes, compared with 70% previously. Unknown sites (N) were reduced from 17.3% to 10.0%.
CONCLUSION: The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.
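
As a reading aid for the assembly statistics quoted above, the following is a minimal sketch of how N50 is conventionally computed from a list of scaffold lengths, along with the number of scaffolds needed to reach it (the figure given in parentheses in the abstract). The function name `n50` and the toy lengths are illustrative assumptions; this is not the authors' pipeline code. The last lines simply re-derive the "by 80%" scaffold reduction from the two counts reported in the abstract.

```python
def n50(lengths):
    """Return (N50, count) for a collection of scaffold lengths.

    N50 is the length of the shortest scaffold in the smallest set of
    longest scaffolds that together cover at least half of the assembly;
    count is the number of scaffolds in that set.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy lengths (hypothetical, not Musa data): total 150, half 75.
# 50 + 40 = 90 reaches the halfway point, so N50 = 40 with 2 scaffolds.
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))

# Scaffold-count reduction using the two figures reported in the abstract.
before, after = 7513, 1532
reduction_pct = round(100 * (before - after) / before, 1)
print(reduction_pct)  # 79.6, i.e. roughly the 80% quoted
```

A higher N50 reached with fewer scaffolds, as reported for version 2, indicates a more contiguous assembly.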

MeSH Terms

Computational Biology
Contig Mapping
Genetic Markers
Genome, Plant
High-Throughput Nucleotide Sequencing
Molecular Sequence Annotation
Musa
Sequence Analysis, DNA

Chemicals

Genetic Markers
