cloudSPAdes: assembly of synthetic long reads using de Bruijn graphs.

Ivan Tolstoganov, Anton Bankevich, Zhoutao Chen, Pavel A Pevzner
Author Information
  1. Ivan Tolstoganov: Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia.
  2. Anton Bankevich: Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA, USA.
  3. Zhoutao Chen: Universal Sequencing Technology Corporation, Carlsbad, CA, USA.
  4. Pavel A Pevzner: Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia.

Abstract

MOTIVATION: The recently developed barcoding-based synthetic long read (SLR) technologies have already found many applications in genome assembly and analysis. However, although new barcoding protocols are emerging and the range of SLR applications is expanding, existing SLR assemblers are optimized for a narrow range of parameters and are not easily extendable to new barcoding technologies or to new applications such as metagenomics or hybrid assembly.
RESULTS: We describe the algorithmic challenge of SLR assembly and present cloudSPAdes, an SLR assembly algorithm based on analyzing the de Bruijn graph of SLRs. We benchmarked cloudSPAdes across various barcoding technologies and applications and demonstrated that it improves on state-of-the-art SLR assemblers in accuracy and speed.
AVAILABILITY AND IMPLEMENTATION: Source code and installation manual for cloudSPAdes are available at https://github.com/ablab/spades/releases/tag/cloudspades-paper.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
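
The abstract states only that cloudSPAdes works by analyzing the de Bruijn graph of SLRs. As a rough illustration of why barcodes are useful in such a graph, the sketch below builds a toy de Bruijn graph from barcoded reads and records which barcodes each k-mer (edge) was observed in. Reads sharing a barcode come from the same long DNA fragment, so edges with strongly overlapping barcode sets are likely to lie near each other in the genome. This is a minimal sketch under stated assumptions, not the cloudSPAdes implementation: the (barcode, sequence) input format, the function names, and the use of Jaccard similarity as the barcode-overlap measure are all hypothetical choices made for illustration, and reverse complements and error correction are ignored.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical input format: (barcode, read_sequence) pairs. Reads sharing a
# barcode are assumed to originate from the same long DNA fragment.
BarcodedRead = Tuple[str, str]


def build_barcoded_dbg(
    reads: List[BarcodedRead], k: int
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build a toy de Bruijn graph whose nodes are (k-1)-mers.

    Every k-mer in a read becomes an edge from its (k-1)-prefix to its
    (k-1)-suffix. For each k-mer we also record the set of barcodes of the
    reads it appeared in. Reverse complements are ignored for simplicity.
    """
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    edge_barcodes: Dict[str, Set[str]] = defaultdict(set)
    for barcode, seq in reads:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            adjacency[kmer[:-1]].add(kmer[1:])  # prefix node -> suffix node
            edge_barcodes[kmer].add(barcode)    # barcode annotation of the edge
    return dict(adjacency), dict(edge_barcodes)


def barcode_overlap(edge_barcodes: Dict[str, Set[str]], kmer_a: str, kmer_b: str) -> float:
    """Jaccard similarity of the barcode sets of two edges (illustrative measure)."""
    a = edge_barcodes.get(kmer_a, set())
    b = edge_barcodes.get(kmer_b, set())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    reads = [("BC1", "ACGTAC"), ("BC1", "GTACGG"), ("BC2", "ACGTTT")]
    adjacency, edge_barcodes = build_barcoded_dbg(reads, k=4)
    print(sorted(adjacency["ACG"]))                           # ['CGG', 'CGT']
    print(barcode_overlap(edge_barcodes, "GTAC", "ACGG"))     # 1.0
```

In this toy example the node `ACG` branches into `CGT` and `CGG`, and the barcode sets help decide which continuation belongs with a given upstream edge: `GTAC` and `ACGG` share barcode BC1 only, giving an overlap of 1.0, whereas `GTTT` (seen only in BC2) has no barcode in common with `GTAC`. Real SLR data involve many fragments per barcode and thousands of barcodes, so any practical method must treat such overlaps statistically rather than as exact evidence.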

MeSH Terms

Algorithms
Cloud Computing
High-Throughput Nucleotide Sequencing
Metagenomics
Sequence Analysis, DNA
Software