NGSNGS: next-generation simulator for next-generation sequencing data.

Rasmus Amund Henriksen, Lei Zhao, Thorfinn Sand Korneliussen
Author Information
  1. Rasmus Amund Henriksen: Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, 1350 Copenhagen K, Denmark.
  2. Lei Zhao: Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, 1350 Copenhagen K, Denmark.
  3. Thorfinn Sand Korneliussen: Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, 1350 Copenhagen K, Denmark.

Abstract

SUMMARY: As the capabilities of DNA sequencers have expanded rapidly across sequencing generations, the quantity of generated data has increased accordingly. This evolution has also led to new bioinformatic methods, for which in silico data have become crucial for verifying the accuracy of a model or the robustness of a genomic analysis pipeline. Here, we present a multithreaded next-generation simulator for next-generation sequencing data (NGSNGS), which simulates reads faster than currently available methods and programs. NGSNGS can simulate reads with platform-specific characteristics based on nucleotide quality score profiles and includes a post-mortem damage model, which is relevant for simulating ancient DNA. The simulated sequences are sampled (with replacement) from a reference DNA genome, which can represent a haploid genome, polyploid assemblies or even population haplotypes, allowing the user to simulate known variable sites directly. The program is implemented in a multithreading framework and is several times faster than currently available tools while extending their feature set and possible output formats.
AVAILABILITY AND IMPLEMENTATION: The method and associated programs are released as open-source software; the code and user manual are available at https://github.com/RAHenriksen/NGSNGS.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
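
ILLUSTRATIVE SKETCH: The sampling idea described in the summary (reads drawn with replacement from a reference, with base-call errors driven by quality scores) can be sketched roughly as follows. This is not the NGSNGS implementation: the function names `phred_to_error_prob` and `simulate_reads` are hypothetical, the quality profile is simplified to one fixed Phred score per read position, uniform start positions are an assumption, and the post-mortem damage model and multithreading are omitted.

```python
import random


def phred_to_error_prob(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def simulate_reads(reference, n_reads, read_length, quality_profile, seed=0):
    """Sample reads with replacement and add quality-driven substitutions.

    Assumption for illustration: quality_profile[i] is a single fixed Phred
    score for read position i, rather than a distribution of scores.
    Returns a list of (start, sequence, quality_string) tuples.
    """
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    if len(quality_profile) != read_length:
        raise ValueError("quality_profile must have one score per read position")

    rng = random.Random(seed)
    # Phred+33 ASCII encoding, as used in FASTQ quality strings.
    quals = "".join(chr(q + 33) for q in quality_profile)
    reads = []
    for _ in range(n_reads):
        # Start positions are drawn independently, so reads may overlap or
        # repeat: this is sampling with replacement.
        start = rng.randrange(len(reference) - read_length + 1)
        bases = []
        for i in range(read_length):
            base = reference[start + i]
            if rng.random() < phred_to_error_prob(quality_profile[i]):
                # Substitute with one of the three other nucleotides.
                base = rng.choice([b for b in "ACGT" if b != base])
            bases.append(base)
        reads.append((start, "".join(bases), quals))
    return reads


if __name__ == "__main__":
    ref = "ACGTTGCAAGCTTAGGCTAACGTAGCTAGGATCCGATCGA"
    for start, seq, qual in simulate_reads(ref, 3, 10, [30] * 10, seed=1):
        print(f"@read_{start}\n{seq}\n+\n{qual}")
```

Fixing the seed makes the output reproducible, which is convenient when the simulated reads serve as a known ground truth for testing an analysis pipeline.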

Grants

  1. R302-2018-2155/Lundbeck Foundation Centre for Disease Evolution
  2. /Carlsberg Foundation in 2019

MeSH Terms

Software
Genome
Genomics
High-Throughput Nucleotide Sequencing
DNA, Ancient
Sequence Analysis, DNA

Chemicals

DNA, Ancient
