Data stream dataset of SARS-CoV-2 genome.

Raquel de M Barbosa, Marcelo A C Fernandes
Author Information
  1. Raquel de M Barbosa: Laboratory of Drug Development, Department of Pharmacy, Federal University of Rio Grande do Norte, Natal, RN 59078-970, Brazil.
  2. Marcelo A C Fernandes: Laboratory of Machine Learning and Intelligent Instrumentation, IMD/nPITI, Federal University of Rio Grande do Norte, Natal 59078-970, Brazil.

Abstract

As of May 25, 2020, the novel coronavirus disease (COVID-19) had spread to more than 185 countries/regions, with more than 348,000 deaths and more than 5,550,000 confirmed cases. In bioinformatics, one crucial task is the analysis of the virus nucleotide sequences using data stream techniques and algorithms. To make this approach feasible, however, the nucleotide sequence strings must first be transformed into a numerical stream representation. This dataset therefore provides four kinds of data stream representation (DSR) of SARS-CoV-2 virus nucleotide sequences. It contains the DSR of 1557 instances of the SARS-CoV-2 virus, 11540 instances of other viruses from the Virus-Host DB dataset, and three instances of Riboviria viruses from NCBI (Betacoronavirus RaTG13, bat-SL-CoVZC45, and bat-SL-CoVZXC21).
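
The string-to-stream transformation described in the abstract can be illustrated with a minimal sketch. The abstract does not say which four representations the dataset uses, so the two mappings below (an integer code and a one-hot vector per nucleotide) are illustrative assumptions only, and the names `NUCLEOTIDE_CODE`, `integer_stream`, and `one_hot_stream` are hypothetical.

```python
from typing import Iterator, List

# Hypothetical integer code per nucleotide. This is an assumption for
# illustration, not the encoding used by the dataset.
NUCLEOTIDE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def integer_stream(sequence: str, unknown: int = -1) -> Iterator[int]:
    """Yield one integer per nucleotide, in sequence order.

    Symbols outside A/C/G/T (for example the ambiguity code N) are
    emitted as `unknown`.
    """
    for base in sequence.upper():
        yield NUCLEOTIDE_CODE.get(base, unknown)


def one_hot_stream(sequence: str) -> Iterator[List[int]]:
    """Yield one 4-element one-hot vector per nucleotide.

    Symbols outside A/C/G/T are emitted as an all-zero vector.
    """
    for base in sequence.upper():
        vector = [0, 0, 0, 0]
        index = NUCLEOTIDE_CODE.get(base)
        if index is not None:
            vector[index] = 1
        yield vector


if __name__ == "__main__":
    fragment = "ACGTN"  # toy fragment, not taken from the dataset
    print(list(integer_stream(fragment)))
    print(list(one_hot_stream(fragment)))
```

Both functions are generators, so a full genome can be consumed one sample at a time by a data stream algorithm without holding the whole numerical sequence in memory.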
