The Sequence Read Archive: a decade more of explosive growth.

Kenneth Katz, Oleg Shutov, Richard Lapoint, Michael Kimelman, J Rodney Brister, Christopher O'Sullivan
Author Information
  1. Kenneth Katz: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
  2. Oleg Shutov: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
  3. Richard Lapoint: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
  4. Michael Kimelman: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
  5. J Rodney Brister: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
  6. Christopher O'Sullivan: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Abstract

The Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra/) stores raw sequencing data and alignment information to enhance reproducibility and facilitate new discoveries through data analysis. Here we note changes in storage designed to increase access and highlight analyses that augment metadata with taxonomic insight to help users select data. In addition, we present three unanticipated applications of taxonomic analysis.

MeSH Terms

Bacteria
Base Sequence
Databases, Genetic
High-Throughput Nucleotide Sequencing
Internet
Metadata
Phylogeny
Reproducibility of Results
SARS-CoV-2
Sequence Analysis, RNA
Software
Viruses