iSeq: an integrated tool to fetch public sequencing data.

Haoyu Chao, Zhuojin Li, Dijun Chen, Ming Chen
Author Information
  1. Haoyu Chao: Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China.
  2. Zhuojin Li: Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China.
  3. Dijun Chen: Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China.
  4. Ming Chen: Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China.

Abstract

MOTIVATION: High-throughput sequencing technologies, commonly called next-generation sequencing (NGS), are increasingly used to address diverse biological questions. NGS data are information-rich and public datasets keep growing, notably in repositories such as the Genome Sequence Archive (GSA) at the National Genomics Data Center (NGDC), yet programmatic access to public sequencing data and metadata remains limited.
RESULTS: We developed iSeq to enable quick and straightforward retrieval of metadata and NGS data from multiple databases through a command-line interface. iSeq supports simultaneous retrieval from the GSA, SRA, ENA, and DDBJ databases. It handles more than 25 accession formats and supports Aspera downloads, parallel downloads, multi-threaded processing, FASTQ file merging, and integrity verification. Together these features simplify data acquisition and make it easier to reanalyze NGS data.
AVAILABILITY AND IMPLEMENTATION: iSeq is freely available on Bioconda (https://anaconda.org/bioconda/iseq) and GitHub (https://github.com/BioOmics/iSeq).
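
The integrity verification mentioned in the abstract can be illustrated with a generic checksum comparison. The sketch below is not taken from iSeq and does not describe how iSeq implements the check. It assumes the archive publishes an MD5 checksum for each file, and the function names `md5_of_file` and `verify_download` are hypothetical, chosen only for this example.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that large FASTQ files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file exists and its MD5 matches the checksum
    published by the archive (assumed to be a hex string)."""
    path = Path(path)
    if not path.is_file():
        return False
    # Normalise the published value: archives may differ in case or
    # include trailing whitespace.
    return md5_of_file(path) == expected_md5.strip().lower()
```

A caller would typically run `verify_download` after each transfer and retry the download when it returns `False`, which is one common way to handle interrupted or corrupted files.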

Grants

  1. 2023YFE0112300/National Key Research and Development Program of China
  2. 32070677/National Natural Science Foundation of China

MeSH Terms

Public Sector
Molecular Sequence Data
High-Throughput Nucleotide Sequencing
Sequence Analysis
Databases, Genetic
