GenBase: A Nucleotide Sequence Database.

Congfan Bu, Xinchang Zheng, Xuetong Zhao, Tianyi Xu, Xue Bai, Yaokai Jia, Meili Chen, Lili Hao, Jingfa Xiao, Zhang Zhang, Wenming Zhao, Bixia Tang, Yiming Bao
Author Information
  1. Congfan Bu: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
  2. Xinchang Zheng: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
  3. Xuetong Zhao: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
  4. Tianyi Xu: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
  5. Xue Bai: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
  6. Yaokai Jia: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
  7. Meili Chen: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
  8. Lili Hao: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
  9. Jingfa Xiao: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
  10. Zhang Zhang: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
  11. Wenming Zhao: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
  12. Bixia Tang: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
  13. Yiming Bao: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.

Abstract

The rapid advancement of sequencing technologies makes it challenging to manage the large and exponentially growing volume of sequence data efficiently and promptly. To address this issue, we present GenBase (https://ngdc.cncb.ac.cn/genbase), an open-access data repository that follows the data standards and structures of the International Nucleotide Sequence Database Collaboration (INSDC) and supports efficient archiving, searching, and sharing of nucleotide sequences. As a core resource within the National Genomics Data Center (NGDC) of the China National Center for Bioinformation (CNCB; https://ngdc.cncb.ac.cn), GenBase offers a bilingual submission pipeline and services, as well as local submission assistance in China. GenBase also provides a unique Excel format for metadata description and feature annotation of nucleotide sequences, along with a real-time data validation system that streamlines sequence submissions. As of April 23, 2024, GenBase had received 68,251 nucleotide sequences and 689,574 annotated protein sequences across 414 species from 2,319 submissions. Of these, 63,614 (93%) nucleotide sequences and 620,640 (90%) annotated protein sequences have been released and are publicly accessible through GenBase's web search system, File Transfer Protocol (FTP), and Application Programming Interface (API). Additionally, in collaboration with INSDC, GenBase has established an effective data exchange mechanism with GenBank and has started sharing released nucleotide sequences. Furthermore, GenBase integrates all sequences from GenBank with daily updates, demonstrating its commitment to actively contributing to global sequence data management and sharing.

MeSH Terms

Databases, Nucleic Acid
Humans
Software
Genomics
Molecular Sequence Annotation
Computational Biology
