GSA – Genome Sequence Archive

Credit: Yanqing Wang

The Genome Sequence Archive (GSA; https://ngdc.cncb.ac.cn/gsa) is a data repository for archiving raw sequence data. In compliance with the data standards and structures of the International Nucleotide Sequence Database Collaboration (INSDC), GSA organizes data into four objects (BioProject, BioSample, Experiment, and Run). It accepts raw sequence reads produced by a variety of sequencing platforms, stores sequence reads and metadata submitted from around the world, and makes all of these data publicly available to the global scientific community. In the era of big data, GSA is an important complement to the existing INSDC members, helping to relieve the growing burden of the sequence data deluge. It also takes on a significant share of the responsibility for archiving big data globally, providing free, unrestricted access to all publicly available data in support of research worldwide.
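
The four INSDC-style data objects form a hierarchy. A BioProject groups BioSamples. Each Experiment describes a sequencing library prepared from one BioSample. Each Run holds the raw read files produced by an Experiment. The Python sketch below models that relationship. It is a simplified illustration under stated assumptions: the class fields, the `runs_for_sample` helper, and all identifiers are hypothetical. They are not GSA's actual metadata schema or accession format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified model of the four data objects (BioProject,
# BioSample, Experiment, Run). Field names are illustrative assumptions,
# not GSA's real submission schema.

@dataclass
class Run:
    run_id: str
    files: List[str]  # raw read files produced by one sequencing run

@dataclass
class Experiment:
    experiment_id: str
    sample_id: str  # the BioSample this library was prepared from
    platform: str   # sequencing platform that produced the reads
    runs: List[Run] = field(default_factory=list)

@dataclass
class BioSample:
    sample_id: str
    organism: str

@dataclass
class BioProject:
    project_id: str
    title: str
    samples: List[BioSample] = field(default_factory=list)
    experiments: List[Experiment] = field(default_factory=list)

    def runs_for_sample(self, sample_id: str) -> List[Run]:
        """Collect every Run whose Experiment was derived from the given BioSample."""
        return [run for exp in self.experiments
                if exp.sample_id == sample_id
                for run in exp.runs]

# Usage with placeholder identifiers (hypothetical, not real accessions).
project = BioProject("PROJ-1", "Example project")
project.samples.append(BioSample("SAMPLE-1", "Homo sapiens"))
exp = Experiment("EXP-1", "SAMPLE-1", "Illumina")
exp.runs.append(Run("RUN-1", ["reads_R1.fastq.gz", "reads_R2.fastq.gz"]))
project.experiments.append(exp)
print([r.run_id for r in project.runs_for_sample("SAMPLE-1")])  # ['RUN-1']
```

Keeping the sample link on the Experiment, rather than nesting Runs under BioSamples, mirrors the INSDC convention. In that convention, one BioSample can be sequenced in several Experiments, and each Experiment can yield several Runs.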

Publication

The Genome Sequence Archive Family: Toward Explosive Data Growth and Diverse Data Types. Genomics, Proteomics & Bioinformatics 2021, 19(4):578-583. https://doi.org/10.1016/j.gpb.2021.08.001 [PMID=34400360]