The GSA Handbook (Version 2.0, June 2017) containing detailed data items' descriptions is freely available here.
The GSA Submission Quick Start Guide (Version 2.0, June 2017) containing submission descriptions is freely available here.
Designed for compatibility, the Genome Sequence Archive (GSA) follows INSDC data standards and structures. All data are organized into four objects, i.e., BioProject, BioSample, Experiment, and Run (Figure 1). "BioProject", bearing an accession number prefixed with "PRJC", provides an overall description for an individual research initiative, including basic description, organism, data type, submitter, funding information, and publication(s) if available.
Figure 1: Data model in GSA
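The four-object model can be pictured as a small set of nested records. The sketch below is only an illustration: the field names, the nesting (BioProject, then BioSample, then Experiment, then Run) and the example accession are assumptions made for this example, not the actual GSA schema. The only rule taken from the text is the "PRJC" accession prefix.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    # Hypothetical field: names of the sequence files listed in this Run.
    files: List[str] = field(default_factory=list)


@dataclass
class Experiment:
    title: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class BioSample:
    name: str
    experiments: List[Experiment] = field(default_factory=list)


@dataclass
class BioProject:
    accession: str
    title: str
    samples: List[BioSample] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The text states that BioProject accessions are prefixed with "PRJC".
        if not self.accession.startswith("PRJC"):
            raise ValueError(f"unexpected BioProject accession: {self.accession}")


def count_runs(project: BioProject) -> int:
    """Count every Run reachable from a BioProject."""
    return sum(
        len(experiment.runs)
        for sample in project.samples
        for experiment in sample.experiments
    )
```

For instance, a project with three strains, each sequenced once with paired-end reads, would hold three BioSample records, each with one Experiment and one Run listing two files.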
The following are examples of metadata organization. Submitters can organize metadata objects flexibly.
Comparative genome sequencing of three strains (paired-end): include the paired-end read files in a Run.
Figure 2: Comparative genome sequencing of three strains (paired-end)
Technical and biological replicates.
Figure 3: Technical and biological replicates
To create a submission, users need to register and log into the Genome Sequence Submission (Gsub) System. To simplify the submission procedure as much as possible, GSA is equipped with a user-friendly input wizard for metadata collection. To ease sequence file uploading, GSA provides an FTP server supporting two Internet Protocols (IPv4 and IPv6).
Figure 4: Graphic illustration of data submissions to GSA
GSA is short for Genome Sequence Archive, a data repository for raw sequencing data from genomics, transcriptomics and other omics studies. It archives raw sequence data produced by a wide variety of sequencing platforms. GSA is one of the database resources of the BIG Data Center (BIGD), part of the Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS), and serves as a primary archive of genome sequencing data for institutions and laboratories worldwide.
Only registered users can submit data using the Genome Sequence Submission (Gsub) System. Briefly, data submission requires the following steps.
a) Create a BIGD account and/or log in to Gsub;
b) Enter metadata information;
c) Submit data files;
d) Specify the release date.
Any user can freely register and create a Gsub account. After your registration data is submitted, a confirmation email will be automatically sent to you for activating your account.
♦ If you have forgotten only your password, you can reset it by clicking “Forgot password”. You will receive an e-mail; please follow the URL in it to reset your password within 30 minutes.
♦ If you are already a member and you’ve forgotten both your GSA username and password, please feel free to contact us. We will do our best to help you.
Data submission requires that you log into the Genome Sequence Submission (Gsub) System, so you need to create an account if you are not yet a member.
Please note that fields marked * are required when submitting metadata.
In the current version 2.0 of GSA, it is highly recommended that you submit your files using a dedicated FTP tool (e.g., FileZilla). Please transmit your data files to the GSA FTP site using the following credentials:
Address: ftp://subhra.cncb.ac.cn
User: Same as your Gsub login
Password: Same as your Gsub password
Please NOTE that you should create a unique folder on the FTP server.
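For scripted uploads, the same transfer can be sketched with Python's standard `ftplib`. This is a minimal sketch, not an official GSA client: it assumes the server accepts a plain FTP login with your Gsub credentials and lets you create the folder yourself; the function names and the folder name are hypothetical. A dedicated FTP tool remains the recommended route.

```python
from ftplib import FTP, error_perm
from pathlib import Path
from typing import List, Tuple

GSA_FTP_HOST = "subhra.cncb.ac.cn"  # host taken from the address above


def plan_upload(folder: str, paths: List[str]) -> List[Tuple[str, str]]:
    """Pair each local path with its remote path inside the submission folder."""
    return [(p, f"{folder}/{Path(p).name}") for p in paths]


def upload(user: str, password: str, folder: str, paths: List[str]) -> None:
    """Upload files in binary mode into a unique folder on the FTP server."""
    with FTP(GSA_FTP_HOST) as ftp:
        ftp.login(user=user, passwd=password)
        try:
            ftp.mkd(folder)  # create the unique folder
        except error_perm:
            # Assumption: a permanent error here means the folder already exists.
            pass
        for local, remote in plan_upload(folder, paths):
            with open(local, "rb") as handle:
                ftp.storbinary(f"STOR {remote}", handle)
```

`plan_upload` is kept separate so the local-to-remote mapping can be inspected before any connection is made.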
In the current version, we recommend that read data be submitted in either FASTQ or BAM format. GSA accepts only the GZIP and BZIP2 compression formats (and DOES NOT accept 7-ZIP, RAR or TAR). In addition, GSA does not accept multiplexed data.
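A quick filename check before uploading can catch files in formats that are not accepted. The sketch below is illustrative only: the mapping from formats to file extensions (`.fastq`, `.fq`, `.bam`, `.gz`, `.bz2`, and so on) is an assumption, and uncompressed FASTQ/BAM files are treated as acceptable here because the text does not say otherwise.

```python
from typing import Optional

# Assumed extensions for the formats named in the text.
REJECTED_SUFFIXES = (".7z", ".rar", ".tar", ".tar.gz", ".tgz", ".tar.bz2")
COMPRESSION_SUFFIXES = (".gz", ".bz2")
READ_SUFFIXES = (".fastq", ".fq", ".bam")


def check_filename(name: str) -> Optional[str]:
    """Return a problem description, or None if the name looks acceptable."""
    lower = name.lower()
    if lower.endswith(REJECTED_SUFFIXES):
        return "archive format not accepted (7-ZIP, RAR, TAR)"
    # Strip one accepted compression suffix, if present.
    for suffix in COMPRESSION_SUFFIXES:
        if lower.endswith(suffix):
            lower = lower[: -len(suffix)]
            break
    if not lower.endswith(READ_SUFFIXES):
        return "expected FASTQ or BAM read data"
    return None
```

The check looks only at file names, so it cannot detect multiplexed data or a file whose content does not match its extension.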
Data files submitted in FASTQ format are listed in a Run and merged into one or several sequence archive files (please do not exceed 10 GB). Therefore, data files from different samples or replicates should not be grouped in the same Run. Single reads must be submitted using a single archive file, which can be named with a suffix appended, such as '_1', '_2', etc. Paired-end data files (forward/reverse), conversely, MUST be listed in order in a single Run. For example, forward and reverse reads alternate in the file and are named in order with "F" and "R" appended, respectively (i.e., read "1F", followed by read "1R", then read "2F", then read "2R").
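The alternating forward/reverse layout described above can be produced from two separate FASTQ files with a short script. This is a sketch under stated assumptions: each record occupies exactly four lines, and read names are replaced by a running index plus "F" or "R". The exact header format GSA expects is not specified in this guide, so treat the naming as hypothetical.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group FASTQ lines into 4-line records (header, sequence, '+', quality)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("malformed FASTQ record")
        yield [line.rstrip("\n") for line in record]


def interleave(forward: Iterable[str], reverse: Iterable[str]) -> Iterator[str]:
    """Yield lines of an interleaved FASTQ: 1F, 1R, 2F, 2R, ..."""
    pairs = zip_longest(fastq_records(forward), fastq_records(reverse))
    for index, (fwd, rev) in enumerate(pairs, start=1):
        if fwd is None or rev is None:
            raise ValueError("forward and reverse inputs have different read counts")
        yield from [f"@{index}F", *fwd[1:]]
        yield from [f"@{index}R", *rev[1:]]
```

Both arguments can be open text file handles; writing each yielded line followed by a newline gives the interleaved file.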
All submitted files are regularly moved from the FTP server to a staging area for processing. Thus, it is quite normal for files to "disappear" from the FTP server. Files that pass processing are released publicly or under controlled access according to the release date set by the user.
MD5 checksums are used to verify the integrity of transmitted data. An MD5 checksum is a 32-character hexadecimal string like "e3b5dd475c449300dd11f258538ff494".
♦ For Linux users, use: $ md5sum
♦ For Mac users, use: $ md5
♦ Windows users need to use a third-party tool.
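As a platform-independent alternative to the commands above, an MD5 checksum can also be computed with Python's standard `hashlib` module. The function name `md5_of_file` is a hypothetical helper for this sketch; it reads the file in chunks so that large sequence files do not need to fit in memory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 checksum of a file as a 32-character hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string should match the output of `md5sum` or `md5` for the same file.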
When you submit data, you will find a button named "Set release date" at the bottom of the web page. Specifying a release date schedules the data release for that date, or postpones it if a release date was already set. It is suggested that you set the release date of an Experiment/Run later than that of its BioProject or BioSample.
The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive (Genomics, Proteomics & Bioinformatics 2017) in BIG Data Center (Nucleic Acids Res 2017), Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, under accession numbers PRJCAxxxxxx, PRJCAyyyyyy that are publicly accessible at https://ngdc.cncb.ac.cn/gsa. Please cite the following required publications.
GSA: Genome Sequence Archive. Genomics, Proteomics & Bioinformatics 2017, 15(1): 14-18. [PMID=28387199]
The BIG Data Center: from deposition to integration to translation. Nucleic Acids Res 2017, 45(D1): D18-D24. [PMID=27899658]
If you have any questions, suggestions or comments, or would like to report a bug, please feel free to contact us via email (gsa@big.ac.cn) or instant messaging (QQ Group: 548170081).
You are also welcome to visit us to explore possible collaborations or to learn more about GSA.
Address:
BIG Data Center
Beijing Institute of Genomics, Chinese Academy of Sciences
No.1 Beichen West Road, Chaoyang District
Beijing 100101, China
Tel: +86 (10) 8409-7340
Fax: +86 (10) 8409-7720