The Genome Sequence Archive (GSA) is a data repository for archiving raw sequence reads. It accepts data submissions from all over the world and provides free access to all publicly available data for global scientific communities.
Log in to the BIG Submission Portal (BIG Sub, https://bigd.big.ac.cn/gsub/): Click the ‘login’ tab, then log in. If you do not have an account yet, click the ‘Register’ tab to create one. If you have any problems with your account, please contact bigd-admin@big.ac.cn for assistance.
Notice: After logging in to the BIG Submission Portal, you can follow the steps below to finish the submission.
The BIG Sub provides a browser-based user interface for submitting GSA metadata as well as various options for uploading data files.
The page tabs presented by the Submission wizard are:
Notice: If you have already created GSA related Biological Sample(s) in the BioSample database, please select ‘GSA related BioSample information has been created’. Then follow the wizard to complete the submission.
If you have not created GSA related Biological Sample(s), please select ‘No GSA related BioSample information was created’. Then follow the wizard to complete the submission.
Notice: If you select ‘Release immediately following curation’, the records will be released as soon as they pass approval. If you select ‘Release on a specified date’, the GSA records will be released on the date you specify.
Notice: If you determine that your human data must be submitted via the GSA Human database, please delete your current submission and contact us at gsa@big.ac.cn.
1) Download the BioSample submission template table Plant.us.xlsx. For column explanations and examples, please see the e.g.Plant.us.xlsx. For more information, please see the Help.
Notice: Downloading a new template ensures that you get the most current and correct version.
2) Fill in the template table and double-check it before uploading. Use the Selection box to select your completed table.
3) Then click the ‘Check’ button to verify the submitted batch information online.
4) If the file has passed the examination, please click the ‘Save and forward’ button to complete your submission. If not, please click the ‘Delete’ button. You should edit and re-upload the file until it is correct.
1) Download the GSA submission template table GSA_Template.us.xlsx. For column explanations and examples, please see the e.g.GSA_Template.us.xlsx. For more information, please see the Help.
Notice: Downloading a new template ensures that you get the most current and correct version.
2) Fill in the template table and double-check it before uploading. Use the Selection box to select your completed table.
3) Then click the ‘Check’ button to verify the submitted batch information online.
4) If the file has passed the examination, please click the ‘Save and forward’ button to go to the next step of the submission. If not, please click the ‘Delete’ button. You should edit and re-upload the file until it is correct.
Notice:
1) Please remember to check the names and MD5 checksums of the sequence files, which must be the same as those you filled in the batch submission table. Otherwise, your files cannot be archived correctly.
2) If you choose the Aspera Command Line to upload files, please write down the Aspera Command Line information.
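The checksum comparison in the notice above can be done locally before uploading. The following is a minimal sketch, not part of the GSA tooling: the function names and the `{file name: MD5}` mapping are hypothetical, and it assumes you have already copied the file names and MD5 values out of your batch submission table.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected, folder):
    """Compare files in `folder` against {file name: MD5} pairs.

    Returns a list of (file name, problem) tuples. An empty list means
    every listed file exists and its checksum matches.
    """
    problems = []
    for name, expected_md5 in expected.items():
        path = Path(folder) / name
        if not path.is_file():
            problems.append((name, "missing"))
        elif md5_of_file(path) != expected_md5.strip().lower():
            problems.append((name, "checksum differs"))
    return problems
```

Reading in chunks keeps memory use flat for large sequence files. Any entry returned by `find_mismatches` should be corrected in the table or regenerated before the files are uploaded.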
After completing the submission, please wait for data curation. We will check both the metadata and the sequence files and, if anything needs correcting, send feedback to your registered email address, so please watch for our messages. After curation, your data will be archived as a single GSA set, and the assigned accession number will be shown in your GSA list.
Notice:
1) Each new submission receives a temporary Submission ID in the form of sub#, like subCRA019091. Please provide this ID when contacting the GSA Working Team. DO NOT use the temporary Submission ID in a publication or in BIG Search.
2) After the submission, you will get a GSA accession number in the form of CRA#, like CRA012226. Please use this number in a publication or in BIG Search.
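To avoid citing a temporary ID by mistake, the two identifier forms can be told apart with a simple pattern check. This is a sketch only: the patterns are inferred from the two examples above (a `subCRA` or `CRA` prefix followed by digits), and the real identifier rules may be stricter. The function name is hypothetical.

```python
import re

# Assumed patterns, inferred from the examples subCRA019091 and CRA012226.
SUBMISSION_ID = re.compile(r"^subCRA\d+$")  # temporary, for correspondence only
ACCESSION = re.compile(r"^CRA\d+$")         # for publications and BIG Search


def classify_gsa_id(identifier):
    """Label an identifier as a temporary ID, an accession, or neither."""
    text = identifier.strip()
    if SUBMISSION_ID.match(text):
        return "temporary submission ID"
    if ACCESSION.match(text):
        return "accession number"
    return "unrecognized"
```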
Before the GSA data are archived, you can click the Submission ID to enter the Overview page. On this page, you can 1) update the Release date and Title; 2) edit the Submitter information; 3) append data by clicking the ‘Add Data’ button (for more information, please see ‘Create new GSA Submission’); 4) edit or delete Metadata information for each submitted Experiment or Run; and 5) upload or update data files by clicking the ‘Upload File’ button.
Notice: For more details about submission status and the available operations, please go to ‘Status and Operation’.
After the GSA data are archived (Status is Checked OK; confidential), you can click the Submission ID to enter the Overview page. On this page, you can 1) update the Release date and Title; 2) edit the Submitter information; 3) append data by clicking the ‘Add Data’ button (for more information, please see ‘Create new GSA Submission’); and 4) upload or update data files by clicking the ‘Upload File’ button. If you still want to change the Metadata information for a submitted Experiment or Run, please contact us at gsa@big.ac.cn.
Notice: For more details about submission status and the available operations, please go to ‘Status and Operation’.
After the article is published, you can click on the ‘Release Now’ button in the ‘Operation’ column of the list as shown below.
Click ‘Yes’ in the ‘confirmation box’ to trigger the release. The release of GSA will trigger the release of the related BioProject and BioSample(s), so you DO NOT need to release BioProject and BioSample in their respective system again.
Releasing a GSA dataset takes several hours, depending on its data size. After it is released, all the data of the GSA dataset can be retrieved from the BIG Search portal within 14 hours.
Three methods are offered for data uploading: Aspera Command Line, FTP and Aspera Connect plugin. Please choose one to upload your data. If you need any help during data file uploading, please contact the GSA Working Team at gsa@big.ac.cn or QQ group: 548170081.
If the files you are going to upload exceed 30 TB in size, please contact us at gsa@big.ac.cn.
NOTICE:
1. Unique file names should be used for all files, and each file must be listed in the GSA metadata file you uploaded.
2. Files must be compressed using gzip or bzip2.
3. Uploaded files will be removed after they are archived.
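The first two rules above can be checked locally before any upload starts. The sketch below is one possible pre-flight check, not a GSA tool: the function names are hypothetical, `listed_names` is assumed to be the set of file names you entered in the GSA metadata file, and compression is detected from the leading magic bytes of each file (`1f 8b` for gzip, `BZh` for bzip2) rather than from the file extension.

```python
from collections import Counter
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def compression_of(path):
    """Return 'gzip', 'bzip2', or None based on the file's first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(3)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(BZIP2_MAGIC):
        return "bzip2"
    return None


def precheck(paths, listed_names):
    """Return a list of problems found in the files to upload."""
    problems = []
    names = [Path(p).name for p in paths]
    # Rule 1a: every file name must be unique across the upload.
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"{name}: file name is used {count} times")
    for path, name in zip(paths, names):
        # Rule 1b: every file must be listed in the metadata file.
        if name not in listed_names:
            problems.append(f"{name}: not listed in the metadata table")
        # Rule 2: files must be gzip or bzip2 compressed.
        if compression_of(path) is None:
            problems.append(f"{name}: not gzip or bzip2 compressed")
    return problems
```

An empty result means the files satisfy both rules as interpreted here. It does not replace the curation checks performed after submission.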
Use Aspera Command Line to upload files. You may use the following command to upload files via Aspera Command Line:
[path/to/ascp/] -P33001 -i [path/to/key/file] -QT -l100m -k1 -d [path/to/folder/containing/files] aspsub@submit.big.ac.cn:uploads/[user dir]
Where:
[path/to/ascp/]:
Microsoft Windows: C:\Program Files\Aspera\Aspera Connect\bin\ascp.exe
or C:\users\[username]\AppData\Local\Programs\Aspera\Aspera Connect\bin\ascp.exe
Mac OS X: /Applications/Aspera/Connect.app/Contents/Resources/ascp (for an admin installation)
or /Users/[username]/Applications/Aspera/Connect.app/Contents/Resources/ascp (for a non-admin installation)
Linux: /opt/aspera/bin/ascp or /home/[username]/aspera/connect/bin/ascp
[path/to/key/file] must be an absolute path, e.g.: /home/keys/aspera.openssh
[path/to/folder/containing/files] needs to specify the local folder that contains all the files to upload.
[user dir] is your user directory. To find it, click the Submission ID to enter the Overview page, click the ‘Add Data’ button, and open the “04 Files” page, where the user directory information is shown.
Notice:
1) Please make a new subdirectory for each new submission. Your submission subfolder is a temporary holding area and will be removed once the whole submission is complete.
2) Do not upload complex directory structures or files that do not contain sequence data.
3) Updating Files: After the metadata information has been submitted, you cannot directly access the file upload page through the navigation bar. If you need to re-upload or append data, click on the Submission ID to enter the Overview page. Then, click the Update file button to proceed to the 04 Files file upload page. Here, you can choose the appropriate file upload method and re-upload the files.
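For scripted uploads, the Aspera command line described above can be assembled programmatically. The sketch below only builds the argument list and uses the flags, port, and host exactly as given in this section. It assumes that the user directory is appended directly to `uploads/` in the destination, and it enforces the rule that the key file path must be absolute. The function name and the example paths in the usage note are hypothetical.

```python
import os


def build_ascp_command(ascp_path, key_file, local_folder, user_dir):
    """Build the ascp argument list for uploading one local folder."""
    if not os.path.isabs(key_file):
        raise ValueError("the key file must be given as an absolute path")
    # Assumption: the user directory follows 'uploads/' with no separator.
    destination = f"aspsub@submit.big.ac.cn:uploads/{user_dir}"
    return [
        ascp_path,
        "-P33001",
        "-i", key_file,
        "-QT",
        "-l100m",
        "-k1",
        "-d",
        local_folder,
        destination,
    ]
```

The returned list can be passed to `subprocess.run(command, check=True)`, for example with `build_ascp_command("/opt/aspera/bin/ascp", "/home/keys/aspera.openssh", "/data/run01", "my_user_dir")`, where the last two values stand in for your own folder and the user directory shown on the “04 Files” page. Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.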
● Upload data with an FTP client
You need FTP client software, such as FileZilla Client, to log in to the FTP server and upload data. This document uses FileZilla as an example.
1) Step 1: Download the client software from the website (https://filezilla-project.org/). The download page is shown in Figure 1. Click on the ‘Download FileZilla Client’ button in the red box and follow the instructions to install the software.
Figure 1 FileZilla Client Software download
2) Step 2: Open the software, and the interface will appear as shown in Figure 2. Enter the host information as ‘submit.big.ac.cn’, and fill in your GSA database login account email and password as the username and password. Then click ‘Quick connect’. The status bar will display a successful login message. If an error message appears, please check the error reason as indicated.
3) Step 3: After successful login, choose the local data path where the data needs to be uploaded under ‘Local site’. In the ‘Remote site’, double-click on the GSA folder to enter the GSA directory.
4) Step 4: In ‘Local site’, select the data files or folders that need to be uploaded, right-click, and choose ‘Upload’, or directly drag them to ‘Remote site’, as shown in Figure 3.
5) Step 5: All uploaded data will be listed in the ‘Queue’ for uploading. After successful upload, the data information will be moved to the ‘Successful transfers’. If the upload is unsuccessful, it will be moved to ‘Failed transfers’ and will need to be re-uploaded. You can use ‘Resume’ for resuming the upload.
Figure 2 FileZilla Client Interface
Figure 3 FileZilla Client Upload Interface
Figure 4 Data Transfer Status
● Upload data via FTP from the command line
ftp command: The commands that need to be entered are underlined
Upload successful interactive page:
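As an alternative to typing FTP commands interactively, the same upload can be scripted. The following is a minimal sketch using Python's standard `ftplib`. It assumes the host `submit.big.ac.cn`, your GSA login email and password, and the remote `GSA` folder described in the FileZilla steps above. The function names are hypothetical.

```python
from ftplib import FTP
from pathlib import Path


def connect(user, password, host="submit.big.ac.cn"):
    """Open an FTP session in passive mode and log in."""
    ftp = FTP(host)
    ftp.login(user=user, passwd=password)
    ftp.set_pasv(True)  # passive mode, as recommended for MLSD errors
    return ftp


def upload_files(ftp, paths, remote_dir="GSA"):
    """Upload each local file into `remote_dir`; return the uploaded names."""
    ftp.cwd(remote_dir)
    uploaded = []
    for path in paths:
        path = Path(path)
        with open(path, "rb") as handle:
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded
```

A typical session would call `connect(...)`, then `upload_files(...)`, and finally `ftp.quit()`. Files are stored under their base names only, which matches the requirement that no complex directory structures be uploaded.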
● Possible problems
Question 1: When logging in via FTP, an error message of AUTH SSL appears in the status bar as shown in Figure 5.
Solution: Click ‘Site Manager’ under ‘File’ in the menu bar as shown in Figure 6, change the ‘Encryption’ option to ‘Use only ordinary FTP’ (the plain, unencrypted FTP option), and fill in the correct host address (submit.big.ac.cn), account, and password. Finally, click ‘Connect’.
Figure 5 FileZilla Error Message
Figure 6 Site Management Settings
Question 2: When logging in via FTP, an MLSD error appears in the status bar (as shown in Figure 7), showing ‘failed to read directory list’.
Solution: Change the transfer mode to passive mode in FileZilla under Edit -> Settings (as shown in Figure 8).
Figure 7 FileZilla Error Message
Figure 8 Transfer Mode Modification
GSA has fully considered the needs of users who submit large volumes of data and offers a green channel: hard disk delivery with assisted uploading for a one-time upload of more than 1 TB of data. To use it, contact the GSA working group at gsa@big.ac.cn, fill in the ‘PRJCA [please write the number]-hard disk filling information document’, send the electronic version to the working group mailbox, and send the printed version to GSA together with the hard disk containing the data.
Release rules of linked BioProject, BioSample, and GSA are as follows:
1. The release of BioProject records DOES NOT trigger the release of the other linked data.
2. The release of BioSample records triggers ONLY the release of the linked BioProject.
3. The release of GSA nucleotide sequence data DOES trigger the release of the linked BioProject and BioSample records.
Notice: Therefore, please carefully fill in the ‘release time’ of a BioProject, BioSample and GSA. Once released, the corresponding data or information can be retrieved or downloaded by other users.
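The three release rules can be summarized as a small lookup table. This is an illustrative sketch of the rules as stated above, not a description of how the GSA system implements them. The names `RELEASE_TRIGGERS` and `records_made_public` are hypothetical.

```python
# For each record type, the linked record types whose release it triggers.
RELEASE_TRIGGERS = {
    "BioProject": [],                      # rule 1: triggers nothing else
    "BioSample": ["BioProject"],           # rule 2: triggers its BioProject only
    "GSA": ["BioProject", "BioSample"],    # rule 3: triggers both linked records
}


def records_made_public(released):
    """Return every record type that becomes public when `released` is released."""
    return [released] + RELEASE_TRIGGERS[released]
```

For example, `records_made_public("GSA")` lists all three record types, which is why releasing a GSA dataset makes a separate release of its BioProject and BioSample unnecessary.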
GSA Status and Operation
No. | Status | Description | Operation |
---|---|---|---|
1 | Unfinished at the General Info step | The Submitter step is finished; the submission is at the General Info step. | Edit[1]; Delete |
2 | Unfinished at the Sample Type step | The General Info step is finished. If no GSA related Biological Sample(s) have been created, the submission is at the Sample Type step. | Edit[1]; Delete |
3 | Unfinished at the Attributes step | The Sample Type step is finished; the submission is at the Attributes step. | Edit[1]; Delete |
4 | Unfinished at the Metadata step | The Attributes step is finished; the submission is at the GSA Metadata step. | Edit[1]; Delete |
5 | Unfinished at the File Upload step | The GSA Metadata step is finished; the submission is at the File Upload step. | Edit[1]; Delete |
6 | Unfinished at the Overview step | The submission is at the Overview step. | Edit[1]; Delete |
7 | Unchecked | All information has been submitted and is waiting to be checked. | Edit[1]; Delete |
8 | Checking | Data file(s) are being processed. | Edit[1]; Delete |
9 | Checked failed | Data file(s) failed processing. | Edit[1]; Delete; Reload data file via FTP or Aspera Command Line[2] |
10 | Checked OK | Data file(s) were processed successfully and a GSA accession number has been assigned. | Release Now; Share |
11 | Deleted | Deleted | |
[1]: You can click the GSA Submission ID to enter the Overview page to edit GSA related metadata. For more detail, please see ‘How to Edit, Delete or Add New Data’.
[2]: For more details on data file upload, please see ‘Data File Upload’.
Experiment Status and Operation
No. | Status | Description | Operation |
---|---|---|---|
1 | Unchecked | Metadata has been submitted and is waiting to be checked. | Edit[1]; Delete |
2 | Checked OK | Metadata passed the check. | Edit[1]; Delete; Reload data file via FTP or Aspera Command Line[2] |
3 | Checked failed | Metadata failed the check. | Edit[1]; Delete |
4 | Deleted | Deleted | |
[1]: You can click the GSA Submission ID to enter the Overview page to edit the Experiment metadata. For more details, please see ‘How to Edit, Delete or Add New Data’.
[2]: For more details on data file upload, please see ‘Data File Upload’.
Run Status and Operation
No. | Status | Description | Operation |
---|---|---|---|
1 | Unchecked | Metadata has been submitted and is waiting to be checked. | Edit[1]; Delete |
2 | Checked OK | Metadata passed the check. | Edit[1]; Delete |
3 | Checked failed | Metadata failed the check. | Edit[1]; Delete |
4 | Uploaded Succeed | Data file(s) were uploaded successfully and are waiting for processing. | |
5 | Processing | Data file(s) are being processed. | |
6 | Processed succeed | Data file(s) were processed successfully. | |
7 | Processed error | Data file(s) failed processing. | Edit[1]; Delete; Reload data file via FTP or Aspera Command Line[2] |
8 | Deleted | Deleted | |
[1]: You can click the GSA Submission ID to enter the Overview page to edit the Run metadata. For more details, please see ‘How to Edit, Delete or Add New Data’.
[2]: For more details on data file upload, please see ‘Data File Upload’.