Use the streamlined submission process to submit SARS-CoV-2 data (complete or partial sequences).
SARS-CoV-2 submissions must meet the following requirements:
- All sequences are derived from Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
- The following information must be provided regarding the virus: unique isolate, complete collection date, host, country of collection.
Submit assembled SARS-CoV-2 sequence data on the web, where it will be automatically assessed for quality and annotated for you with the viral annotation tool VADR (https://github.com/nawrockie/vadr/wiki/Coronavirusannotation).
Submission Preparation
Prepare the following information for your SARS-CoV-2 submission:
- General: contact details, authors, publication, data release date.
- Sequencing technology information: sequencing technology, assembly information.
- FASTA-formatted sequences
- Prepare your sequence(s) in FASTA format: each record starts with a definition line, followed by a hard return and the sequence.
- The simplest definition line requires the ">" symbol and a sequence_ID.
- Sequence_ID naming requirements:
- Sequence_IDs should begin with a letter. Using the abbreviation of your organization (e.g. QHCDC) as a prefix is recommended to avoid duplicate IDs.
- Sequence_IDs may contain only the following characters: letters, digits, hyphens "-", and underscores "_".
- Sequence_IDs should be fewer than 23 characters long.
- Sequence length must be 50 to 30,000 bases, with <50% unknown bases (Ns). Ns at both ends of a sequence are trimmed automatically. The interior of a sequence must not contain hyphens "-" (which some alignment software inserts).
- Upload the FASTA file as a plain-text file (prepared with a text editor), making sure that the newline format is Unix LF, not Windows CRLF. Notepad3 ( https://www.rizonesoft.com/downloads/notepad3/ ) is a recommended text editor: double-click at the bottom right of the editor window to switch to LF. On Linux, the conversion can be done with the dos2unix command.
- Example:
>QHCDC_HB2Y01
CCTTTAT...
>BJCDC-0242
GGTAGGT...
- We provide a standalone GenBaseTools (gbt) program for users with the need to submit large batches of sequences. Users can download this program to their local machine to run sequence validation and make modifications as prompted until it passes, eliminating the need to upload large volumes of sequences multiple times to the GenBase website. This tool currently supports validation of general sequences and COVID-19 sequences and is compatible with all common Linux distributions. See Help.
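The FASTA requirements listed above (Sequence_ID naming, sequence length, N content, no internal hyphens) can also be pre-checked locally with a few lines of Python. The following is a minimal sketch only, not the GenBaseTools implementation: the function names parse_fasta and check_record are hypothetical, every listed requirement is treated as a hard rule, and it is an assumption that length and N fraction are measured after the terminal Ns have been trimmed.

```python
import re

# Hypothetical helpers; the rules mirror the requirements listed above.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")  # starts with a letter
MAX_ID_LEN = 22                                     # "fewer than 23 characters"
MIN_LEN, MAX_LEN = 50, 30000


def parse_fasta(text):
    """Yield (sequence_id, sequence) pairs from FASTA text."""
    seq_id, chunks = None, []
    for line in text.splitlines():          # handles both LF and CRLF
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def check_record(seq_id, seq):
    """Return a list of problems for one record (empty list means OK)."""
    problems = []
    if not ID_PATTERN.fullmatch(seq_id):
        problems.append("ID must start with a letter and use only letters, digits, '-' or '_'")
    if len(seq_id) > MAX_ID_LEN:
        problems.append("ID must be shorter than 23 characters")
    # Assumption: checks apply after terminal Ns are trimmed.
    trimmed = seq.upper().strip("N")
    if "-" in trimmed:
        problems.append("sequence contains '-' gap characters")
    if not MIN_LEN <= len(trimmed) <= MAX_LEN:
        problems.append("sequence length must be 50 to 30,000 bases")
    if trimmed and trimmed.count("N") / len(trimmed) >= 0.5:
        problems.append("sequence must contain fewer than 50% Ns")
    return problems
```

For example, calling check_record on each pair yielded by parse_fasta and printing any non-empty result gives a quick local report before uploading. The server-side checks remain authoritative.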
- Source metadata:
Please fill out the template file GenBase_Modifiers_SARS-CoV-2.xlsx.
- Required fields: Sequence_ID, Collection_date, Country/Region, Host, Isolate.
- Optional fields: Isolation_source, Host_sex, Host_age, etc.
- Please note that isolate names should be written in the format specified by the ICTV (International Committee on Taxonomy of Viruses), namely "SARS-CoV-2/host/three-letter country abbreviation/unique sample ID/year". For example, "SARS-CoV-2/human/CHN/BJCDC-0242/2023" or "SARS-CoV-2/human/USA/SD-SDPHL-2510/2022". Three-letter country abbreviations can be found at https://unstats.un.org/unsd/methodology/m49/.
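The isolate naming format described above can be built and checked programmatically before filling in the template. The sketch below is illustrative only: make_isolate and parse_isolate are hypothetical names, and the regular expression assumes an uppercase three-letter country code and a four-digit year, which the guidance above implies but does not state explicitly.

```python
import re

# Assumed pattern for "SARS-CoV-2/host/country/sample ID/year".
# Uppercase 3-letter country code and 4-digit year are assumptions.
ISOLATE_PATTERN = re.compile(
    r"SARS-CoV-2/(?P<host>[^/]+)/(?P<country>[A-Z]{3})/"
    r"(?P<sample_id>[^/]+)/(?P<year>\d{4})"
)


def make_isolate(host, country, sample_id, year):
    """Assemble an isolate name from its four variable parts."""
    return f"SARS-CoV-2/{host}/{country}/{sample_id}/{year}"


def parse_isolate(name):
    """Return the parts of an isolate name as a dict, or None if malformed."""
    match = ISOLATE_PATTERN.fullmatch(name)
    return match.groupdict() if match else None
```

Here parse_isolate returns None for a name such as "SARS-CoV-2/human/China/BJCDC-0242/2023", where the country is not a three-letter code. It does not verify that the code actually appears in the UN M49 list.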
Important statements:
- Sequences with errors will be removed from the current submission. You will receive a detailed error report on any sequences with errors and can resubmit them after further inspection. To complete the data submission in a single pass, we recommend running VADR to review your data beforehand (https://github.com/ncbi/vadr/wiki/Coronavirus-annotation).
- If you believe errors are due to naturally occurring mutations in the virus, please send an email describing the evidence for the mutation to: genbase@big.ac.cn.