1. Metadata

Metadata describes the submitter and the submitted data. We encourage users to provide metadata that is as detailed as possible. Richer metadata makes your data and research more visible in our search and analysis platforms, allows the data to be reused effectively in future applications, and permits efficient archiving of the files.
A GVM submission requires the following five modules: Submitter details, Project information, Sample details, Variants Analysis and Files.

1.1 Submitter details

This module captures the submitter's contact details: name, email address, telephone number, submitting centre or institute, and address.

1.2 Project info

This module holds general information about the project, including the project title, project description (fewer than 200 words), release date, number of samples, submitting centre and publications.

1.3 Sample details

Projects consist of analyses that are run on samples. We accept sample information in the form of BioSample or GSA accession(s), as well as sample-set accessions. We encourage users to submit as much sample information as possible.

	Explanation	Example 1
BioProject ID	If pre-registered, enter the project ID that was created	PRJCA000272
BioSample ID	If pre-registered, enter the sample ID that was created	SAMC000330
Title	The title of the sample	Whole genome sequencing of Mark1
Organism	Organism of the sample	Homo sapiens
Sample Name	The name of the sample	Mark1
Tissue/Cell Type Material	Sampling tissue or cell material	Whole blood
Dev_stage	If the sample was obtained from an organism in a specific developmental stage, it is specified with this qualifier	Adult
Type	Research type (Case-Control, Resistance research or Normal population)	Case-Control
Disease/Trait	Phenotype, disease name or trait description	Coronary heart disease
Phenotype	The description of the sample's phenotype	Case
Race/Ethnicity	The race or ethnicity of the sample	Han
Sex		Female
Age		28
Generation	The sample's generation in family kinship studies	II
Population/Subspecies/cultivar	The population, subspecies or cultivar of the sample	Asian
Geographic Location (country and/or sea)	The geographical origin of the sample as defined by the country or sea. Country or sea names should be chosen from the INSDC country list (http://insdc.org/country.html)	China
Geographic Location (region and locality)	The geographical origin of the sample as defined by the specific region name followed by the locality name	Bei Jing
Longitude and Latitude	The Longitude and Latitude of the sample's collected region	116°23′30″E; 39°54′50″N
Collectors	The person(s) who collected the sample	Mary
Collection Date		2014-01-01
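
The "Longitude and Latitude" example above is written in degrees, minutes and seconds. As a minimal sketch of how such a value could be checked or converted to decimal degrees before submission, the Python snippet below parses that notation. The function names `dms_to_decimal` and `parse_lon_lat` are hypothetical, and the assumption that the field always reads "longitude; latitude" with a hemisphere letter is taken only from the single example in the table.

```python
import re

# One coordinate written as degrees°minutes′seconds″ followed by a hemisphere letter.
# The exact notation is assumed from the example value in the table.
_DMS = re.compile(r"(\d+)°(\d+)′(\d+(?:\.\d+)?)″([NSEW])")


def dms_to_decimal(text):
    """Convert one coordinate such as 116°23′30″E to signed decimal degrees."""
    match = _DMS.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value


def parse_lon_lat(field):
    """Split a 'longitude; latitude' field into (lon, lat) in decimal degrees."""
    lon_text, lat_text = field.split(";")
    return dms_to_decimal(lon_text), dms_to_decimal(lat_text)


lon, lat = parse_lon_lat("116°23′30″E; 39°54′50″N")
print(round(lon, 4), round(lat, 4))
```

A value that does not match the pattern raises `ValueError`, which makes malformed coordinates easy to spot before the sheet is uploaded.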

1.4 Analysis methods

For GVM, each analysis consists of one VCF file or several gVCF files, plus any number of ancillary files. This worksheet allows GVM to link these files to a project and to other GVM analyses. It also records the methodology of each variant analysis.

	Example 1	Example 2
BioSample ID	SAMC000330	SAMC000331
Technology	Whole genome sequencing	Array
Platform/ArrayName	Hiseq2000	HumanOmniZhongHua-8 BeadChip v1.0 (Illumina)
Coverage	10X
TaxID	Human	Mouse
Reference Version	hg19	GRCm38.p3
Software	BWA v0.5.9; GATK-2.4-7-g5e89f01
Parameters	BWA mem; GATK HaplotypeCaller
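
The file rule stated at the start of this section (one VCF file or several gVCF files, plus any number of ancillary files) can be expressed as a small check. The Python sketch below is illustrative only: `check_analysis_files` is a hypothetical helper, "several" is read as "one or more", and treating a mix of VCF and gVCF files as invalid is an assumption, not a documented GVM rule.

```python
from collections import Counter


def check_analysis_files(file_types):
    """Return True if the file types fit 'one vcf OR one-or-more gvcf'.

    Any other file type is treated as an ancillary file and ignored.
    """
    counts = Counter(file_type.lower() for file_type in file_types)
    vcf, gvcf = counts["vcf"], counts["gvcf"]
    if vcf == 1 and gvcf == 0:
        return True
    if vcf == 0 and gvcf >= 1:
        return True
    # Assumption: no main file, several vcf files, or a vcf/gvcf mix is rejected.
    return False


print(check_analysis_files(["vcf", "txt"]))
print(check_analysis_files(["gvcf", "gvcf", "gvcf"]))
print(check_analysis_files(["vcf", "gvcf"]))
```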

1.5 File names

The names of all files in this GVM submission, together with their checksums (MD5), should be entered into this worksheet. Each file should be linked to one or more analyses.

	BioProject ID	BioSample ID	File Name	File Type	MD5
Example 1	PRJCA000272		human.vcf.gz	vcf	5a942488e5690788129981ec2fee51ad
Example 2	PRJCA000272	SAMC000330	sample1.g.vcf.gz	gvcf	6t6ga443sf90788129981ec2fee51ad
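
The MD5 column holds the checksum of each file. One way to compute it is sketched below in Python using the standard `hashlib` module; the helper names `md5_of_file` and `looks_like_md5` and the demo file are hypothetical. An MD5 checksum is always 32 hexadecimal characters, so the second helper can catch copy-and-paste mistakes before the sheet is submitted.

```python
import hashlib
import os
import re
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_like_md5(value):
    """Check that a string has the shape of an MD5 digest (32 hex characters)."""
    return re.fullmatch(r"[0-9a-f]{32}", value.strip().lower()) is not None


# Demo on a throwaway placeholder file (not a real VCF); replace the path
# with the file you intend to submit.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.vcf.gz")
    with open(demo_path, "wb") as handle:
        handle.write(b"hello")
    demo_md5 = md5_of_file(demo_path)

print(demo_md5, looks_like_md5(demo_md5))
```

On Linux, the `md5sum` command-line tool should produce the same digest for the same file.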