Variome Data Standards (V1.0 beta)
1. Metadata
The following five modules are required for a submission to GVM: Submitter details, Project information, Sample details, Variants Analysis and Files.
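As an illustration only, a submission could be checked for the five required modules before upload. The sketch below assumes a hypothetical in-memory layout (a dictionary keyed by module name); GVM's actual submission format is the worksheet described in the following sections, and the function name and placeholder values here are not part of the standard.

```python
# Module names taken from the list above; the dict layout is an assumption.
REQUIRED_MODULES = (
    "Submitter details",
    "Project information",
    "Sample details",
    "Variants Analysis",
    "Files",
)


def missing_modules(submission):
    """Return the required module names that are absent or empty."""
    return [name for name in REQUIRED_MODULES if not submission.get(name)]


# Hypothetical draft submission with placeholder content.
draft = {
    "Submitter details": {"name": "(placeholder)"},
    "Project information": {"title": "(placeholder)"},
    "Sample details": [{"Sample Name": "(placeholder)"}],
}

print(missing_modules(draft))  # ['Variants Analysis', 'Files']
```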
1.1 Submitter details
This module captures the submitter's details: name, email address, telephone number, submitting centre or institute, and address.
1.2 Project info
This module holds general information about the project: project title, project description (fewer than 200 words), release date, number of samples, submitting centre and publications.
1.3 Sample details
Projects consist of analyses that are run on samples. We accept sample information as BioSample or GSA accession(s), and we also accept sample-set accessions. We encourage users to submit as much sample information as possible.

Field | Explanation | Example 1 |
---|---|---|
Bioproject ID | If pre-registered, enter the project ID that was created | PRJCA000272 |
BioSample ID | If pre-registered, enter the sample ID that was created | SAMC000330 |
Title | The title of the sample | Whole genome sequencing of Mark1 |
Organism | Organism of the sample | Homo sapiens |
Sample Name | The name of the sample | Mark1 |
Tissue/Cell Type Material | Sampling tissue or cell material | Whole blood |
Dev_stage | If the sample was obtained from an organism in a specific developmental stage, it is specified with this qualifier | Adult |
Type | Research type (Case-Control, Resistance research or Normal population) | Case-Control |
Disease/Trait | Phenotype, disease name or trait description | Coronary heart disease |
Phenotype | The description of the sample's phenotype | Case |
Race/Ethnicity | The race or ethnicity of the sample | Han |
Sex | | Female |
Age | | 28 |
Generation | The sample's generation in family kinship studies | II |
Population/Subspecies/cultivar | The population, subspecies or cultivar of the sample | Asian |
Geographic Location (country and/or sea) | The geographical origin of the sample as defined by the country or sea. Country or sea names should be chosen from the INSDC country list (http://insdc.org/country.html) | China |
Geographic Location (region and locality) | The geographical origin of the sample as defined by the specific region name followed by the locality name | Beijing |
Longitude and Latitude | The longitude and latitude of the region where the sample was collected | 116°23′30″E; 39°54′50″N |
Collectors | The person(s) who collected the sample | Mary |
Collection Date | | 41640 |
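The Longitude and Latitude example above is written in degrees, minutes and seconds. For downstream use it is often convenient to convert such a value to decimal degrees. The following is a minimal sketch, assuming the field always follows the "longitude; latitude" layout of the example and uses the same prime (′) and double-prime (″) characters; the function names are illustrative and not part of the GVM standard.

```python
import re

# One coordinate such as 116°23′30″E: degrees, minutes, seconds, hemisphere.
DMS_PATTERN = re.compile(r"(\d+)°(\d+)′(\d+(?:\.\d+)?)″([NSEW])")


def dms_to_decimal(text):
    """Convert a single degrees-minutes-seconds coordinate to decimal degrees."""
    match = DMS_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a degrees-minutes-seconds coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value


def parse_longitude_latitude(field):
    """Split an assumed 'longitude; latitude' field and convert both parts."""
    longitude, latitude = (dms_to_decimal(part) for part in field.split(";"))
    return longitude, latitude


lon, lat = parse_longitude_latitude("116°23′30″E; 39°54′50″N")
print(round(lon, 4), round(lat, 4))  # 116.3917 39.9139
```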
1.4 Analysis methods
For GVM, each analysis consists of one VCF file or several gVCF files, plus an unlimited number of ancillary files. This sheet allows GVM to link these files to a project and to other GVM analyses. It also records variant analysis information detailing the methodology of each analysis.

Field | Example 1 | Example 2 |
---|---|---|
BioSample ID | SAMC000330 | SAMC000331 |
Technology | Whole genome sequencing | Array |
Platform/ArrayName | HiSeq 2000 | HumanOmniZhongHua-8 BeadChip v1.0 (Illumina) |
Coverage | 10X | |
TaxID | Human | Mouse |
Reference Version | hg19 | GRCm38.p3 |
Software | BWA v0.5.9; GATK-2.4-7-g5e89f01 | |
Parameters | BWA mem; GATK HaplotypeCaller | |
1.5 File names
File names and their checksums for this GVM submission should be entered into this worksheet. Each file should be linked to one or more analyses.

Example | Bioproject | BioSample ID | File Name | File Type | MD5 |
---|---|---|---|---|---|
Example 1 | PRJCA000272 | | human.vcf.gz | vcf | 5a942488e5690788129981ec2fee51ad |
Example 2 | PRJCA000272 | SAMC000330 | sample1.g.vcf.gz | gvcf | 6t6ga443sf90788129981ec2fee51ad |
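The MD5 column can be filled in and sanity-checked with a few lines of code. The sketch below uses Python's standard `hashlib` module to compute a file's MD5 digest and a simple pattern to check that a value is a well-formed 32-character hexadecimal string. The function names are illustrative, and nothing here describes how GVM itself verifies uploads.

```python
import hashlib
import re

# A well-formed MD5 digest is exactly 32 hexadecimal characters.
MD5_HEX = re.compile(r"[0-9a-f]{32}")


def is_md5_hex(value):
    """Return True if value looks like an MD5 hex digest (case-insensitive)."""
    return MD5_HEX.fullmatch(value.lower()) is not None


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


print(is_md5_hex("5a942488e5690788129981ec2fee51ad"))  # True
print(is_md5_hex("6t6ga443sf90788129981ec2fee51ad"))   # False
```

Note that the Example 2 value, as printed in the table, does not pass this format check (it is 31 characters long and contains non-hexadecimal letters), so it is best read as an illustrative placeholder. A real entry would be the output of something like `file_md5("sample1.g.vcf.gz")`.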