1. Metadata

  • Metadata describes the submitter and the submitted data. We encourage users to provide metadata that is as detailed as possible: richer metadata makes your data and research more visible in our search and analysis platforms, allows the data to be used effectively in future applications, and permits efficient archiving of the files.
    The following five modules are required for a submission to GVM: Submitter details, Project information, Sample details, Variants Analysis, and Files.
  • 1.1 Submitter details
    This module captures the submitter's contact details: name, email address, telephone number, submitting centre or institute, and address.
  • 1.2 Project info
    This module holds general information about the project: project title, project description (fewer than 200 words), release date, number of samples, submitting centre, and publications.
  • 1.3 Sample details
    Projects consist of analyses that are run on samples. We accept sample information in the form of BioSample or GSA accession(s), as well as sample-set accessions. We encourage users to submit as much sample information as possible.

    Field | Explanation | Example 1
    BioProject ID | If pre-registered, enter the ID of the project that was created | PRJCA000272
    BioSample ID | If pre-registered, enter the ID of the sample that was created | SAMC000330
    Title | The title of the sample | Whole genome sequencing of Mark1
    Organism | Organism of the sample | Homo sapiens
    Sample Name | The name of the sample | Mark1
    Tissue/Cell Type Material | Sampled tissue or cell material | Whole blood
    Dev_stage | If the sample was obtained from an organism in a specific developmental stage, it is specified with this qualifier | Adult
    Type | Research type (Case-Control, Resistance research or Normal population) | Case-Control
    Disease/Trait | Phenotype, disease name or trait description | Coronary heart disease
    Phenotype | The description of the sample's phenotype | Case
    Race/Ethnicity | The race or ethnicity of the sample | Han
    Generation | The sample's generation in family kinship studies | II
    Population/Subspecies/Cultivar | The population, subspecies or cultivar of the sample | Asian
    Geographic Location (country and/or sea) | The geographical origin of the sample as defined by the country or sea. Country or sea names should be chosen from the INSDC country list (http://insdc.org/country.html) | China
    Geographic Location (region and locality) | The geographical origin of the sample as defined by the specific region name followed by the locality name | Beijing
    Longitude and Latitude | The longitude and latitude of the region where the sample was collected | 116°23′30″E; 39°54′50″N
    Collectors | Who collected the sample | Mary
    Collection Date | | 41640
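
Before filling in the accession fields above, it can help to check that the IDs have the expected shape. The following is a minimal sketch only: the patterns (`PRJCA` or `SAMC` followed by digits) are an assumption inferred from the example values PRJCA000272 and SAMC000330, not an official GVM rule, and the function names are hypothetical.

```python
import re

# Assumed accession shapes, inferred from the examples in the table above.
BIOPROJECT_PATTERN = re.compile(r"PRJCA\d+")
BIOSAMPLE_PATTERN = re.compile(r"SAMC\d+")


def looks_like_bioproject(accession):
    """Return True if the text has the assumed BioProject accession shape."""
    return BIOPROJECT_PATTERN.fullmatch(accession.strip()) is not None


def looks_like_biosample(accession):
    """Return True if the text has the assumed BioSample accession shape."""
    return BIOSAMPLE_PATTERN.fullmatch(accession.strip()) is not None
```

A check like this only catches typing mistakes such as a swapped column; it does not confirm that the accession is actually registered.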
  • 1.4 Analysis methods
    In GVM, each analysis consists of one VCF file or several gVCF files, plus any number of ancillary files. This worksheet lets GVM link these files to a project and to other GVM analyses. It also records the methodology used in each variant analysis.

    Field | Example 1 | Example 2
    BioSample ID | SAMC000330 | SAMC000331
    Technology | Whole genome sequencing | Array
    Platform/ArrayName | Hiseq2000 | HumanOmniZhongHua-8 BeadChip v1.0 (Illumina)
    Coverage | 10X |
    TaxID | Human | Mouse
    Reference Version | hg19 | GRCm38.p3
    Software | BWA v0.5.9; GATK-2.4-7-g5e89f01 |
    Parameters | BWA mem; GATK HaplotypeCaller |
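
The rule that an analysis is one VCF file or several gVCF files, plus ancillary files, can be checked before submission. The sketch below is illustrative only. It assumes that file types can be recognised from the file name suffix (`.vcf`/`.vcf.gz` and `.g.vcf`/`.g.vcf.gz`) and it treats "several" as "one or more"; both are assumptions, and the function names are hypothetical.

```python
def file_kind(name):
    """Classify a file name as 'gvcf', 'vcf' or 'ancillary' by its suffix."""
    lowered = name.lower()
    # Test the gVCF suffix first, because ".g.vcf.gz" also ends with ".vcf.gz".
    if lowered.endswith((".g.vcf", ".g.vcf.gz")):
        return "gvcf"
    if lowered.endswith((".vcf", ".vcf.gz")):
        return "vcf"
    return "ancillary"


def is_valid_analysis(file_names):
    """Return True for exactly one VCF, or for gVCF files only (one or more).

    Ancillary files are ignored, since any number of them is allowed.
    """
    kinds = [file_kind(name) for name in file_names]
    n_vcf = kinds.count("vcf")
    n_gvcf = kinds.count("gvcf")
    return (n_vcf == 1 and n_gvcf == 0) or (n_vcf == 0 and n_gvcf >= 1)
```

For example, `is_valid_analysis(["human.vcf.gz", "readme.txt"])` is accepted, while a list mixing a VCF with gVCF files is rejected.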
  • 1.5 File names
    Enter the file names and their associated checksums (MD5) for this GVM submission into this worksheet. Each file should be linked to one or more analyses.

    Entry | BioProject | BioSample ID | File Name | File Type | MD5
    Example 1 | PRJCA000272 | | human.vcf.gz | vcf | 5a942488e5690788129981ec2fee51ad
    Example 2 | PRJCA000272 | SAMC000330 | sample1.g.vcf.gz | gvcf | 6t6ga443sf90788129981ec2fee51ad
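
The MD5 column can be filled in with any standard MD5 tool. As one possible approach, the sketch below uses Python's standard `hashlib` module and reads the file in chunks so that large VCF files do not need to fit in memory. The function name `md5_of_file` and the chunk size are illustrative choices, not part of GVM.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5_of_file("human.vcf.gz")` returns a 32-character hexadecimal string that can be pasted into the MD5 column for that file.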