Download variation files or useful tools in GVM
Variation Data
All genomic variation data are publicly available. Variation data files in VCF and FASTA formats are tabulated below. Note:
Brief VCF is the VCF file without individual genotypes;
Detailed VCF is the VCF file with individual genotypes.
Organism (version) | SNP (Brief VCF) | SNP (Detailed VCF) | SNP (FASTA) | Short INDEL (Brief VCF) | Short INDEL (Detailed VCF) | Short INDEL (FASTA) |
---|---|---|---|---|---|---|
Ailuropoda melanoleuca (AilMel1) | Brief VCF | Detailed VCF | FASTA | Brief VCF | Detailed VCF | FASTA |
Ailuropoda melanoleuca (ASM200744v2) | Brief VCF | Detailed VCF | FASTA | Brief VCF | Detailed VCF | FASTA |
Ailurus fulgens (ASM200746v1) | Brief VCF | Detailed VCF | FASTA | Brief VCF | Detailed VCF | FASTA |
Anas platyrhynchos (BGI_duck_1.0) | Brief VCF | Detailed VCF | FASTA | Brief VCF | Detailed VCF | FASTA |
Anser cygnoides (PRJNA183603_v1.0) | Brief VCF | Detailed VCF | FASTA | Brief VCF | Detailed VCF | FASTA |
Bos mutus (BosGru_v2.0) | Brief VCF | Detailed VCF | FASTA | Brief VCF | Detailed VCF | FASTA |
Bos taurus (UMD_3.1) | Brief VCF | Detailed VCF | FASTA | Brief VCF | Detailed VCF | FASTA |
Brassica napus (Bra_napus_v2.0) | Brief VCF | Detailed VCF | FASTA | Brief VCF | Detailed VCF | FASTA |
Brassica rapa (CAAS_Brap_v3.01) | Brief VCF | Detailed VCF | FASTA | Brief VCF | Detailed VCF | FASTA |
Canis familiaris (CanFam3.1) | Brief VCF | Detailed VCF | FASTA | Brief VCF | Detailed VCF | FASTA |
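To tell a Brief VCF from a Detailed VCF after download, one option is to inspect the `#CHROM` header line. In the VCF specification, genotype data appear as a `FORMAT` column plus one column per sample after the eight fixed columns. The sketch below assumes the file follows that layout; the function name `has_genotype_columns` and the file name in the usage note are illustrative, not part of GVM.

```python
def has_genotype_columns(lines):
    """Return True if the #CHROM header line has columns beyond the 8 fixed ones.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            # More than 8 columns means FORMAT and sample columns are present.
            return len(columns) > 8
        if not line.startswith("#"):
            # Data lines started before any #CHROM header was seen.
            break
    raise ValueError("no #CHROM header line found")


# Hypothetical header lines, for illustration only.
brief_header = ["##fileformat=VCFv4.1\n",
                "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"]
detailed_header = ["##fileformat=VCFv4.1\n",
                   "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n"]

print(has_genotype_columns(brief_header))
print(has_genotype_columns(detailed_header))
```

For a downloaded file, the same function can be called on an open handle, for example `with open("snp.vcf") as fh: has_genotype_columns(fh)`, where `snp.vcf` is a placeholder name.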
About the data
VCF (Variant Call Format) is a tab-delimited text format in which each data line describes a variant at one position in the genome. Its first eight columns are listed below.
1. #CHROM is the chromosome number
2. POS is the position on the chromosome
3. ID is the variation identifier in the GVM system
4. REF is the reference allele
5. ALT is the alternate allele
6. QUAL is the variant quality
7. FILTER is the filter status
8. INFO is additional information for each variant
More detailed information on the VCF format can be found at http://samtools.github.io/hts-specs/VCFv4.1.pdf
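The eight columns above can be read with a few lines of code. The following is a minimal sketch, assuming tab-separated data lines as defined in the VCF specification; the `VcfRecord` type, the `parse_vcf_line` function, and the sample line are made up for illustration and do not come from a GVM file.

```python
from typing import NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    variant_id: str
    ref: str
    alt: list
    qual: str
    filter_status: str
    info: dict


def parse_vcf_line(line):
    """Split one VCF data line into its eight fixed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, variant_id, ref, alt, qual, filter_status, info = fields[:8]
    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            # INFO flags carry no value; store them as True.
            info_dict[key] = value if sep else True
    # ALT may list several comma-separated alleles.
    return VcfRecord(chrom, int(pos), variant_id, ref, alt.split(","),
                     qual, filter_status, info_dict)


# Hypothetical data line; all values are invented for this example.
SAMPLE_LINE = "1\t1234\tOSA01S123\tA\tG\t50\tPASS\tDP=20;DB\n"

record = parse_vcf_line(SAMPLE_LINE)
print(record.chrom, record.pos, record.ref, record.alt, record.info)
```

Header lines (those starting with `#`) should be skipped before calling the parser. In a Detailed VCF, any columns after the eighth are ignored by this sketch.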
The FASTA format provides 50 nt of flanking sequence on each side of every variant, which is typically useful for BLAST applications, e.g.
>OSA01S123 class=1|alleles="A/G"|version=1
AGGTCCAGGCTGCCAAGCTTGAACTCCGTCTCCCAGACGACGACGGCCGC
R
GGAGGAAGGCGGACCATGTCGCCGGTGAGGTTGTTGCAGACAGACACGCA
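A record like the one above can be split into its parts as follows. This is a sketch based only on the layout visible in the example (a header line, the left flank, a one-letter variant code, the right flank); whether every GVM record, in particular INDEL records, follows this exact layout is an assumption. The function name `parse_flank_record` is illustrative.

```python
# IUPAC ambiguity codes for the six two-base combinations.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def parse_flank_record(text):
    """Parse one record laid out as: header, left flank, variant code, right flank."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) != 4 or not lines[0].startswith(">"):
        raise ValueError("expected a header line followed by three sequence lines")
    variant_id, _, attr_text = lines[0][1:].partition(" ")
    attrs = {}
    for item in attr_text.split("|"):
        key, _, value = item.partition("=")
        attrs[key] = value.strip('"')
    return {
        "id": variant_id,
        "attrs": attrs,
        "alleles": attrs.get("alleles", "").split("/"),
        "left": lines[1],
        "code": lines[2],
        "right": lines[3],
    }


EXAMPLE = """>OSA01S123 class=1|alleles="A/G"|version=1
AGGTCCAGGCTGCCAAGCTTGAACTCCGTCTCCCAGACGACGACGGCCGC
R
GGAGGAAGGCGGACCATGTCGCCGGTGAGGTTGTTGCAGACAGACACGCA
"""

record = parse_flank_record(EXAMPLE)
print(record["id"], record["alleles"], len(record["left"]), len(record["right"]))
# Check that the middle line is the IUPAC code for the listed alleles.
print(IUPAC_TWO_BASE.get(frozenset(record["alleles"])) == record["code"])
```

Joining `left`, one allele, and `right` gives a 101 nt query sequence that can be passed to BLAST for each allele in turn.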
Useful tools
1. Variant calling tools:
Genome Analysis Toolkit (GATK): https://www.broadinstitute.org/gatk/
Picard: http://www.psc.edu/index.php/user-resources/software/picard
samtools and bcftools: http://samtools.sourceforge.net/
2. Genome alignment tools:
3. Variant annotation tools: