Variome Data Standards (V1.0 beta)
4.4 Variation annotation
Software: VEP (Ensembl Variant Effect Predictor)
The software was downloaded from https://codeload.github.com/Ensembl/ensembl-tools/zip/release/84 and installed locally.
Input file: the VCF file produced by GATK was used as input.
Caches and databases:
1. Pre-built caches can be downloaded from ftp://ftp.ensembl.org/pub/release-84/variation/VEP/ and stored in the directory ./vep.
2. For species without a publicly available cache, a VEP cache can be built with the gtf2vep.pl script, which requires a GTF or GFF file and a FASTA reference sequence.
Command:
perl gtf2vep.pl -i my_species_genes.gtf -f my_species_seq.fa -d 84 -s my_species
VEP parameters and command:
perl variant_effect_predictor.pl -offline -i my_species_variants.vcf -s my_species
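The two commands above can also be assembled as argument lists before they are run, which avoids shell quoting problems with unusual file names. The following is a minimal Python sketch, not part of the standard itself: the function names are hypothetical, only the flags shown in this section are used, and both Perl scripts are assumed to be in the current working directory.

```python
import subprocess


def build_cache_command(gtf, fasta, db_version, species):
    """Argument list for building a VEP cache with gtf2vep.pl."""
    return ["perl", "gtf2vep.pl", "-i", gtf, "-f", fasta,
            "-d", str(db_version), "-s", species]


def build_vep_command(vcf, species):
    """Argument list for an offline VEP run on a VCF file."""
    return ["perl", "variant_effect_predictor.pl", "-offline",
            "-i", vcf, "-s", species]


def run(command):
    """Run a command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands rather than running them; call run(...) to execute.
    print(" ".join(build_cache_command(
        "my_species_genes.gtf", "my_species_seq.fa", 84, "my_species")))
    print(" ".join(build_vep_command("my_species_variants.vcf", "my_species")))
```

Calling `run(build_cache_command(...))` followed by `run(build_vep_command(...))` would build the cache first and then annotate, stopping at the first step that fails.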
Consequence Type and Effects
Consequence Type | Effect | SO accession | SO description |
---|---|---|---|
transcript_ablation | HIGH | SO:0001893 | A feature ablation whereby the deleted region includes a transcript feature |
splice_acceptor_variant | HIGH | SO:0001574 | A splice variant that changes the 2 base region at the 3' end of an intron |
splice_donor_variant | HIGH | SO:0001575 | A splice variant that changes the 2 base region at the 5' end of an intron |
stop_gained | HIGH | SO:0001587 | A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript |
frameshift_variant | HIGH | SO:0001589 | A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three |
stop_lost | HIGH | SO:0001578 | A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript |
start_lost | HIGH | SO:0002012 | A codon variant that changes at least one base of the canonical start codon |
transcript_amplification | HIGH | SO:0001889 | A feature amplification of a region containing a transcript |
splice_region_variant | LOW | SO:0001630 | A sequence variant in which a change has occurred within the region of the splice site, either within 1-3 bases of the exon or 3-8 bases of the intron |
incomplete_terminal_codon_variant | LOW | SO:0001626 | A sequence variant where at least one base of the final codon of an incompletely annotated transcript is changed |
stop_retained_variant | LOW | SO:0001567 | A sequence variant where at least one base in the terminator codon is changed, but the terminator remains |
synonymous_variant | LOW | SO:0001819 | A sequence variant where there is no resulting change to the encoded amino acid |
inframe_insertion | MODERATE | SO:0001821 | An inframe non synonymous variant that inserts bases into the coding sequence |
inframe_deletion | MODERATE | SO:0001822 | An inframe non synonymous variant that deletes bases from the coding sequence |
missense_variant | MODERATE | SO:0001583 | A sequence variant, that changes one or more bases, resulting in a different amino acid sequence but where the length is preserved |
protein_altering_variant | MODERATE | SO:0001818 | A sequence_variant which is predicted to change the protein encoded in the coding sequence |
regulatory_region_ablation | MODERATE | SO:0001894 | A feature ablation whereby the deleted region includes a regulatory region |
coding_sequence_variant | MODIFIER | SO:0001580 | A sequence variant that changes the coding sequence |
mature_miRNA_variant | MODIFIER | SO:0001620 | A transcript variant located within the sequence of the mature miRNA |
5_prime_UTR_variant | MODIFIER | SO:0001623 | A UTR variant of the 5' UTR |
3_prime_UTR_variant | MODIFIER | SO:0001624 | A UTR variant of the 3' UTR |
non_coding_transcript_exon_variant | MODIFIER | SO:0001792 | A sequence variant that changes non-coding exon sequence in a non-coding transcript |
intron_variant | MODIFIER | SO:0001627 | A transcript variant occurring within an intron |
NMD_transcript_variant | MODIFIER | SO:0001621 | A variant in a transcript that is the target of NMD |
non_coding_transcript_variant | MODIFIER | SO:0001619 | A transcript variant of a non coding RNA gene |
upstream_gene_variant | MODIFIER | SO:0001631 | A sequence variant located 5' of a gene |
downstream_gene_variant | MODIFIER | SO:0001632 | A sequence variant located 3' of a gene |
TFBS_ablation | MODIFIER | SO:0001895 | A feature ablation whereby the deleted region includes a transcription factor binding site |
TFBS_amplification | MODIFIER | SO:0001892 | A feature amplification of a region containing a transcription factor binding site |
TF_binding_site_variant | MODIFIER | SO:0001782 | A sequence variant located within a transcription factor binding site |
regulatory_region_amplification | MODIFIER | SO:0001891 | A feature amplification of a region containing a regulatory region |
feature_elongation | MODIFIER | SO:0001907 | A sequence variant that causes the extension of a genomic feature, with regard to the reference sequence |
regulatory_region_variant | MODIFIER | SO:0001566 | A sequence variant located within a regulatory region |
feature_truncation | MODIFIER | SO:0001906 | A sequence variant that causes the reduction of a genomic feature, with regard to the reference sequence |
intergenic_variant | MODIFIER | SO:0001628 | A sequence variant located in the intergenic region, between genes |
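
The Effect column above can be used to summarise a variant that carries several consequence terms. The following is a minimal Python sketch of such a lookup, offered as an illustration rather than as part of the standard. The names `TERMS_BY_IMPACT`, `IMPACT_RANK`, and `highest_impact` are hypothetical. It also assumes that multiple terms arrive in one field joined by "," or "&", which is my understanding of how VEP writes them in its tab-delimited and VCF outputs respectively.

```python
import re

# Consequence terms grouped by the Effect column of the table above.
TERMS_BY_IMPACT = {
    "HIGH": [
        "transcript_ablation", "splice_acceptor_variant",
        "splice_donor_variant", "stop_gained", "frameshift_variant",
        "stop_lost", "start_lost", "transcript_amplification",
    ],
    "MODERATE": [
        "inframe_insertion", "inframe_deletion", "missense_variant",
        "protein_altering_variant", "regulatory_region_ablation",
    ],
    "LOW": [
        "splice_region_variant", "incomplete_terminal_codon_variant",
        "stop_retained_variant", "synonymous_variant",
    ],
    "MODIFIER": [
        "coding_sequence_variant", "mature_miRNA_variant",
        "5_prime_UTR_variant", "3_prime_UTR_variant",
        "non_coding_transcript_exon_variant", "intron_variant",
        "NMD_transcript_variant", "non_coding_transcript_variant",
        "upstream_gene_variant", "downstream_gene_variant",
        "TFBS_ablation", "TFBS_amplification", "TF_binding_site_variant",
        "regulatory_region_amplification", "feature_elongation",
        "regulatory_region_variant", "feature_truncation",
        "intergenic_variant",
    ],
}

# Most severe first; the position in this list is the rank.
IMPACT_RANK = ["HIGH", "MODERATE", "LOW", "MODIFIER"]

# Flat lookup: consequence term -> impact level.
IMPACT_OF = {
    term: impact
    for impact, terms in TERMS_BY_IMPACT.items()
    for term in terms
}


def highest_impact(consequence_field):
    """Return the most severe impact among the terms in one field.

    Returns None for an empty field; an unknown term raises KeyError.
    """
    terms = [t for t in re.split(r"[,&]", consequence_field) if t]
    if not terms:
        return None
    impacts = [IMPACT_OF[t] for t in terms]
    return min(impacts, key=IMPACT_RANK.index)


if __name__ == "__main__":
    print(highest_impact("missense_variant&splice_region_variant"))
    print(highest_impact("intron_variant,NMD_transcript_variant"))
```

Ranking by impact level only does not distinguish between terms within the same level; a finer ordering would need an explicit per-term severity list.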