Variome Data Standards (V1.0 beta)
4.4 Variation annotation
Software: VEP
The software was downloaded from https://codeload.github.com/Ensembl/ensembl-tools/zip/release/84 and installed locally.
Input file: the VCF file produced by GATK.
Caches and databases:
1. Pre-built caches can be downloaded from ftp://ftp.ensembl.org/pub/release-84/variation/VEP/ and stored in the directory ./vep.
2. For species that don't have a publicly available cache, it is possible to build a VEP cache using the gtf2vep.pl script. This requires a GTF or GFF file and a FASTA reference sequence.
Command:
perl gtf2vep.pl -i my_species_genes.gtf -f my_species_seq.fa -d 84 -s my_species
VEP parameters and command:
perl variant_effect_predictor.pl -offline -i my_species_variants.vcf -s my_species
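The two commands above can be chained from a small wrapper script. The following is a minimal sketch, assuming Python 3 is available and that `gtf2vep.pl` and `variant_effect_predictor.pl` are in the working directory. The helper names (`build_cache_cmd`, `build_vep_cmd`, `run_pipeline`) and the `dry_run` switch are hypothetical and not part of VEP; only the options already shown above are used.

```python
import subprocess


def build_cache_cmd(gtf, fasta, species, db_version=84):
    """Argument list for building a custom cache with gtf2vep.pl."""
    return ["perl", "gtf2vep.pl", "-i", gtf, "-f", fasta,
            "-d", str(db_version), "-s", species]


def build_vep_cmd(vcf, species):
    """Argument list for annotating a VCF offline against that cache."""
    return ["perl", "variant_effect_predictor.pl", "-offline",
            "-i", vcf, "-s", species]


def run_pipeline(gtf, fasta, vcf, species, dry_run=True):
    """Build the cache, then annotate; with dry_run only print the commands."""
    cmds = [build_cache_cmd(gtf, fasta, species), build_vep_cmd(vcf, species)]
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the pipeline if a step exits with an error
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    run_pipeline("my_species_genes.gtf", "my_species_seq.fa",
                 "my_species_variants.vcf", "my_species")
```

With `dry_run=True` (the default in this sketch) the script only prints the two command lines, which is a convenient way to check them before running the cache build.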
Consequence Type and Effects
| Consequence Type | Effect | SO accession | SO description |
|---|---|---|---|
| transcript_ablation | HIGH | SO:0001893 | A feature ablation whereby the deleted region includes a transcript feature |
| splice_acceptor_variant | HIGH | SO:0001574 | A splice variant that changes the 2 base region at the 3' end of an intron |
| splice_donor_variant | HIGH | SO:0001575 | A splice variant that changes the 2 base region at the 5' end of an intron |
| stop_gained | HIGH | SO:0001587 | A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript |
| frameshift_variant | HIGH | SO:0001589 | A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three |
| stop_lost | HIGH | SO:0001578 | A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript |
| start_lost | HIGH | SO:0002012 | A codon variant that changes at least one base of the canonical start codon |
| transcript_amplification | HIGH | SO:0001889 | A feature amplification of a region containing a transcript |
| splice_region_variant | LOW | SO:0001630 | A sequence variant in which a change has occurred within the region of the splice site, either within 1-3 bases of the exon or 3-8 bases of the intron |
| incomplete_terminal_codon_variant | LOW | SO:0001626 | A sequence variant where at least one base of the final codon of an incompletely annotated transcript is changed |
| stop_retained_variant | LOW | SO:0001567 | A sequence variant where at least one base in the terminator codon is changed, but the terminator remains |
| synonymous_variant | LOW | SO:0001819 | A sequence variant where there is no resulting change to the encoded amino acid |
| inframe_insertion | MODERATE | SO:0001821 | An inframe non synonymous variant that inserts bases into the coding sequence |
| inframe_deletion | MODERATE | SO:0001822 | An inframe non synonymous variant that deletes bases from the coding sequence |
| missense_variant | MODERATE | SO:0001583 | A sequence variant, that changes one or more bases, resulting in a different amino acid sequence but where the length is preserved |
| protein_altering_variant | MODERATE | SO:0001818 | A sequence_variant which is predicted to change the protein encoded in the coding sequence |
| regulatory_region_ablation | MODERATE | SO:0001894 | A feature ablation whereby the deleted region includes a regulatory region |
| coding_sequence_variant | MODIFIER | SO:0001580 | A sequence variant that changes the coding sequence |
| mature_miRNA_variant | MODIFIER | SO:0001620 | A transcript variant located within the sequence of the mature miRNA |
| 5_prime_UTR_variant | MODIFIER | SO:0001623 | A UTR variant of the 5' UTR |
| 3_prime_UTR_variant | MODIFIER | SO:0001624 | A UTR variant of the 3' UTR |
| non_coding_transcript_exon_variant | MODIFIER | SO:0001792 | A sequence variant that changes non-coding exon sequence in a non-coding transcript |
| intron_variant | MODIFIER | SO:0001627 | A transcript variant occurring within an intron |
| NMD_transcript_variant | MODIFIER | SO:0001621 | A variant in a transcript that is the target of NMD |
| non_coding_transcript_variant | MODIFIER | SO:0001619 | A transcript variant of a non coding RNA gene |
| upstream_gene_variant | MODIFIER | SO:0001631 | A sequence variant located 5' of a gene |
| downstream_gene_variant | MODIFIER | SO:0001632 | A sequence variant located 3' of a gene |
| TFBS_ablation | MODIFIER | SO:0001895 | A feature ablation whereby the deleted region includes a transcription factor binding site |
| TFBS_amplification | MODIFIER | SO:0001892 | A feature amplification of a region containing a transcription factor binding site |
| TF_binding_site_variant | MODIFIER | SO:0001782 | A sequence variant located within a transcription factor binding site |
| regulatory_region_amplification | MODIFIER | SO:0001891 | A feature amplification of a region containing a regulatory region |
| feature_elongation | MODIFIER | SO:0001907 | A sequence variant that causes the extension of a genomic feature, with regard to the reference sequence |
| regulatory_region_variant | MODIFIER | SO:0001566 | A sequence variant located within a regulatory region |
| feature_truncation | MODIFIER | SO:0001906 | A sequence variant that causes the reduction of a genomic feature, with regard to the reference sequence |
| intergenic_variant | MODIFIER | SO:0001628 | A sequence variant located in the intergenic region, between genes |
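When a variant carries several consequence terms at once, the Effect column of the table can be used to pick the most severe class. The following is a minimal sketch, assuming the ranking HIGH > MODERATE > LOW > MODIFIER and assuming that multiple terms arrive in one field joined by `&` or `,`. The names `TERMS_BY_IMPACT`, `IMPACT_BY_TERM` and `most_severe_effect` are hypothetical helpers for illustration, not part of VEP.

```python
import re

# Effect classes from the table, ordered from most to least severe (assumed ranking).
IMPACT_ORDER = ["HIGH", "MODERATE", "LOW", "MODIFIER"]

# Consequence terms grouped by the Effect column of the table.
TERMS_BY_IMPACT = {
    "HIGH": [
        "transcript_ablation", "splice_acceptor_variant", "splice_donor_variant",
        "stop_gained", "frameshift_variant", "stop_lost", "start_lost",
        "transcript_amplification",
    ],
    "MODERATE": [
        "inframe_insertion", "inframe_deletion", "missense_variant",
        "protein_altering_variant", "regulatory_region_ablation",
    ],
    "LOW": [
        "splice_region_variant", "incomplete_terminal_codon_variant",
        "stop_retained_variant", "synonymous_variant",
    ],
    "MODIFIER": [
        "coding_sequence_variant", "mature_miRNA_variant", "5_prime_UTR_variant",
        "3_prime_UTR_variant", "non_coding_transcript_exon_variant",
        "intron_variant", "NMD_transcript_variant", "non_coding_transcript_variant",
        "upstream_gene_variant", "downstream_gene_variant", "TFBS_ablation",
        "TFBS_amplification", "TF_binding_site_variant",
        "regulatory_region_amplification", "feature_elongation",
        "regulatory_region_variant", "feature_truncation", "intergenic_variant",
    ],
}

# Flat lookup: consequence term -> effect class.
IMPACT_BY_TERM = {
    term: impact for impact, terms in TERMS_BY_IMPACT.items() for term in terms
}


def most_severe_effect(consequence_field):
    """Return the most severe effect class among the terms in one field."""
    terms = [t for t in re.split(r"[&,]", consequence_field) if t]
    if not terms:
        raise ValueError("empty consequence field")
    # An unknown term raises KeyError rather than being silently ignored.
    effects = [IMPACT_BY_TERM[t] for t in terms]
    return min(effects, key=IMPACT_ORDER.index)


if __name__ == "__main__":
    print(most_severe_effect("missense_variant&splice_region_variant"))  # MODERATE
```

Raising on unknown terms is a deliberate choice in this sketch: a term missing from the lookup usually means the table and the annotation release are out of step, which is better noticed than hidden.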
