VCFtools Version 1
VCFtools is a suite of functions for working with genetic variation data in the form of VCF and BCF files. The tools are used mainly to summarize data, run calculations on data, filter data, and convert data into other useful file formats.
How to cite
The Variant Call Format and VCFtools, Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert Handsaker, Gerton Lunter, Gabor Marth, Stephen T. Sherry, Gilean McVean, Richard Durbin and 1000 Genomes Project Analysis Group, Bioinformatics, 2011. [PMID:21653522]
Help information
VCFtools

VCFtools --vcf FILE [ FILTERING OPTIONS ]
FILTERING OPTIONS:
-chrom Include sites with identifiers matching CHROM. To include multiple chromosomes, separate the values with semicolons.
-from_bp Specify a lower bound for a range of sites to be processed. Sites with positions less than this value will be excluded. This option can only be used in conjunction with chrom. This option can be used with or without to_bp.
-to_bp Specify an upper bound for a range of sites to be processed. Sites with positions greater than this value will be excluded. This option can only be used in conjunction with chrom. This option can be used with or without from_bp.
-not_chromosome Exclude sites with identifiers matching CHROM. To exclude multiple chromosomes, separate the values with semicolons.
-exclude_pos Exclude a set of sites on the basis of a list of positions in a file. Each line of the input file should contain a chromosome and a position, separated by a tab. Lines that start with "#" are comment lines and will be ignored.
-pos_file Include a set of sites on the basis of a list of positions in a file. Each line of the input file should contain a chromosome and a position, separated by a tab. Lines that start with "#" are comment lines and will be ignored.
-interval Thin sites so that no two sites are within the specified distance of one another.
-min_quality_score Include only sites with a Quality value above this threshold.
-site_file Include a list of sites given in a file. The file should contain a list of site IDs, with one ID per line. Lines that start with "#" are comment lines and will be ignored.
-exclude_site_file Exclude a list of sites given in a file. The file should contain a list of site IDs, with one ID per line. Lines that start with "#" are comment lines and will be ignored.
-site_id Include site(s) with a matching ID. To include multiple sites, separate the IDs with semicolons.
-snp Include sites that contain a SNP.
-indel Include sites that contain an indel.
-maf Include only sites with a Minor Allele Frequency greater than or equal to this value. This option can be used with or without max_maf.
-min_mean_dp Include only sites with a mean depth (over all included individuals) greater than or equal to this value. This option requires that the "DP" FORMAT tag is included for each site. This option can be used with or without max_mean_dp.
-max_maf Include only sites with a Minor Allele Frequency less than or equal to this value. This option can be used with or without maf.
-max_mean_dp Include only sites with a mean depth (over all included individuals) less than or equal to this value. This option requires that the "DP" FORMAT tag is included for each site. This option can be used with or without min_mean_dp.
-min_alleles Include only sites with a number of alleles greater than or equal to this value. This option can be used with or without max_alleles.
-max_missing Exclude sites on the basis of the proportion of missing data. The value must be between 0 and 1, where 0 allows sites that are completely missing and 1 indicates that no missing data are allowed.
-max_alleles Include only sites with a number of alleles less than or equal to this value. This option can be used with or without min_alleles.
-hwe Assess sites for Hardy-Weinberg Equilibrium using an exact test. Sites with a p-value below the threshold defined by this option are taken to be out of HWE and are therefore excluded.
-phased Exclude all sites that contain unphased genotypes.
-keep_info Include all sites with a specific INFO flag. This option only filters on the presence of the flag and not its value. More than one value separated by semicolon can be input to specify multiple INFO flags.
-remove_filtered_all Remove all sites with a FILTER flag other than PASS.
-remove_info Exclude all sites with a specific INFO flag. This option only filters on the presence of the flag and not its value. More than one value separated by semicolon can be input to specify multiple INFO flags.
-keep_filter Include all sites marked with a specific FILTER flag. More than one value separated by semicolon can be input to specify multiple FILTER flags.
-remove_filter Exclude all sites marked with a specific FILTER flag. More than one value separated by semicolon can be input to specify multiple FILTER flags.
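The region and position filters above (chrom, not_chromosome, from_bp, to_bp, pos_file, exclude_pos) amount to simple predicates on a site's (CHROM, POS) pair. The Python sketch below illustrates those semantics; it is not the tool's implementation, the function names are hypothetical, and treating the from_bp/to_bp bounds as inclusive is an assumption.

```python
from typing import Iterable, Optional, Set, Tuple

Site = Tuple[str, int]  # (CHROM, POS) as read from a VCF data line


def parse_semicolon_list(value: str) -> Set[str]:
    """Split a semicolon-separated option value such as "1;2;X"."""
    return {item.strip() for item in value.split(";") if item.strip()}


def read_position_file(lines: Iterable[str]) -> Set[Site]:
    """Read tab-separated CHROM and POS columns, skipping '#' comment lines."""
    positions: Set[Site] = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank or comment line
        chrom, pos = line.split("\t")[:2]
        positions.add((chrom, int(pos)))
    return positions


def keep_site(site: Site,
              chroms: Optional[Set[str]] = None,
              not_chroms: Optional[Set[str]] = None,
              from_bp: Optional[int] = None,
              to_bp: Optional[int] = None,
              include_pos: Optional[Set[Site]] = None,
              exclude_pos: Optional[Set[Site]] = None) -> bool:
    """Return True if the site survives every region/position filter given."""
    if (from_bp is not None or to_bp is not None) and chroms is None:
        raise ValueError("from_bp/to_bp can only be used together with chrom")
    chrom, pos = site
    if chroms is not None and chrom not in chroms:
        return False
    if not_chroms is not None and chrom in not_chroms:
        return False
    if from_bp is not None and pos < from_bp:  # lower bound (assumed inclusive)
        return False
    if to_bp is not None and pos > to_bp:  # upper bound (assumed inclusive)
        return False
    if include_pos is not None and site not in include_pos:
        return False
    if exclude_pos is not None and site in exclude_pos:
        return False
    return True
```

For example, keep_site(("1", 150), chroms=parse_semicolon_list("1;2"), from_bp=100, to_bp=200) returns True, while a site at position 250 on chromosome 1 would be excluded by the same bounds.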
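The interval option describes thinning. One simple way to enforce "no two sites within the specified distance" is a greedy pass over position-sorted sites that keeps a site only when it lies at least the given number of bp from the last kept site on the same chromosome. This is a sketch under those assumptions (greedy order, first site kept, distance compared with >=); the tool may resolve close pairs differently, and thin_sites is a hypothetical name.

```python
from typing import Dict, List, Tuple


def thin_sites(sites: List[Tuple[str, int]], interval: int) -> List[Tuple[str, int]]:
    """Greedily keep sites so that no two kept sites on a chromosome are
    closer than `interval` bp.

    Assumes `sites` is sorted by position within each chromosome,
    as in a sorted VCF file.
    """
    kept: List[Tuple[str, int]] = []
    last_kept: Dict[str, int] = {}  # chromosome -> position of last kept site
    for chrom, pos in sites:
        previous = last_kept.get(chrom)
        if previous is None or pos - previous >= interval:
            kept.append((chrom, pos))
            last_kept[chrom] = pos
    return kept
```

With sites at 100, 150, 200 and 260 on chromosome 1 and an interval of 100, this sketch keeps 100 and 200, because 150 and 260 each fall within 100 bp of the previously kept site.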
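The maf, max_maf, min_alleles and max_alleles filters all depend on allele counts at a site. The sketch below computes them from GT strings such as "0/1" or "1|1", where "." marks a missing allele. Taking the MAF as the frequency of the least common allele among REF and ALT, and counting a site's alleles as REF plus its listed ALT alleles, are assumptions made for illustration; the helper names are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def allele_counts(genotypes: List[str]) -> Counter:
    """Count called alleles in GT strings such as '0/1' or '1|1'; '.' is missing."""
    counts: Counter = Counter()
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                counts[int(allele)] += 1
    return counts


def minor_allele_frequency(genotypes: List[str], n_alleles: int) -> Optional[float]:
    """Frequency of the least common of the site's n_alleles alleles (None if no calls)."""
    counts = allele_counts(genotypes)
    total = sum(counts.values())
    if total == 0:
        return None
    return min(counts[a] for a in range(n_alleles)) / total


def passes_allele_filters(genotypes: List[str], n_alleles: int,
                          maf: Optional[float] = None,
                          max_maf: Optional[float] = None,
                          min_alleles: Optional[int] = None,
                          max_alleles: Optional[int] = None) -> bool:
    if min_alleles is not None and n_alleles < min_alleles:
        return False
    if max_alleles is not None and n_alleles > max_alleles:
        return False
    if maf is None and max_maf is None:
        return True
    freq = minor_allele_frequency(genotypes, n_alleles)
    if freq is None:
        return False  # assumption: sites with no called alleles fail MAF filters
    if maf is not None and freq < maf:
        return False
    if max_maf is not None and freq > max_maf:
        return False
    return True
```

For example, a biallelic site where 7 of 8 called alleles are REF has a MAF of 0.125, so it passes maf=0.05 but fails maf=0.2.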

Parameters Description
-VCF: VCF file
the VCF file used as input.
-C: Chromosome
include sites with identifiers matching CHROM. To include multiple chromosomes, separate the values with semicolons.
-F: From position
specify a lower bound for a range of sites to be processed. Sites with positions less than this value will be excluded. This option can only be used in conjunction with chrom. This option can be used with or without to_bp.
-T: To position
specify an upper bound for a range of sites to be processed. Sites with positions greater than this value will be excluded. This option can only be used in conjunction with chrom. This option can be used with or without from_bp.
-Pos: Position file
include a set of sites on the basis of a list of positions in a file. Each line of the input file should contain a chromosome and a position, separated by a tab. Lines that start with "#" are comment lines and will be ignored.
-E: Exclude position file
exclude a set of sites on the basis of a list of positions in a file. Each line of the input file should contain a chromosome and a position, separated by a tab. Lines that start with "#" are comment lines and will be ignored.
-I: Minimum interval between two sites
thin sites so that no two sites are within the specified distance of one another.
-MQ: Minimum quality score
include only sites with a Quality value above this threshold.
-S: Site file
include a list of sites given in a file. The file should contain a list of site IDs, with one ID per line. Lines that start with "#" are comment lines and will be ignored.
-ES: Exclude site file
exclude a list of sites given in a file. The file should contain a list of site IDs, with one ID per line. Lines that start with "#" are comment lines and will be ignored.
-ID: Site id
include site(s) with matching ID. More than one value separated by semicolon can be input to include multiple SNPs.
-SNP: SNP sites
include sites that contain a SNP.
-INDEL: INDEL sites
include sites that contain an indel.
-MAF: MAF filter
include only sites with a Minor Allele Frequency greater than or equal to this value. This option can be used with or without Maximum MAF.
-MaxMAF: Maximum MAF
include only sites with a Minor Allele Frequency less than or equal to this value. This option can be used with or without the MAF filter.
-MinA: Minimum alleles
include only sites with a number of alleles greater than or equal to this value. This option can be used with or without Maximum alleles.
-MaxA: Maximum alleles
include only sites with a number of alleles less than or equal to this value. This option can be used with or without Minimum alleles.
-MinDP: Minimum mean depth
include only sites with mean depth values (over all included individuals) greater than or equal to this value. This option requires that the "DP" FORMAT tag is included for each site. This option can be used with or without Maximum mean depth.
-MaxDP: Maximum mean depth
include only sites with mean depth values (over all included individuals) less than or equal to this value. This option requires that the "DP" FORMAT tag is included for each site. This option can be used with or without Minimum mean depth.
-MaxM: Maximum missing
exclude sites on the basis of the proportion of missing data (defined to be between 0 and 1, where 0 allows sites that are completely missing and 1 indicates no missing data allowed).
-HWE: HWE
assess sites for Hardy-Weinberg Equilibrium using an exact test. Sites with a p-value below the threshold defined by this option are taken to be out of HWE and are therefore excluded.
-Ph: Phased or not
exclude all sites that contain unphased genotypes.
-K: Keep INFO
include all sites with a specific INFO flag. This option only filters on the presence of the flag and not its value. More than one value separated by semicolon can be input to specify multiple INFO flags.
-R: Remove INFO
exclude all sites with a specific INFO flag. This option only filters on the presence of the flag and not its value. More than one value separated by semicolon can be input to specify multiple INFO flags.
-RA: Remove filtered all
remove all sites with a FILTER flag other than PASS.
-KF: Keep filter
include all sites marked with a specific FILTER flag. More than one value separated by semicolon can be input to specify multiple FILTER flags.
-RF: Remove filter
exclude all sites marked with a specific FILTER flag. More than one value separated by semicolon can be input to specify multiple FILTER flags.
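The mean-depth (MinDP, MaxDP) and missingness (MaxM) filters can be sketched from per-individual DP values and GT strings. Averaging DP only over individuals that report it, and treating any GT containing "." as missing, are assumptions; the function names are hypothetical. Note that, as described, MaxM behaves like a minimum call rate: a site passes when its fraction of called genotypes is at least the given value.

```python
from typing import List, Optional


def mean_depth(dp_values: List[Optional[int]]) -> Optional[float]:
    """Mean of the FORMAT/DP values over individuals that report one."""
    observed = [dp for dp in dp_values if dp is not None]
    if not observed:
        return None
    return sum(observed) / len(observed)


def passes_depth(dp_values: List[Optional[int]],
                 min_mean_dp: Optional[float] = None,
                 max_mean_dp: Optional[float] = None) -> bool:
    depth = mean_depth(dp_values)
    if depth is None:
        # no DP data: only passes when no depth filter was requested
        return min_mean_dp is None and max_mean_dp is None
    if min_mean_dp is not None and depth < min_mean_dp:
        return False
    if max_mean_dp is not None and depth > max_mean_dp:
        return False
    return True


def passes_max_missing(genotypes: List[str], max_missing: float) -> bool:
    """0 keeps even completely missing sites; 1 requires every genotype to be called."""
    if not 0.0 <= max_missing <= 1.0:
        raise ValueError("max_missing must be between 0 and 1")
    if not genotypes:
        return max_missing == 0.0
    called = sum(1 for gt in genotypes if "." not in gt)
    return called / len(genotypes) >= max_missing
```

For example, a site with genotypes 0/0, ./., 0/1 and 1/1 has 3 of 4 genotypes called, so it passes a MaxM value of 0.7 but not 0.8.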
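The VCFtools manual describes its HWE filter as an exact test following Wigginton, Cutler and Abecasis (2005). The sketch below implements that published exact test for a biallelic site from observed genotype counts. It illustrates the technique and is not this tool's code; hwe_exact_p and passes_hwe are hypothetical names. A site is excluded when its p-value falls below the HWE threshold.

```python
from typing import List


def hwe_exact_p(obs_hets: int, obs_hom1: int, obs_hom2: int) -> float:
    """Exact HWE test p-value (Wigginton, Cutler & Abecasis 2005), biallelic site."""
    if min(obs_hets, obs_hom1, obs_hom2) < 0:
        raise ValueError("genotype counts must be non-negative")
    obs_homc = max(obs_hom1, obs_hom2)  # common homozygotes
    obs_homr = min(obs_hom1, obs_hom2)  # rare homozygotes
    rare_copies = 2 * obs_homr + obs_hets
    genotypes = obs_hets + obs_homc + obs_homr
    if genotypes == 0:
        return 1.0

    probs: List[float] = [0.0] * (rare_copies + 1)
    # Start near the most likely het count, with the same parity as rare_copies.
    mid = rare_copies * (2 * genotypes - rare_copies) // (2 * genotypes)
    if rare_copies % 2 != mid % 2:
        mid += 1
    probs[mid] = 1.0
    total = 1.0

    # Walk down: het count drops by 2, each homozygote class gains 1.
    hets, homr = mid, (rare_copies - mid) // 2
    homc = genotypes - hets - homr
    while hets > 1:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        total += probs[hets - 2]
        hets, homr, homc = hets - 2, homr + 1, homc + 1

    # Walk up: het count rises by 2, each homozygote class loses 1.
    hets, homr = mid, (rare_copies - mid) // 2
    homc = genotypes - hets - homr
    while hets <= rare_copies - 2:
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2) * (hets + 1))
        total += probs[hets + 2]
        hets, homr, homc = hets + 2, homr - 1, homc - 1

    probs = [p / total for p in probs]
    # p-value: total probability of het counts no more likely than the observed one.
    p_value = sum(p for p in probs if p <= probs[obs_hets])
    return min(1.0, p_value)


def passes_hwe(obs_hets: int, obs_hom1: int, obs_hom2: int, threshold: float) -> bool:
    """Keep the site unless its HWE p-value falls below the threshold."""
    return hwe_exact_p(obs_hets, obs_hom1, obs_hom2) >= threshold
```

For instance, 100 individuals split 50/50 between the two homozygous genotypes with no heterozygotes give a vanishingly small p-value, whereas 25/50/25 gives a p-value close to 1.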
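The INFO and FILTER flag options (K, R, RA, KF, RF) only test which flags are present on a site, not their values. The sketch below applies them to the raw FILTER and INFO columns of a VCF data line. Treating a "." FILTER value as carrying no flags, and keeping a site when it has any one of the semicolon-separated flags passed to a keep option, are assumptions; the function names are hypothetical.

```python
from typing import Optional, Set


def info_flags(info: str) -> Set[str]:
    """INFO keys present on a site, e.g. 'DB;DP=14' -> {'DB', 'DP'} (values ignored)."""
    if info in ("", "."):
        return set()
    return {entry.split("=", 1)[0] for entry in info.split(";") if entry}


def filter_flags(filter_col: str) -> Set[str]:
    """FILTER flags on a site, e.g. 'q10;s50' -> {'q10', 's50'}; '.' -> empty set."""
    if filter_col in ("", "."):
        return set()
    return set(filter_col.split(";"))


def passes_flag_filters(filter_col: str, info: str,
                        keep_info: Optional[Set[str]] = None,
                        remove_info: Optional[Set[str]] = None,
                        remove_filtered_all: bool = False,
                        keep_filter: Optional[Set[str]] = None,
                        remove_filter: Optional[Set[str]] = None) -> bool:
    flags, keys = filter_flags(filter_col), info_flags(info)
    if keep_info is not None and not (keys & keep_info):
        return False
    if remove_info is not None and keys & remove_info:
        return False
    if remove_filtered_all and flags - {"PASS"}:
        return False  # any FILTER flag other than PASS
    if keep_filter is not None and not (flags & keep_filter):
        return False
    if remove_filter is not None and flags & remove_filter:
        return False
    return True
```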
  • Strategic Priority Research Program of the Chinese Academy of Sciences, Grant No. XDB13000000
    Maintained by BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences.