CandiHap: a haplotype analysis toolkit for natural variation study.
User guide in Chinese: https://mp.weixin.qq.com/s/1leAY8a-wYAFISFHgmu93Q
User guide in Chinese for the R package version: https://mp.weixin.qq.com/s/-SUnPG8MWW-2nEeVkhTt2Q
CandiHap is a user-friendly local software package that rapidly preselects candidate causal SNPs from Sanger or next-generation sequencing data and reports the results as tables and high-quality vector graphics within a minute. Investigators can use CandiHap to specify a gene or linked sites identified by GWAS and explore favourable haplotypes of candidate genes for target traits. CandiHap runs on Windows, Mac OS X, and Linux, through either a graphical user interface or the command line, and can be applied to any plant, animal, or microbial species. CandiHap is open-source software, publicly available at https://github.com/xukaili/CandiHap or https://bigd.big.ac.cn/biocode/tools/BT007080. CandiHap can perform the following analyses:
1) Convert a VCF file to the hapmap format used by CandiHap (vcf2hmp);
2) Haplotype analysis of a single gene (CandiHap);
3) Haplotype analysis of each gene, one by one, in the LD region around a significant SNP (GWAS_LD2haplotypes);
4) Haplotype analysis of population variation from Sanger sequencing data (sanger_CandiHap.sh).
License
Academic users may download and use the application free of charge according to the accompanying license. Commercial users must obtain a commercial license from Xukai Li.
If you have used the program to obtain results, please cite the following paper:
Xukai Li☯* (李旭凯), Zhiyong Shi☯ (石志勇), Qianru Qie (郄倩茹), Jianhua Gao (高建华), Yiwei Jiang (姜亦巍), Yuanhuai Han (韩渊怀) & Xingchun Wang* (王兴春). CandiHap: a haplotype analysis toolkit for natural variation study. bioRxiv 2020.02.27.967539. doi: https://doi.org/10.1101/2020.02.27.967539
(☯ Equal contributors; * Correspondence)
Dependencies
Perl 5, R ≥ 3.2 (with ggplot2, agricolae, pegas and sangerseqR), and Electron.
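Before running the command-line workflow, it can help to confirm that the executables it calls are reachable. The following Python sketch is not part of CandiHap. It assumes each tool is looked up on PATH under the name used in the commands in this README; Rscript is included on the assumption that the Perl scripts call R through it. It does not check the R packages or Electron.

```python
import shutil

# Executables invoked by the command-line workflow in this README.
# Assumption: each one is expected on PATH under exactly this name;
# "Rscript" is an assumed dependency of the Perl scripts.
REQUIRED_TOOLS = [
    "perl",
    "Rscript",
    "gffread",
    "gtfToGenePred",
    "retrieve_seq_from_fasta.pl",
    "table_annovar.pl",
]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All command-line dependencies were found.")
```

An empty list means every listed executable was found; it says nothing about tool versions.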
Figures
Fig. 1 | Overview of the CandiHap process.
a, A GWAS result.
b, General scheme of the process.
c, Histogram of the phenotype.
d, Haplotype statistics; haplotypes with significant differences are highlighted by color boxes.
e, Gene structure and SNPs of a critical gene.
f, Boxplot of the phenotype for each haplotype of a critical gene.
Fig. 2 | Haplotype network analysis of Si9g49990.
a, Differences between haplotypes.
b, Haplotype network. Note: only SNPs and haplotypes found in ≥2 accessions were used to construct the haplotype network. Circle sizes are shown on a log2 scale.
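The note on panel b can be illustrated with a short Python sketch. It assumes each accession's haplotype is represented as a string of alleles and that circle size is simply log2 of the number of accessions carrying that haplotype. It shows the idea only and is not the code CandiHap or pegas uses to draw the network; the SNP-level filter mentioned in the note is not covered.

```python
import math
from collections import Counter


def network_nodes(haplotype_of, min_accessions=2):
    """Count accessions per haplotype, drop rare haplotypes, log2-scale sizes.

    haplotype_of maps accession name -> haplotype string (e.g. "ATG").
    Returns {haplotype: (count, log2(count))} for haplotypes seen in at
    least `min_accessions` accessions.
    """
    counts = Counter(haplotype_of.values())
    return {
        hap: (n, math.log2(n))
        for hap, n in sorted(counts.items())
        if n >= min_accessions
    }
```

For example, four accessions carrying ATG, two carrying GTG and one carrying ACA give {"ATG": (4, 2.0), "GTG": (2, 1.0)}; ACA is dropped because it occurs only once.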
Fig. 3 | Haplotype analysis of the ARE1 gene in rice compared with the results of Wang et al. 2018, Nat. Commun. 9, 735.
a, Gene structure and SNPs of ARE1.
b, Major haplotypes of SNPs in the ARE1 coding region of 2747 rice varieties.
c, Haplotypes of the ARE1 coding region in 3023 rice varieties obtained with CandiHap (SNP data were downloaded from RFGB). Major SNP haplotypes and causal variations in the encoded amino acid residues are shown. The five additional SNPs (highlighted by blue boxes) arise because our study used 276 more rice varieties; two errors are highlighted by red boxes.
############################################################################################################################################
############################################################################################################################################
############################################################################################################################################
For Windows
The installation package bundles all the modules needed to run independently, so no additional software needs to be installed.
Getting started
To annotate the VCF file with ANNOVAR:
gffread test.gff -T -o test.gtf
gtfToGenePred -genePredExt test.gtf si_refGene.txt
retrieve_seq_from_fasta.pl --format refGene --seqfile genome.fa si_refGene.txt --outfile si_refGeneMrna.fa
table_annovar.pl test.vcf ./ --vcfinput --outfile test --buildver si --protocol refGene --operation g -remove
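The four annotation commands form a naming chain: table_annovar.pl looks in the database directory (./ here) for <buildver>_refGene.txt and <buildver>_refGeneMrna.fa, and with --outfile test it writes test.<buildver>_multianno.txt, the table later passed to vcf2hmp.pl. The Python sketch below is a hypothetical helper, not part of CandiHap. It rebuilds the same commands for another file prefix or build name and only prints them by default. Running them with dry_run=False assumes every tool is on PATH and directly executable; on Windows the .pl scripts may need to be invoked through perl.

```python
import subprocess


def annovar_commands(prefix="test", buildver="si", genome="genome.fa"):
    """Build the four annotation commands above for any file prefix.

    ANNOVAR finds gene definitions as <buildver>_refGene.txt and transcript
    sequences as <buildver>_refGeneMrna.fa in the database directory ("./"),
    so both intermediate file names are derived from buildver.
    """
    gtf = f"{prefix}.gtf"
    gene_table = f"{buildver}_refGene.txt"
    mrna_fasta = f"{buildver}_refGeneMrna.fa"
    return [
        ["gffread", f"{prefix}.gff", "-T", "-o", gtf],
        ["gtfToGenePred", "-genePredExt", gtf, gene_table],
        ["retrieve_seq_from_fasta.pl", "--format", "refGene",
         "--seqfile", genome, gene_table, "--outfile", mrna_fasta],
        ["table_annovar.pl", f"{prefix}.vcf", "./", "--vcfinput",
         "--outfile", prefix, "--buildver", buildver,
         "--protocol", "refGene", "--operation", "g", "-remove"],
    ]


def multianno_table(prefix="test", buildver="si"):
    """Name of the tab-delimited ANNOVAR output used in the next step."""
    return f"{prefix}.{buildver}_multianno.txt"


def run_annotation(prefix="test", buildver="si", genome="genome.fa", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in annovar_commands(prefix, buildver, genome):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return multianno_table(prefix, buildver)
```

With the defaults, run_annotation() prints exactly the four commands shown above and returns "test.si_multianno.txt".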
############################################################################################################################################
############################################################################################################################################
############################################################################################################################################
To install R and packages for Linux
1. Open an internet browser and go to link: https://www.r-project.org
2. Click the 'download R' link in the middle of the page under 'Getting Started'.
3. Select a CRAN location (a mirror site) and click the corresponding link.
4. Click the 'Download R for Linux' link at the top of the page.
5. Click to download 'R-3.5.0' (or a newer version).
6. Install R and leave all default settings in the installation options.
7. Open R and install the three packages with the command:
install.packages(c("ggplot2", "agricolae", "pegas"))
Getting started
The command-line CandiHap analysis consists of three main steps. The test data files can be downloaded free of charge from test_data.zip.
Put vcf2hmp.pl, test.gff, test.vcf, and genome.fa in the same directory, then run:
# 1. Annotate the VCF with ANNOVAR:
gffread test.gff -T -o test.gtf
gtfToGenePred -genePredExt test.gtf si_refGene.txt
retrieve_seq_from_fasta.pl --format refGene --seqfile genome.fa si_refGene.txt --outfile si_refGeneMrna.fa
table_annovar.pl test.vcf ./ --vcfinput --outfile test --buildver si --protocol refGene --operation g -remove
# 2. Convert the ANNOVAR text output to hapmap format:
perl vcf2hmp.pl test.vcf test.si_multianno.txt
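vcf2hmp.pl itself is not reproduced here. As a rough illustration of what converting VCF genotypes into hapmap-style nucleotide calls involves, the Python sketch below translates each sample's GT field into a two-letter call (0/1 with REF A and ALT G becomes AG; a missing genotype becomes NN). The two-letter encoding and the returned fields are assumptions; the columns actually written by vcf2hmp.pl, which also incorporates the ANNOVAR annotation, may differ.

```python
def gt_to_nucleotides(gt, ref, alt):
    """Translate a VCF GT value (e.g. "0/1") into a two-letter call (e.g. "AG").

    Missing alleles (".") become "N". Handles phased ("|") and unphased
    ("/") genotypes and comma-separated multi-allelic ALT fields.
    """
    alleles = [ref] + alt.split(",")
    calls = []
    for index in gt.replace("|", "/").split("/"):
        calls.append("N" if index == "." else alleles[int(index)])
    return "".join(calls)


def vcf_line_to_hmp_calls(line):
    """Return (chrom, pos, "REF/ALT", [call per sample]) for one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
    calls = [gt_to_nucleotides(sample.split(":")[gt_index], ref, alt)
             for sample in fields[9:]]
    return chrom, int(pos), f"{ref}/{alt}", calls
```

For a line with REF A, ALT G and sample genotypes 0/0, 0/1, 1|1 and ./., the calls are AA, AG, GG and NN.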
Put CandiHap.pl, Phenotype.txt, Your.hmp, and genome.gff in the same directory, then run:
# 3. Run the CandiHap haplotype analysis:
perl CandiHap.pl ./Your.hmp ./Phenotype.txt ./genome.gff Your_gene_ID
e.g. perl CandiHap.pl ./haplotypes.hmp ./Phenotype.txt ./test.gff Si9g49990
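Conceptually, the haplotype analysis of one gene groups accessions that carry the same alleles at the gene's SNPs and compares their phenotypes. The Python sketch below shows only that grouping step under stated assumptions: identical call strings define a haplotype, accessions with a missing call (NN) or no phenotype are dropped, and haplotypes are ranked by frequency and named Hap1, Hap2, and so on. It is not CandiHap's implementation and omits the statistical comparison between haplotypes shown in Fig. 1d.

```python
from collections import defaultdict
from statistics import mean


def haplotype_table(genotypes, phenotype):
    """Group accessions with identical SNP calls into haplotypes.

    genotypes: accession -> list of calls at the gene's SNPs, in position order
    phenotype: accession -> trait value
    Returns rows (name, call pattern, sorted accessions, mean phenotype),
    most frequent haplotype first (ties broken by pattern).
    """
    groups = defaultdict(list)
    for accession, calls in genotypes.items():
        if "NN" in calls or accession not in phenotype:
            continue  # skip incomplete genotypes and unphenotyped accessions
        groups["-".join(calls)].append(accession)

    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))
    rows = []
    for rank, (pattern, accessions) in enumerate(ordered, start=1):
        values = [phenotype[a] for a in accessions]
        rows.append((f"Hap{rank}", pattern, sorted(accessions), mean(values)))
    return rows
```

Heterozygous calls such as AG are kept as their own pattern here; how CandiHap treats heterozygotes is not stated in this section.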
To analyze all genes in the LD region around a position, run:
perl GWAS_LD2haplotypes.pl ./genome.gff ./ann.hmp ./Phenotype.txt 50kb Chr:position
e.g. perl GWAS_LD2haplotypes.pl ./test.gff ./haplotypes.hmp ./Phenotype.txt 50kb 9:54583294
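The 50kb and Chr:position arguments define the region whose genes are analyzed one by one. The Python sketch below assumes a symmetric window (the given distance on each side of the site, clipped at position 1), that a bare number means base pairs, and that chromosome names in the GFF match the one given on the command line. How GWAS_LD2haplotypes.pl actually interprets the distance is not documented here, so treat this as an illustration. For 50kb and 9:54583294 it gives chromosome 9, positions 54533294 to 54633294.

```python
import re

UNITS = {"bp": 1, "kb": 1_000, "mb": 1_000_000}


def parse_distance(text):
    """'50kb' -> 50000; also accepts bp, Mb (case-insensitive) or a bare number."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(bp|kb|mb)?", text.strip(), re.IGNORECASE)
    if not match:
        raise ValueError(f"unrecognised distance: {text!r}")
    unit = (match.group(2) or "bp").lower()
    return round(float(match.group(1)) * UNITS[unit])


def ld_window(distance, site):
    """('50kb', '9:54583294') -> ('9', 54533294, 54633294), assuming a symmetric window."""
    chrom, pos = site.rsplit(":", 1)
    pos, half = int(pos), parse_distance(distance)
    return chrom, max(1, pos - half), pos + half


def genes_in_window(gff_lines, window):
    """Yield the ID of every GFF 'gene' feature that overlaps the window."""
    chrom, start, end = window
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene" or cols[0] != chrom:
            continue
        if int(cols[3]) <= end and int(cols[4]) >= start:  # interval overlap
            attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
            yield attrs.get("ID", cols[8])
```

A gene counts as inside the region if any part of it overlaps the window, not only if it lies entirely within it; that choice is also an assumption.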
To install R and packages for Mac OS X
1. Open an internet browser and go to link: https://www.r-project.org
2. Click the 'download R' link in the middle of the page under 'Getting Started'.
3. Select a CRAN location (a mirror site) and click the corresponding link.
4. Click the 'Download R for (Mac) OS X' link at the top of the page.
5. Click to download 'R-3.5.0.pkg' (or a newer version).
6. Install R and leave all default settings in the installation options.
7. Open R and install the three packages with the command:
install.packages(c("ggplot2", "agricolae", "pegas"))
Getting started
To annotate the VCF file with ANNOVAR:
gffread test.gff -T -o test.gtf
gtfToGenePred -genePredExt test.gtf si_refGene.txt
retrieve_seq_from_fasta.pl --format refGene --seqfile genome.fa si_refGene.txt --outfile si_refGeneMrna.fa
table_annovar.pl test.vcf ./ --vcfinput --outfile test --buildver si --protocol refGene --operation g -remove