CandiHap: A haplotype analysis toolkit for natural variation study

Manual

User guide (in Chinese): https://mp.weixin.qq.com/s/1leAY8a-wYAFISFHgmu93Q

User guide for the R package version (in Chinese): https://mp.weixin.qq.com/s/-SUnPG8MWW-2nEeVkhTt2Q

CandiHap is user-friendly local software that can quickly preselect candidate causal SNPs from Sanger or next-generation sequencing data and report the results as tables and high-quality vector graphics within a minute. Investigators can use CandiHap to specify a gene or linkage sites identified by GWAS and explore favourable haplotypes of candidate genes for target traits. CandiHap runs on Windows, Mac OS X, and Linux, through either a graphical user interface or the command line, and can be applied to any plant, animal, or microbial species. CandiHap is open-source software, publicly available at https://github.com/xukaili/CandiHap or https://bigd.big.ac.cn/biocode/tools/BT007080. CandiHap performs the following analyses:

    1). Convert a VCF file to the HapMap format used by CandiHap (vcf2hmp);
    2). Haplotype analysis for a single gene (CandiHap);
    3). Haplotype analysis, one gene at a time, of all genes in the LD region of a significant SNP (GWAS_LD2haplotypes);
    4). Haplotype analysis of Sanger sequencing data for population variation (sanger_CandiHap.sh).

License

Academic users may download and use the application free of charge according to the accompanying license.
Commercial users must obtain a commercial license from Xukai Li.
If you have used the program to obtain results, please cite the following paper:

Xukai Li☯* (李旭凯), Zhiyong Shi☯ (石志勇), Qianru Qie (郄倩茹), Jianhua Gao (高建华), Yiwei Jiang (姜亦巍), Yuanhuai Han (韩渊怀) & Xingchun Wang* (王兴春). CandiHap: a haplotype analysis toolkit for natural variation study. bioRxiv 2020.02.27.967539. doi: https://doi.org/10.1101/2020.02.27.967539
(☯ Equal contributors; * Correspondence)


Dependencies

Perl 5, R ≥ 3.2 (with ggplot2, agricolae, pegas, and sangerseqR), and Electron.

Figures

Fig. 1 | Overview of the CandiHap process. a, A GWAS result. b, General scheme of the process. c, Histogram of the phenotype. d, Haplotype statistics; haplotypes with significant differences are highlighted by colored boxes. e, Gene structure and SNPs of a critical gene. f, Boxplot of the critical gene’s haplotypes.

Fig. 2 | Haplotype network analysis for Si9g49990. a, Differences among haplotypes. b, Haplotype network. Note: only SNPs and haplotypes found in ≥2 accessions were used to construct the haplotype network, and circle sizes are log2-transformed.
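
The filtering and scaling described in the Fig. 2 note can be sketched as follows. This is a hypothetical illustration, not CandiHap's code: the function name `network_sizes` and its input layout (tab-separated `haplotype<TAB>count` lines) are assumptions, it applies only the haplotype part of the ≥2-accessions rule, and it assumes the circle size is log2 of the number of accessions carrying each haplotype.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: keep haplotypes carried by at least 2 accessions and
# append log2(count) as a relative circle size for a network plot.
# Input: tab-separated "haplotype<TAB>count" lines (file arguments or stdin).
network_sizes() {
  awk -F'\t' -v OFS='\t' '$2 + 0 >= 2 { print $1, $2, sprintf("%.3f", log($2) / log(2)) }' "$@"
}
```

With this scaling, a haplotype carried by 8 accessions gets size 3, one carried by 2 accessions gets size 1, and singletons are dropped.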

Fig. 3 | Haplotype analysis of the ARE1 gene in rice compared with the results of Wang et al. 2018, Nat. Commun. 9, 735. a, Gene structure and SNPs of ARE1. b, Major haplotypes of SNPs in the ARE1 coding region of 2747 rice varieties. c, Haplotypes of the ARE1 coding region in 3023 rice varieties obtained with CandiHap (SNP data were downloaded from RFGB). Major SNP haplotypes and causal variations in the encoded amino acid residues are shown. The five additional SNPs arise because 276 more rice varieties were used in our study (highlighted by blue boxes); two errors are highlighted by red boxes.

############################################################################################################################################
############################################################################################################################################
############################################################################################################################################

For Windows

The installation package bundles all the modules needed to run CandiHap, so no additional software needs to be installed.


Getting started

To annotate the VCF file with ANNOVAR:

     gffread  test.gff   -T -o test.gtf
     gtfToGenePred -genePredExt test.gtf  si_refGene.txt
     retrieve_seq_from_fasta.pl --format refGene --seqfile  genome.fa  si_refGene.txt --outfile si_refGeneMrna.fa
     table_annovar.pl  test.vcf  ./  --vcfinput --outfile  test  --buildver  si  --protocol refGene --operation g -remove
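
Before running vcf2hmp.pl, it can be useful to confirm that each annotation step actually wrote its output. The Bash sketch below is a hypothetical helper (`check_annovar_outputs` is not part of CandiHap or ANNOVAR). It assumes the file names used in the commands above, where `table_annovar.pl` names its table from `--outfile test` and `--buildver si`, giving `test.si_multianno.txt`. On Windows it would need a Bash-compatible shell such as WSL or Git Bash; that is an assumption, since the steps above do not name a shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CandiHap): report any expected annotation
# output that is missing or empty in DIR (default: current directory).
# Returns 0 only if all four files exist and are non-empty.
check_annovar_outputs() {
  local dir="${1:-.}" missing=0 f
  # gffread -> test.gtf; gtfToGenePred -> si_refGene.txt;
  # retrieve_seq_from_fasta.pl -> si_refGeneMrna.fa;
  # table_annovar.pl --outfile test --buildver si -> test.si_multianno.txt
  for f in test.gtf si_refGene.txt si_refGeneMrna.fa test.si_multianno.txt; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_annovar_outputs . && perl vcf2hmp.pl test.vcf test.si_multianno.txt` starts the conversion only when all four files are present and non-empty.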

############################################################################################################################################
############################################################################################################################################
############################################################################################################################################

To install R and the required packages on Linux

      1. Open a web browser and go to https://www.r-project.org
      2. Click the 'download R' link in the middle of the page under 'Getting Started'.
      3. Select a CRAN location (a mirror site) and click the corresponding link.
      4. Click the 'Download R for Linux' link at the top of the page.
      5. Choose your Linux distribution and follow the instructions there to install R 3.5.0 (or a newer version), keeping the default settings.
      6. Start R and install the three packages with this command:
          install.packages(c("ggplot2", "agricolae", "pegas"))
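
The same packages can also be installed non-interactively from a terminal with `Rscript`. The sketch below is a hypothetical helper, not part of CandiHap. Beyond the command above, it also installs `sangerseqR`, which the Dependencies list names (presumably for the Sanger workflow) but the command above omits; `sangerseqR` is distributed through Bioconductor, so it is installed with `BiocManager::install()`, which requires R 3.5.0 or newer, the version recommended above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CandiHap): install the R packages listed
# under Dependencies from the command line. Set DRY_RUN=1 to print the R code
# instead of running it.
install_candihap_r_deps() {
  # CRAN packages from the step above, plus BiocManager, which is then used to
  # install sangerseqR from Bioconductor.
  local expr='install.packages(c("ggplot2", "agricolae", "pegas", "BiocManager"), repos = "https://cloud.r-project.org"); BiocManager::install("sangerseqR", update = FALSE, ask = FALSE)'
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$expr"
  else
    Rscript -e "$expr"
  fi
}
```

Running `DRY_RUN=1 install_candihap_r_deps` shows the R code first; running `install_candihap_r_deps` without it performs the installation, which needs write access to an R library.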

Getting started

Running CandiHap from the command line involves three main steps, and the test data files can be downloaded freely as test_data.zip.
Put vcf2hmp.pl, test.gff, test.vcf, and genome.fa in the same directory, then run:

     # 1. Annotate the VCF file with ANNOVAR:
     gffread  test.gff   -T -o test.gtf
     gtfToGenePred -genePredExt test.gtf  si_refGene.txt
     retrieve_seq_from_fasta.pl --format refGene --seqfile  genome.fa  si_refGene.txt --outfile si_refGeneMrna.fa
     table_annovar.pl  test.vcf  ./  --vcfinput --outfile  test  --buildver  si  --protocol refGene --operation g -remove
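
To reuse the four annotation commands with your own files, they can be wrapped so that the GFF, VCF, genome FASTA, and build name are substituted consistently. This is a sketch, not part of CandiHap: the names `annotate_vcf` and `run_or_echo` and the argument order are hypothetical. It relies on ANNOVAR's convention that `table_annovar.pl` reads `<buildver>_refGene.txt` and `<buildver>_refGeneMrna.fa` from the database directory (`./` here), which is why the build name `si` must match the `si_` prefix of those files.

```shell
#!/usr/bin/env bash
# Print the command (DRY_RUN=1) or execute it.
run_or_echo() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the four annotation commands.
# Usage: annotate_vcf GFF VCF GENOME_FASTA BUILDVER OUT_PREFIX
annotate_vcf() {
  local gff="$1" vcf="$2" fasta="$3" build="$4" out="$5"
  local gtf="${gff%.*}.gtf"                    # e.g. test.gff -> test.gtf
  # Each step runs only if the previous one succeeded.
  run_or_echo gffread "$gff" -T -o "$gtf" &&
    run_or_echo gtfToGenePred -genePredExt "$gtf" "${build}_refGene.txt" &&
    run_or_echo retrieve_seq_from_fasta.pl --format refGene --seqfile "$fasta" \
      "${build}_refGene.txt" --outfile "${build}_refGeneMrna.fa" &&
    run_or_echo table_annovar.pl "$vcf" ./ --vcfinput --outfile "$out" \
      --buildver "$build" --protocol refGene --operation g -remove
}
```

`annotate_vcf test.gff test.vcf genome.fa si test` reproduces the commands above; prefixing it with `DRY_RUN=1` prints them without running anything.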

     # 2. Convert the ANNOVAR text output to HapMap format:
     perl  vcf2hmp.pl  test.vcf  test.si_multianno.txt
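
To illustrate what the conversion involves, the awk sketch below turns VCF genotype calls into nucleotide pairs in a HapMap-like table. It is not vcf2hmp.pl: it ignores the ANNOVAR annotation that vcf2hmp.pl takes as its second argument, it keeps only biallelic SNPs with a diploid `GT` as the first FORMAT field, and its column layout (`rs#`, `alleles`, `chrom`, `pos`, then one column per sample) and the function name `vcf_to_hmp_sketch` are assumptions rather than CandiHap's actual hmp format.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not vcf2hmp.pl): convert VCF genotype calls into a
# HapMap-like table of nucleotide pairs. Only biallelic SNPs are kept, and GT
# is assumed to be the first FORMAT field; missing or non-diploid calls -> NN.
vcf_to_hmp_sketch() {
  awk -F'\t' '
    /^##/ { next }                              # skip meta-information lines
    /^#CHROM/ {                                 # header: samples start at column 10
      printf "rs#\talleles\tchrom\tpos"
      for (i = 10; i <= NF; i++) printf "\t%s", $i
      printf "\n"
      next
    }
    length($4) == 1 && length($5) == 1 && $5 != "." {
      allele[0] = $4; allele[1] = $5
      id = ($3 == ".") ? ($1 "_" $2) : $3       # fall back to chrom_pos as ID
      printf "%s\t%s/%s\t%s\t%s", id, $4, $5, $1, $2
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                       # GT is the first sub-field
        n = split(f[1], gt, "[/|]")             # unphased or phased separator
        if (n != 2 || gt[1] == "." || gt[2] == ".") call = "NN"
        else call = allele[gt[1]] allele[gt[2]]
        printf "\t%s", call
      }
      printf "\n"
    }' "$@"
}
```

For example, a site with REF `A`, ALT `G` and genotypes `0/0`, `0/1`, `1|1` becomes `AA`, `AG`, `GG`; indels such as REF `AT`, ALT `A` are skipped.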

Put CandiHap.pl, Phenotype.txt, Your.hmp, and genome.gff in the same directory, then run:

     # 3. Run the haplotype analysis with CandiHap.pl:
     perl  CandiHap.pl  ./Your.hmp  ./Phenotype.txt  ./genome.gff  Your_gene_ID

For example:

     perl  CandiHap.pl  ./haplotypes.hmp  ./Phenotype.txt  ./test.gff  Si9g49990
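
Conceptually, per-gene haplotype analysis groups accessions by the combination of alleles they carry at the SNPs inside the gene and then summarizes the phenotype of each group. The awk sketch below shows that idea under stated assumptions; it is not CandiHap.pl, which reads the gene coordinates from the GFF by gene ID and also tests for significant differences between haplotypes (Fig. 1d). Here the function name `haplotype_table`, the region passed directly as chromosome, start, and end, and both input layouts described in the comments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of per-gene haplotype grouping (not CandiHap.pl).
# Usage: haplotype_table GENOTYPES PHENOTYPES CHROM START END
#   GENOTYPES : tab-separated with a header row; columns id, alleles, chrom,
#               pos, then one call per accession (e.g. AA, AG; NN = missing)
#   PHENOTYPES: tab-separated "accession<TAB>value" with a header row
# Output: haplotype, accessions, accessions with a phenotype, mean phenotype
haplotype_table() {
  awk -F'\t' -v OFS='\t' -v chr="$3" -v from="$4" -v to="$5" '
    FNR == NR { if (FNR > 1) pheno[$1] = $2; next }    # first file: phenotypes
    FNR == 1  { for (i = 5; i <= NF; i++) name[i] = $i; ncol = NF; next }
    $3 == chr && $4 + 0 >= from + 0 && $4 + 0 <= to + 0 {
      # SNP inside the region: append its call to each accession haplotype
      for (i = 5; i <= ncol; i++) {
        if (i in hap) hap[i] = hap[i] "|" $i
        else hap[i] = $i
      }
    }
    END {
      for (i = 5; i <= ncol; i++) {
        if (!(i in hap) || hap[i] ~ /N/) continue       # skip missing calls
        h = hap[i]; count[h]++
        if (name[i] in pheno) { sum[h] += pheno[name[i]]; n[h]++ }
      }
      for (h in count)
        print h, count[h], n[h] + 0, (n[h] > 0 ? sprintf("%.2f", sum[h] / n[h]) : "NA")
    }' "$2" "$1" | sort -k2,2nr -k1,1
}
```

Haplotypes are listed from most to least frequent; accessions with any missing call in the region are left out rather than assigned to a haplotype, which is a simplification of this sketch.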

To analyze all genes in the LD region around a position, run:

     perl  GWAS_LD2haplotypes.pl  ./genome.gff  ./ann.hmp  ./Phenotype.txt  50kb  Chr:position

For example:

     perl  GWAS_LD2haplotypes.pl  ./test.gff  ./haplotypes.hmp  ./Phenotype.txt  50kb  9:54583294
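
The first part of this analysis, finding the genes near the significant SNP, can be sketched with awk over the GFF. This is not GWAS_LD2haplotypes.pl: the function name `genes_in_window` is hypothetical, the window is given in base pairs (so `50kb` corresponds to `50000`), and the sketch assumes the window extends that distance on each side of the position, which the command above does not state. `Chr:position` in the command corresponds to the CHROM and POSITION arguments here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not GWAS_LD2haplotypes.pl): list genes in a GFF3 file
# that overlap POSITION - WINDOW_BP .. POSITION + WINDOW_BP on one chromosome.
# Usage: genes_in_window GFF CHROM POSITION WINDOW_BP
# Output: gene ID (from the ID= attribute), start, end
genes_in_window() {
  awk -F'\t' -v OFS='\t' -v chr="$2" -v pos="$3" -v win="$4" '
    /^#/ || NF < 9 { next }                     # skip comments and short lines
    $1 == chr && $3 == "gene" && $4 + 0 <= pos + win && $5 + 0 >= pos - win {
      id = $9
      if (match(id, /ID=[^;]+/)) id = substr(id, RSTART + 3, RLENGTH - 3)
      print id, $4, $5
    }' "$1"
}
```

For example, `genes_in_window test.gff 9 54583294 50000` lists the candidate genes around 9:54583294, each of which could then be analyzed individually with CandiHap.pl.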

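############################################################################################################################################
############################################################################################################################################
############################################################################################################################################

To install R and the required packages on Mac OS X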

      1. Open a web browser and go to https://www.r-project.org
      2. Click the 'download R' link in the middle of the page under 'Getting Started'.
      3. Select a CRAN location (a mirror site) and click the corresponding link.
      4. Click the 'Download R for (Mac) OS X' link at the top of the page.
      5. Download 'R-3.5.0.pkg' (or a newer version).
      6. Install R, keeping all default settings in the installation options.
      7. Start R and install the three packages with this command:
          install.packages(c("ggplot2", "agricolae", "pegas"))

Getting started

To annotate the VCF file with ANNOVAR:

     gffread  test.gff   -T -o test.gtf
     gtfToGenePred -genePredExt test.gtf  si_refGene.txt
     retrieve_seq_from_fasta.pl --format refGene --seqfile  genome.fa  si_refGene.txt --outfile si_refGeneMrna.fa
     table_annovar.pl  test.vcf  ./  --vcfinput --outfile  test  --buildver  si  --protocol refGene --operation g -remove
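
On Linux and macOS, the annotation tools (gffread, UCSC gtfToGenePred, and ANNOVAR's Perl scripts) have to be installed separately, unlike the Windows package, which bundles everything. A small check can list which of them are missing from PATH before you start. The helper `missing_tools` is hypothetical and not part of CandiHap; it only uses the shell builtin `command -v`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named command that is not found on PATH.
# Returns 0 if all are found, 1 otherwise.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_tools gffread gtfToGenePred retrieve_seq_from_fasta.pl table_annovar.pl perl Rscript` prints any of these that still need installing (ANNOVAR's scripts are found only if their directory is on PATH).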
 
############################################################################################################################################
############################################################################################################################################
############################################################################################################################################

To install CandiHap.app on Mac OS X

If macOS blocks CandiHap.app when you try to open it, this does not necessarily mean there is anything wrong with the app; macOS is indicating that the app comes from an 'unidentified developer'.
You can override the block and open the app as follows:
      1. Open 'System Preferences'.
      2. Go to 'Security & Privacy' and select the 'General' tab.
      3. Click 'Open Anyway'.
      4. When asked to confirm, click 'Open'.


Contact information

CandiHap will be updated regularly and extended with more functions and more user-friendly options.
For any questions, please contact xukai_li@sxau.edu.cn or xukai_li@qq.com