CandiHap: A haplotype analysis toolkit for natural variation study

For Linux systems (command line)

Getting started

The CandiHap command-line analysis consists of three main steps. The test data files can be downloaded freely from test_data.zip.
Put vcf2hmp.pl, test.gff, test.vcf, and genome.fa in the same directory, then run:

     # 1. To annotate the vcf by ANNOVAR: 
     gffread  test.gff   -T -o test.gtf
     gtfToGenePred -genePredExt test.gtf  si_refGene.txt
     retrieve_seq_from_fasta.pl --format refGene --seqfile  genome.fa  si_refGene.txt --outfile si_refGeneMrna.fa
     table_annovar.pl  test.vcf  ./  --vcfinput --outfile  test  --buildver  si  --protocol refGene --operation g -remove

     # 2. To convert the txt result of annovar to hapmap format:
     perl  vcf2hmp.pl  test.vcf  test.si_multianno.txt
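
Because each step expects its input files in the current directory, it can help to confirm they are all present before starting. The following is a minimal sketch of such a check; `check_inputs` is a hypothetical helper written for this example and is not part of CandiHap or ANNOVAR.

```shell
# check_inputs: hypothetical helper, not shipped with CandiHap.
# Takes a list of file names, prints each missing one to stderr,
# and returns non-zero if at least one file is missing.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example for steps 1 and 2 (file names taken from the instructions above):
#   check_inputs vcf2hmp.pl test.gff test.vcf genome.fa && echo "all inputs present"
```

The same helper could be called with the files needed for the later steps before running them.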

Put CandiHap.pl, Phenotype.txt, Your.hmp, and genome.gff in the same directory, then run:

     # 3. To run CandiHaplotypes
     perl  CandiHap.pl  -m Your.hmp  -f genome.gff  -p Phenotype.txt  -g Your_gene_ID
e.g. perl  CandiHap.pl  -m haplotypes.hmp  -f test.gff  -p Phenotype.txt  -g Si9g49990

To analyze all genes in the LD region around a position, run:

     perl  GWAS_LD2haplotypes.pl  -f genome.gff  -m ann.hmp  -p Phenotype.txt   -l LDkb  -c Chr:position
e.g. perl  GWAS_LD2haplotypes.pl  -f test.gff  -m haplotypes.hmp  -p Phenotype.txt  -l 50kb  -c 9:54583294
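
To make the `-l` and `-c` arguments concrete, the sketch below splits `Chr:position` and `LDkb` and prints the region they would describe. It assumes the LD region is a symmetric window of the given size on each side of the position; that is an assumption made for illustration, not a description of how GWAS_LD2haplotypes.pl works internally. `ld_window` is a hypothetical helper.

```shell
# ld_window: hypothetical helper, not part of CandiHap.
# Usage: ld_window Chr:position LDkb
# Assumes (not confirmed by the source) a symmetric window of LD kb
# on each side of the position; the start is clamped to 1.
ld_window() {
    chr=${1%%:*}      # text before the colon, e.g. 9
    pos=${1#*:}       # text after the colon, e.g. 54583294
    kb=${2%kb}        # strip the "kb" suffix, e.g. 50
    start=$((pos - kb * 1000))
    if [ "$start" -lt 1 ]; then start=1; fi
    end=$((pos + kb * 1000))
    echo "$chr $start $end"
}

# Example with the values from the command above:
#   ld_window 9:54583294 50kb    # prints: 9 54533294 54633294
```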
