SNP Calling & ChIP-Seq analysis pipeline genome analysis pipeline

Manual

1. SNP calling pipeline with NGS reads

# Software version
* bwa 0.7.17-r1188
* biobambam 2.0.87
* samtools 1.9
* varscan v2.4.4
* bcftools 1.9
* tabix 1.9

---

* align illumina reads to reference genome
```shell script
bwa mem -M -R '@RG\\tID:sample_id\\tSM:mysample\\tLB:sample_lib\\tPL:ILM' \
-t 24 ref.fa reads.R1.fq.gz reads.R2.fq.gz | \
samtools view -Shb -@ 24 | \
samtools sort -@ 24 -m 2G -o mysample.bam -O BAM
```

* marked duplicates with biobambam2

```shell script
bammarkduplicates2 I=mysample.bam O=mysample.markedd_dup.bam M=mysample.markedd_up.metrics \
markthreads=2 tmpfile=/tmp
```

* samtools mpileup

```shell script
samtools mpileup -B -f ref.fa -o mysample.markedd_dup.mpileup mysample.markedd_dup.bam
```

* call snp with varscan
```shell script
varscan mpileup2snp mysample.markedd_dup.mpileup --p-value 0.05 --mysample-vcf 1 | \
bcftools view -Oz -o mysample.markedd_dup.varscan.vcf.gz
```

* build index for variants

```shell script
tabix mysample.markedd_dup.varscan.vcf.gz
```

2. ChIP-seq analysis pipeline

* BWA align

genome_ref=Arabidopsis_genome_v5.fasta
threads=96
bwa index $genome_ref
samtools=/home/wangbo/anaconda3/bin/samtools

bwa mem -M -t $threads -R @RG"\\t"ID:AtCENH3_control"\\t"PL:ILM"\\t"SM:AtCENH3_control"\\t"LB:HiSeq2500 $genome_ref SRR4430541_1.fastq SRR4430541_2.fastq | $samtools view -@ $threads -Sb - | $samtools sort -@ $threads -o control_sorted.bam

bwa mem -M -t $threads -R @RG"\\t"ID:AtCENH3_Rep1"\\t"PL:ILM"\\t"SM:AtCENH3_Rep1"\\t"LB:HiSeq2500 $genome_ref SRR4430537_1.fastq SRR4430537_2.fastq | $samtools view -@ $threads -Sb - | $samtools sort -@ $threads -o Rep1_sorted.bam

bwa mem -M -t $threads -R @RG"\\t"ID:AtCENH3_Rep2"\\t"PL:ILM"\\t"SM:AtCENH3_Rep2"\\t"LB:HiSeq2500 $genome_ref SRR4430539_1.fastq SRR4430539_2.fastq | $samtools view -@ $threads -Sb - | $samtools sort -@ $threads -o Rep2_sorted.bam

$samtools merge -@ $threads Rep1_Rep2_merge.bam Rep1_sorted.bam Rep2_sorted.bam

* peak calling

macs2 callpeak -t Rep1_Rep2_merge.bam -c control_sorted.bam -f BAM --outdir ATChipseq -n ATChipseq -B --nomodel --extsize 165 --keep-dup all 2> ATChipseq.stdout

macs2 bdgcmp -t ATChipseq/ATChipseq_treat_pileup.bdg -c ATChipseq/ATChipseq_control_lambda.bdg -m FE -o ATChipseq_FE.bdg

sort -k1,1 -k2,2n ATChipseq_FE.bdg > ATChipseq_FE_sorted.bdg

samtools faidx $genome_ref
cut -f1,2 Arabidopsis_genome_v5.fasta.fai > genome.size

bedGraphToBigWig ATChipseq_FE_sorted.bdg genome.size ATChipseq_FE.bw