NextPolish2: Repeat-aware polishing of genomes assembled using HiFi long reads

Manual

NextPolish2 takes a genome assembly file, a HiFi mapping file and one or more k-mer dataset files from short reads as input and generates the polished genome.

  1. Prepare the HiFi mapping file (winnowmap or minimap2).
meryl count k=15 output merylDB asm.fa.gz
meryl print greater-than distinct=0.9998 merylDB > repetitive_k15.txt
winnowmap -t 5 -W repetitive_k15.txt -ax map-pb asm.fa.gz hifi.fasta.gz | samtools sort -o hifi.map.sort.bam -

# or map with minimap2 instead
# minimap2 -ax map-hifi -t 5 asm.fa.gz hifi.fasta.gz | samtools sort -o hifi.map.sort.bam -

# indexing
samtools index hifi.map.sort.bam
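
Before moving on to the k-mer datasets, it can help to confirm that the sorted BAM and its index were actually written. The following is a minimal sketch, assuming the file names used above and that samtools index wrote its default hifi.map.sort.bam.bai index next to the BAM. check_mapping_outputs is a hypothetical helper, not part of NextPolish2 or samtools.

```shell
# Hypothetical helper (not part of NextPolish2 or samtools):
# succeed only if the BAM and its .bai index both exist and are non-empty.
check_mapping_outputs() {
    bam="$1"
    for f in "$bam" "$bam.bai"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    return 0
}

if check_mapping_outputs hifi.map.sort.bam; then
    echo "step 1 outputs look complete"
else
    echo "re-run the mapping and indexing commands" >&2
fi
```

This only checks that the files exist and are non-empty; it does not validate the alignments themselves.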
 
  2. Prepare k-mer dataset files (yak). Here we produce only 21-mer and 31-mer datasets; you can produce additional datasets with other k-mer sizes.
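# produce a 21-mer dataset; remove -b 37 if you want to count singletons
# (the same read stream is passed twice on purpose: yak expects two identical streams for paired-end input)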
./yak/yak count -o k21.yak -k 21 -b 37 <(zcat sr.R*.fastq.gz) <(zcat sr.R*.fastq.gz)

# produce a 31-mer dataset; remove -b 37 if you want to count singletons
./yak/yak count -o k31.yak -k 31 -b 37 <(zcat sr.R*.fastq.gz) <(zcat sr.R*.fastq.gz) 
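
Because the two yak commands differ only in the k-mer size, they can be generated from a list of sizes. This is a minimal sketch, assuming the same input glob and -b 37 setting as above. yak_count_cmd is a hypothetical helper that only prints each command instead of running it, so the output can be reviewed first.

```shell
# Hypothetical helper: print (not run) the yak command for one k-mer size,
# mirroring the two commands above.
yak_count_cmd() {
    printf './yak/yak count -o k%s.yak -k %s -b 37 <(zcat sr.R*.fastq.gz) <(zcat sr.R*.fastq.gz)\n' "$1" "$1"
}

# Only the sizes used in this manual; extend the list as needed.
for k in 21 31; do
    yak_count_cmd "$k"
done
```

Piping the printed lines to bash would execute them. bash is needed there because the commands use process substitution.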
 
  3. Run NextPolish2.
./target/release/nextPolish2 -t 5 hifi.map.sort.bam asm.fa.gz k21.yak k31.yak > asm.np2.fa
# or try -r, which usually produces better results for highly heterozygous or homozygous genomes
./target/release/nextPolish2 -r -t 5 hifi.map.sort.bam asm.fa.gz k21.yak k31.yak > asm.np2.fa
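
Since NextPolish2 accepts one or more k-mer datasets after the assembly, the command line can be assembled from whatever datasets were produced in step 2. This is a minimal sketch, assuming the argument order shown above. np2_cmd is a hypothetical helper that prints the command rather than executing it.

```shell
# Hypothetical helper: print (not run) a nextPolish2 command line.
# Usage: np2_cmd THREADS BAM ASSEMBLY KMER_DATASET...
np2_cmd() {
    threads="$1"; bam="$2"; asm="$3"
    shift 3
    # The manual requires at least one k-mer dataset.
    if [ "$#" -lt 1 ]; then
        echo "need at least one k-mer dataset" >&2
        return 1
    fi
    echo "./target/release/nextPolish2 -t $threads $bam $asm $* > asm.np2.fa"
}

# Prints the same command as the first example above.
np2_cmd 5 hifi.map.sort.bam asm.fa.gz k21.yak k31.yak
```

The helper assumes file names without spaces, since it joins the arguments into a single printed line.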