TSomVar: a tumor-only somatic variant detection method

Manual

 

Background

Somatic variants play key roles in cancer occurrence and development, so an accurate and robust method to identify them is fundamental to cancer genome research. However, because of the low accessibility and high individual- and sample-specificity of somatic variants in tumor samples, detection remains challenging, particularly when no paired normal sample is available as a control. To address this, we developed TSomVar, a tumor-only method for identifying somatic and germline variants. It uses a random forest model built on sample-specific variant datasets, with features derived from genotype imputation, read-mapping-level annotation, and functional annotation.

Installation

Requirements

Application

 Database

 (Note: all database files should be stored at ${TSomVar_path}/database/)

### database process
### hg19.fa
sed -i 's/^>chr/>/' hg19.fa
samtools faidx hg19.fa ##generate index file .fai
Picard CreateSequenceDictionary REFERENCE=hg19.fa OUTPUT=hg19.dict ##generate sequence dictionary file .dict (OUTPUT must not be hg19.fa itself)
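
Before running TSomVar, it can help to confirm that the reference ended up in the expected state. The following is a minimal sketch, not part of TSomVar: the function name `check_reference` is hypothetical, and it assumes the index files sit next to the reference as `hg19.fa.fai` and `hg19.dict`.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with TSomVar): sanity-check a prepared reference.
# Assumes the index files are named <ref>.fai and <ref without .fa>.dict.
check_reference() {
    ref=$1
    [ -s "$ref" ] || { echo "missing or empty: $ref" >&2; return 1; }
    # After the sed step, no sequence header should still carry the "chr" prefix.
    if grep -q '^>chr' "$ref"; then
        echo "headers still start with >chr: $ref" >&2
        return 1
    fi
    [ -s "$ref.fai" ] || { echo "missing index: $ref.fai" >&2; return 1; }
    dict="${ref%.fa}.dict"
    [ -s "$dict" ] || { echo "missing dictionary: $dict" >&2; return 1; }
    return 0
}
```

For example, `check_reference /path/to/hg19.fa` returns a non-zero status and prints the first problem it finds.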

Running

### table_annovar.pl is provided by ANNOVAR; ReadLevel_Features_extraction.py is provided by MosaicForecast
./TSomVar \
/path/to/input_bam \
${sample_id_in_bam} \
/path/to/TSomVar \
/path/to/table_annovar.pl \
/path/to/beagle.18May20.d20.jar \
/path/to/ReadLevel_Features_extraction.py \
/path/to/gatk \
/path/to/hg19.fa \
${prefix}
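
Because the command takes nine positional arguments, a mistyped path or a dropped argument is easy to miss. The sketch below is an illustrative pre-flight check, not part of TSomVar: `check_tsomvar_args` is a hypothetical name, and it assumes every argument except the sample ID (second) and the output prefix (ninth) is a path that must already exist.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of TSomVar): verify the nine
# positional arguments, in the order of the command above, before a run.
check_tsomvar_args() {
    if [ "$#" -ne 9 ]; then
        echo "expected 9 arguments, got $#" >&2
        return 1
    fi
    # Arguments 2 (sample ID) and 9 (output prefix) are not paths;
    # all others are assumed to be existing files or directories.
    i=0
    for arg in "$@"; do
        i=$((i + 1))
        case $i in
            2|9) continue ;;
        esac
        if [ ! -e "$arg" ]; then
            echo "argument $i does not exist: $arg" >&2
            return 1
        fi
    done
    return 0
}
```

One way to use it is from a small wrapper script: `check_tsomvar_args "$@" && ./TSomVar "$@"`.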

Output

  • ${prefix}.result
    • each variant and its classification: germline, uncertain, or somatic
  • ${prefix}.result.prob
    • matrix of classification probabilities for each variant
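
A quick way to confirm that a run finished is to check that both output files exist and are non-empty. This is a minimal sketch under that assumption, not part of TSomVar: `check_tsomvar_outputs` is a hypothetical name, and it does not inspect the file contents.

```shell
#!/bin/sh
# Hypothetical post-run check (not part of TSomVar): confirm that both
# output files for a given prefix were written and are non-empty.
check_tsomvar_outputs() {
    prefix=$1
    for f in "$prefix.result" "$prefix.result.prob"; do
        [ -s "$f" ] || { echo "missing or empty output: $f" >&2; return 1; }
    done
    return 0
}
```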

Maintainers

shishuo@big.ac.cn

Citations

To be continued ...