Shuo Shi: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
Qi Wang: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
Yunfei Shang: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
Congfan Bu: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
Mingming Lu: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
Meiye Jiang: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
Hao Zhang: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
Shuhuan Yu: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
Jingyao Zeng: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
Zaichao Zhang: Department of Biology, The University of Western Ontario, London, Ontario N6A 5B7, Canada.
Zhenglin Du: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
Jingfa Xiao: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.
Somatic variants play critical roles in cancer occurrence and development, so an accurate and robust method for identifying them is a foundation of cancer genome research. However, because somatic variants in tumor samples have low accessibility and are highly individual- and sample-specific, their detection remains challenging, particularly when no paired normal sample is available as a control. To address this problem, we developed TSomVar, a tumor-only method for identifying somatic and germline variants. TSomVar uses the random forest algorithm, built on sample-specific variant datasets derived from genotype imputation, read-mapping-level annotation and functional annotation. We trained TSomVar on genomic variant datasets from three major cancer types: colorectal cancer, hepatocellular carcinoma and skin cutaneous melanoma. Compared with existing tumor-only somatic variant identification tools, TSomVar detects somatic variants with higher accuracy and higher recall on test datasets from colorectal cancer and skin cutaneous melanoma. In addition, TSomVar can accurately identify germline variants in tumor samples. Taken together, these results suggest that TSomVar can facilitate the exploration of somatic variants in cancer research.
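As a purely illustrative aid, the two measures named above (accuracy and recall for somatic variant calls) can be computed as in the following minimal sketch. It is not part of TSomVar: the function name `evaluate_calls`, the label strings and the toy labels are assumptions introduced here for illustration only, and no values from the study are reproduced.

```python
from collections import Counter


def evaluate_calls(truth, predicted, positive="somatic"):
    """Return accuracy, plus precision and recall for the `positive` class.

    `truth` and `predicted` are equal-length sequences of class labels,
    one per variant (hypothetical labels: "somatic" / "germline").
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("no variants to evaluate")

    # Count each (true label, predicted label) pair once.
    counts = Counter(zip(truth, predicted))
    tp = counts[(positive, positive)]
    fn = sum(n for (t, p), n in counts.items() if t == positive and p != positive)
    fp = sum(n for (t, p), n in counts.items() if t != positive and p == positive)
    correct = sum(n for (t, p), n in counts.items() if t == p)

    return {
        "accuracy": correct / len(truth),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels invented for illustration; they are not data from the study.
truth = ["somatic", "somatic", "germline", "germline", "somatic", "germline"]
predicted = ["somatic", "germline", "germline", "germline", "somatic", "somatic"]

metrics = evaluate_calls(truth, predicted)
print(metrics)
```

Reporting recall alongside accuracy matters in this setting because germline variants typically far outnumber somatic ones in a tumor sample, so accuracy alone can look high even when many somatic variants are missed.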