FVC as an adaptive and accurate method for filtering variants from popular NGS analysis pipelines.
Yongyong Ren, Yan Kong, Xiaocheng Zhou, Georgi Z Genchev, Chao Zhou, Hongyu Zhao, Hui Lu
Author Information
Yongyong Ren: State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.
Yan Kong: State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.
Xiaocheng Zhou: State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.
Georgi Z Genchev: Research Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand.
Chao Zhou: State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.
Hongyu Zhao: Department of Biostatistics, Yale University, New Haven, CT, USA. hongyu.zhao@yale.edu.
Hui Lu: State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China. huilu@sjtu.edu.cn.
Quality control of variants called from whole-genome sequencing data is vital in clinical diagnosis and human genetics research. However, current filtering methods (Frequency, Hard-Filter, VQSR, GARFIELD, and VEF) were each developed for use with particular variant callers and have certain limitations. In particular, with these methods the number of true variants eliminated far exceeds the number of false variants removed. Here, we present FVC, an adaptive method for quality control of genetic variants from different analysis pipelines, and validate it on variants generated by four popular variant callers (GATK HaplotypeCaller, Mutect2, VarScan2, and DeepVariant). FVC consistently exhibited the best performance: it removed far more false variants than the current state-of-the-art filtering methods and recalled ~51-99% of the true variants filtered out by the other methods. Once trained, FVC can be conveniently integrated into a user-specific variant calling pipeline.