Introduction

Single nucleotide variant (SNV) detection procedures are being utilized as never before to analyze the recent abundance of high-throughput DNA sequencing data, both on single and multiple sample datasets. Building on previously published work with the single sample SNV caller genotype model selection (GeMS), a multiple sample version of GeMS (MultiGeMS) is introduced. Unlike other popular multiple sample SNV callers, the MultiGeMS statistical model accounts for enzymatic substitution sequencing errors. It also addresses the multiple testing problem endemic to multiple sample SNV calling and utilizes high performance computing (HPC) techniques.

A simulation study demonstrates that MultiGeMS ranks highest in precision among a selection of popular multiple sample SNV callers, while showing exceptional recall in calling common SNVs. Further, both simulation studies and real data analyses indicate that MultiGeMS is robust to low-quality data. We also demonstrate that accounting for enzymatic substitution sequencing errors not only improves SNV call precision at low mapping quality regions, but also improves recall at reference allele-dominated sites with high mapping quality.

The MultiGeMS package can be downloaded from https://github.com/cui-lab/multigems. Contact: xinping.cui@ucr.edu. Supplementary data are available at Bioinformatics online.

Publications

  1. MultiGeMS: detection of SNVs from multiple samples using model selection on high-throughput sequencing data.
    Murillo GH, You N, Su X, Cui W, Reilly MP, Li M, Ning K, Cui X, 2016-05-01 - Bioinformatics (Oxford, England)

Credits

  1. Gabriel H Murillo
    Developer

    Department of Statistics, University of California, United States of America

  2. Na You
    Developer

    Department of Statistical Science, School of Mathematics and Computational Science, China

  3. Xiaoquan Su
    Developer

    Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, China

  4. Wei Cui
    Developer

    Department of Statistics, University of California, United States of America

  5. Muredach P Reilly
    Developer

  6. Mingyao Li
    Developer

    Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, United States of America

  7. Kang Ning
    Developer

    Key Laboratory of Molecular Biophysics of the Ministry of Education, College of Life Science and Technology

  8. Xinping Cui
    Investigator

    Department of Statistics, University of California, United States of America

Summary

Accession: BT003632
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies: C++
User Interface: Terminal Command Line
Download Count: 0
Country/Region: United States of America
Submitted By: Xinping Cui