Introduction

Next-generation sequencing (NGS) has enabled whole-genome and transcriptome single nucleotide variant (SNV) discovery in cancer. NGS produces millions of short sequence reads that, once aligned to a reference genome sequence, can be interpreted for the presence of SNVs. Although tools exist for SNV discovery from NGS data, none are specifically suited to data from tumors, where altered ploidy and tumor cellularity affect the statistical expectations of SNV discovery.

To address this problem, we developed three implementations of a probabilistic Binomial mixture model, called SNVMix, designed to infer SNVs from NGS data from tumors. The first models allelic counts as observations and infers SNVs and model parameters using an expectation maximization (EM) algorithm; it can therefore adjust to the deviation of allelic frequencies inherent in genomically unstable tumor genomes. The second models the nucleotide and mapping qualities of the reads by probabilistically weighting the contribution of each read/nucleotide to the inference of an SNV, based on the confidence we have in the base call and the read alignment. The third combines filtering out of low-quality data with probabilistic weighting of the qualities.

We quantitatively evaluated these approaches on 16 ovarian cancer RNASeq datasets with matched genotyping arrays and on a human breast cancer genome sequenced to >40x (haploid) coverage with ground truth data, and show systematically that the SNVMix models outperform competing approaches.

Software and data are available at http://compbio.bccrc.ca
Contact: sshah@bccrc.ca
Supplementary information: Supplementary data are available at Bioinformatics online.
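As a rough illustration of the first model described above (allelic counts fitted with a Binomial mixture by EM), the sketch below may help. It is not the SNVMix code. It assumes three genotype classes (aa, ab, bb), uses plain maximum-likelihood updates, and picks its own starting values, iteration count, and clamping constant; the function names `log_binom_pmf`, `responsibilities`, and `fit_em` and the example counts are hypothetical.

```python
import math


def log_binom_pmf(a, n, mu):
    """Log of Binomial(a | n, mu), using lgamma for the coefficient."""
    log_coef = math.lgamma(n + 1) - math.lgamma(a + 1) - math.lgamma(n - a + 1)
    return log_coef + a * math.log(mu) + (n - a) * math.log(1.0 - mu)


def responsibilities(a, n, mu, pi):
    """Posterior probability of each genotype (aa, ab, bb) at one position.

    a  = number of reads matching the reference allele
    n  = total read depth at the position
    mu = per-genotype probability of observing the reference allele
    pi = mixing proportions of the three genotypes
    """
    logs = [math.log(p) + log_binom_pmf(a, n, m) for m, p in zip(mu, pi)]
    top = max(logs)  # subtract the maximum before exponentiating
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def fit_em(counts, mu=(0.99, 0.5, 0.01), pi=(0.8, 0.1, 0.1), iters=50, eps=1e-6):
    """Fit mu and pi by EM. Starting values, iters, and eps are assumptions."""
    mu, pi = list(mu), list(pi)
    for _ in range(iters):
        # E-step: genotype posteriors for every position
        post = [responsibilities(a, n, mu, pi) for a, n in counts]
        # M-step: re-estimate mixing proportions and allele probabilities
        for k in range(3):
            weight = sum(r[k] for r in post)
            ref = sum(r[k] * a for r, (a, _) in zip(post, counts))
            depth = sum(r[k] * n for r, (_, n) in zip(post, counts))
            pi[k] = max(weight / len(counts), eps)
            if depth > 0:
                # clamp away from 0 and 1 so the logs above stay finite
                mu[k] = min(max(ref / depth, eps), 1.0 - eps)
        norm = sum(pi)
        pi = [p / norm for p in pi]
    return mu, pi


# Hypothetical (reference reads, depth) pairs, for illustration only
example_counts = [(30, 30), (28, 30), (20, 20), (15, 30), (14, 28), (0, 25), (1, 30)]
fitted_mu, fitted_pi = fit_em(example_counts)
for a, n in example_counts:
    p_aa, p_ab, p_bb = responsibilities(a, n, fitted_mu, fitted_pi)
    print(a, n, "P(SNV) =", round(p_ab + p_bb, 4))
```

Because `mu` is re-estimated from the data rather than fixed at idealized values such as 0.5 for a heterozygous site, a fit of this kind can follow the skewed allelic ratios the abstract attributes to tumor genomes. The sketch does not cover the quality-weighted second and third models.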

Publications

  1. SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors.
    Goya R, Sun MG, Morin RD, Leung G, Ha G, Wiegand KC, Senz J, Crisan A, Marra MA, Hirst M, Huntsman D, Murphy KP, Aparicio S, Shah SP, 2010-03-01 - Bioinformatics (Oxford, England)

Credits

  1. Rodrigo Goya
    Developer

    Department of Molecular Oncology Breast Cancer Research Program, British Columbia Cancer Research Centre, Canada

  2. Mark G F Sun
    Developer

  3. Ryan D Morin
    Developer

  4. Gillian Leung
    Developer

  5. Gavin Ha
    Developer

  6. Kimberley C Wiegand
    Developer

  7. Janine Senz
    Developer

  8. Anamaria Crisan
    Developer

  9. Marco A Marra
    Developer

  10. Martin Hirst
    Developer

  11. David Huntsman
    Developer

  12. Kevin P Murphy
    Developer

  13. Sam Aparicio
    Developer

  14. Sohrab P Shah
    Investigator

Summary

Accession: BT000483
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies:
User Interface: Terminal Command Line
Download Count: 0
Submitted By: Sohrab P Shah