An empirical Bayesian framework for somatic mutation detection from cancer genome sequencing data.

Yuichi Shiraishi, Yusuke Sato, Kenichi Chiba, Yusuke Okuno, Yasunobu Nagata, Kenichi Yoshida, Norio Shiba, Yasuhide Hayashi, Haruki Kume, Yukio Homma, Masashi Sanada, Seishi Ogawa, Satoru Miyano
Author Information
  1. Yuichi Shiraishi: Laboratory of DNA Information Analysis, Human Genome Center, Institute of Medical Science, The University of Tokyo, 4-6-1, Shirokanedai, Minato-ku, Tokyo 108-8639, Japan. yshira@hgc.jp

Abstract

Recent advances in high-throughput sequencing technologies have enabled comprehensive dissection of the cancer genome, revealing a large number of somatic mutations in a wide variety of cancer types. A number of methods have been proposed for calling mutations from large amounts of sequencing data; in most cases, calling is accomplished by statistically evaluating the difference in the observed allele frequencies of candidate single nucleotide variants between tumours and paired normal samples. However, accurate detection of mutations remains a challenge at low sequencing depths or low tumour contents. To overcome this problem, we propose a novel method, Empirical Bayesian mutation Calling (EBCall; https://github.com/friend1ws/EBCall), for detecting somatic mutations. Unlike previous methods, the proposed method discriminates somatic mutations from sequencing errors within an empirical Bayesian framework, in which the model parameters are estimated from the sequencing data of multiple non-paired normal samples. Using 13 whole-exome sequencing data sets with mean sequencing depths of 87.5-206.3, we demonstrate that our method not only outperforms several existing methods in calling mutations with moderate allele frequencies but also enables accurate calling of mutations with low allele frequencies (≤ 10%) harboured within a minor tumour subpopulation, thus allowing fine substructures within a tumour specimen to be deciphered.
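The abstract contrasts the usual tumour-versus-paired-normal comparison with an empirical Bayesian error model learned from non-paired normals. The Python sketch below illustrates that contrast under stated assumptions; it is not the EBCall implementation. It assumes a beta-binomial model for the number of non-reference reads produced by sequencing error at one position (the abstract says only "empirical Bayesian"), fits that model with a simple moment estimator rather than the estimator EBCall actually uses, and ignores strands and base qualities. A one-sided Fisher's exact test stands in for the paired-sample comparison. The function names, the `RHO_MIN`/`RHO_MAX` bounds, the pseudocount and all read counts are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical bounds on the overdispersion parameter rho (assumption, not from EBCall).
RHO_MIN, RHO_MAX = 1e-4, 0.5


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(K = k) for K ~ BetaBinomial(n, alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def fit_error_prior(normals: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Hypothetical moment-based estimate of (alpha, beta) from (alt_reads, depth)
    pairs observed at one position across non-paired normal samples."""
    total_alt = sum(alt for alt, _ in normals)
    total_depth = sum(depth for _, depth in normals)
    # Pseudocounts keep the mean error rate strictly inside (0, 1) (an assumption).
    p = (total_alt + 0.5) / (total_depth + 1.0)
    # For f_i = alt_i / n_i: E[(f_i - p)^2] = p(1-p) * (1/n_i + (1 - 1/n_i) * rho).
    ss = sum((alt / depth - p) ** 2 for alt, depth in normals)
    inv_n = sum(1.0 / depth for _, depth in normals)
    rho = (ss / (p * (1 - p)) - inv_n) / (len(normals) - inv_n)
    rho = min(max(rho, RHO_MIN), RHO_MAX)
    precision = (1 - rho) / rho  # equals alpha + beta
    return p * precision, (1 - p) * precision


def eb_pvalue(tumour_alt: int, tumour_depth: int, normals: List[Tuple[int, int]]) -> float:
    """Upper-tail P(K >= tumour_alt) under the error model fitted to the normal panel."""
    alpha, beta = fit_error_prior(normals)
    tail = sum(betabinom_pmf(k, tumour_depth, alpha, beta)
               for k in range(tumour_alt, tumour_depth + 1))
    return min(1.0, tail)


def fisher_one_sided(t_alt: int, t_depth: int, n_alt: int, n_depth: int) -> float:
    """One-sided Fisher's exact test: is the tumour alt fraction above the normal's?"""
    total, alt_total = t_depth + n_depth, t_alt + n_alt
    upper = min(alt_total, t_depth)
    # Hypergeometric upper tail for the tumour alt count given the table margins.
    hits = sum(math.comb(alt_total, x) * math.comb(total - alt_total, t_depth - x)
               for x in range(t_alt, upper + 1))
    return hits / math.comb(total, t_depth)


# Made-up counts: a position prone to errors (~10%) and a clean position.
noisy_site = [(9, 100), (12, 110), (8, 90), (11, 100), (10, 95)]
clean_site = [(0, 100), (1, 110), (0, 90), (0, 100), (0, 95)]

print("Fisher, tumour 10/100 vs normal 0/80:", fisher_one_sided(10, 100, 0, 80))
print("EB p-value, noisy site:", eb_pvalue(10, 100, noisy_site))
print("EB p-value, clean site:", eb_pvalue(10, 100, clean_site))
```

With these invented counts, the paired Fisher test reports the same small p-value whatever the panel shows, because the matched normal happens to have no variant reads. The empirical Bayesian p-value, by contrast, stays large at the error-prone position and becomes very small at the clean one. That position-specific discrimination between errors and real variants is what the panel of non-paired normals buys.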

MeSH Terms

Algorithms
Bayes Theorem
DNA Mutational Analysis
Gene Frequency
Genomics
High-Throughput Nucleotide Sequencing
Humans
Neoplasms
