Bayesian Analysis of RNA-Seq Data Using a Family of Negative Binomial Models.

Lili Zhao, Weisheng Wu, Dai Feng, Hui Jiang, XuanLong Nguyen
Author Information
  1. Lili Zhao: Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, U.S.A.
  2. Weisheng Wu: Department of Computational Medicine & Bioinformatics, University of Michigan.
  3. Dai Feng: Biometrics Research Department, Merck Research Laboratories, U.S.A.
  4. Hui Jiang: Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, U.S.A.
  5. XuanLong Nguyen: Department of Statistics, University of Michigan, Ann Arbor.

Abstract

The analysis of RNA-Seq data has focused on three main categories: gene expression, relative exon usage, and transcript expression. Methods have been proposed independently for each category using a negative binomial (NB) model. However, counts that follow an NB distribution for one feature (e.g., exons) are not guaranteed to follow an NB distribution for the other two features (e.g., genes or transcripts). In this paper, we propose a family of negative binomial models that integrates gene, exon, and transcript analysis under a coherent NB model. The proposed model easily incorporates the uncertainty in assigning reads to transcripts and substantially simplifies the estimation of relative usage. We developed simple Gibbs sampling algorithms for posterior inference by exploiting fully tractable closed-form computations made possible by suitable conjugate priors. The proposed models were investigated in extensive simulations. Finally, we applied our model to a real data set.
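
The point that NB counts on one feature do not guarantee NB counts on another can be checked numerically. The sketch below is an illustration only and is not taken from the paper: it treats a gene count as the sum of two independent exon counts, uses the (size r, success probability p) parameterization with mean r(1-p)/p, and picks arbitrary parameter values. All function names are hypothetical. It compares the exact distribution of the sum with the NB distribution that has the same mean and variance, using total variation distance.

```python
import math


def nb_pmf(k, r, p):
    """P(K = k) for an NB count with size r and success probability p."""
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def nb_table(r, p, n):
    """NB probabilities for counts 0..n-1."""
    return [nb_pmf(k, r, p) for k in range(n)]


def convolve(pmf_a, pmf_b):
    """PMF of the sum of two independent counts, for totals 0..n-1."""
    n = len(pmf_a)
    return [
        sum(pmf_a[j] * pmf_b[k - j] for j in range(k + 1))
        for k in range(n)
    ]


def tv_to_matched_nb(r1, p1, r2, p2, n=1500):
    """Total variation distance between the exact distribution of the sum
    of two NB counts and the NB with the same mean and variance."""
    mean = r1 * (1 - p1) / p1 + r2 * (1 - p2) / p2
    var = r1 * (1 - p1) / p1**2 + r2 * (1 - p2) / p2**2
    # Moment matching: mean = r(1-p)/p and var = r(1-p)/p^2.
    p = mean / var
    r = mean**2 / (var - mean)
    total = convolve(nb_table(r1, p1, n), nb_table(r2, p2, n))
    matched = nb_table(r, p, n)
    return 0.5 * sum(abs(a - b) for a, b in zip(total, matched))


# Illustrative (assumed) exon parameters, not values from the paper.
tv_same = tv_to_matched_nb(3.0, 0.4, 5.0, 0.4)    # shared p: sum is NB
tv_diff = tv_to_matched_nb(0.5, 0.02, 20.0, 0.5)  # different p: sum is not NB
print(f"shared p:    TV distance = {tv_same:.3e}")
print(f"different p: TV distance = {tv_diff:.3e}")
```

When both exons share the same p, the sum is exactly NB and the distance is zero up to floating-point error. When p differs, no NB matches the sum even after matching the first two moments, which is the kind of incoherence the abstract refers to.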

Grants

  1. P30 CA046592/NCI NIH HHS
