Variance component testing for identifying differentially expressed genes in RNA-seq data.

Sheng Yang, Fang Shao, Weiwei Duan, Yang Zhao, Feng Chen
Author Information
  1. Sheng Yang: Department of Biostatistics, School of Public Health, Nanjing Medical University, China.
  2. Fang Shao: Department of Biostatistics, School of Public Health, Nanjing Medical University, China.
  3. Weiwei Duan: Department of Biostatistics, School of Public Health, Nanjing Medical University, China.
  4. Yang Zhao: Department of Biostatistics, School of Public Health, Nanjing Medical University, China.
  5. Feng Chen: Department of Biostatistics, School of Public Health, Nanjing Medical University, China.

Abstract

RNA sequencing (RNA-Seq) enables the measurement and comparison of gene expression with isoform-level quantification. Because the isoforms of a gene can differ in their effects, traditional methods that aggregate isoforms may be ineffective. Here, we introduce a variance component-based test that jointly tests multiple isoforms of one gene to identify differentially expressed (DE) genes, especially those whose isoforms have differential effects. We model isoform-level expression data from RNA-Seq using a negative binomial distribution and treat the baseline abundance of the isoforms and their effects as two random terms. Our approach tests the global null hypothesis that no isoform shows a difference. We investigate the null distribution of the derived score statistic using empirical and theoretical methods. Simulation results suggest that the proposed set test outperforms traditional algorithms and almost reaches optimal power when the variance of covariates is large. We also apply the method to real data. As a supplement to traditional algorithms, our algorithm is better at selecting DE genes whose isoform effects are sparse or opposite in direction.
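
The point about opposite isoform effects can be illustrated with a small simulation. The sketch below is not the paper's variance component score test. It is a toy stand-in that assumes two isoforms with negative binomial counts, a log1p transform, and a permutation null. The gene-level total is kept roughly constant between groups while the isoforms move in opposite directions. All function names, means, and the dispersion value are hypothetical choices made for illustration.

```python
import numpy as np


def _sq_z(y, group):
    """Squared standardized covariance between group label and response."""
    centered_g = group - group.mean()
    score = centered_g @ (y - y.mean())
    var = (centered_g @ centered_g) * y.var()
    return score ** 2 / var


def simulate_counts(rng, n_per_group=30, means_a=(100.0, 100.0),
                    means_b=(150.0, 50.0), dispersion=0.1):
    """Draw negative binomial counts; rows are samples, columns are isoforms."""
    means = np.vstack([np.tile(means_a, (n_per_group, 1)),
                       np.tile(means_b, (n_per_group, 1))])
    size = 1.0 / dispersion          # variance = mu + dispersion * mu**2
    p = size / (size + means)        # success probability giving mean = mu
    counts = rng.negative_binomial(size, p)
    group = np.repeat([0, 1], n_per_group)
    return counts, group


def aggregated_stat(counts, group):
    """Gene-level statistic: isoform counts are summed before testing."""
    return _sq_z(np.log1p(counts.sum(axis=1)), group)


def joint_stat(counts, group):
    """Isoform-level statistic: sum of squared per-isoform statistics."""
    y = np.log1p(counts)
    return sum(_sq_z(y[:, j], group) for j in range(y.shape[1]))


def permutation_pvalue(stat_fn, counts, group, rng, n_perm=999):
    """Permutation p-value obtained by shuffling the group labels."""
    observed = stat_fn(counts, group)
    exceed = sum(stat_fn(counts, rng.permutation(group)) >= observed
                 for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)


rng = np.random.default_rng(0)
counts, group = simulate_counts(rng)
p_agg = permutation_pvalue(aggregated_stat, counts, group, rng)
p_joint = permutation_pvalue(joint_stat, counts, group, rng)
print(f"aggregated p-value: {p_agg:.3f}")
print(f"joint isoform p-value: {p_joint:.3f}")
```

In this setup the joint statistic detects the shift in both isoforms. The aggregated statistic sees an essentially unchanged total, so its p-value behaves like a draw from the null distribution. The quadratic form used here only mimics the general shape of a set test. The method described in the abstract instead derives a score statistic under a negative binomial model with random terms.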
