Differentially expressed heterogeneous overdispersion genes testing for count data.

Yubai Yuan, Qi Xu, Agaz Wani, Jan Dahrendorff, Chengqi Wang, Arlina Shen, Janelle Donglasan, Sarah Burgan, Zachary Graham, Monica Uddin, Derek Wildman, Annie Qu
Author Information
  1. Yubai Yuan: Department of Statistics, The Pennsylvania State University, State College, PA, United States of America.
  2. Qi Xu: Department of Statistics, University of California Irvine, Irvine, CA, United States of America.
  3. Agaz Wani: Genomics Program, College of Public Health, University of South Florida, Tampa, FL, United States of America.
  4. Jan Dahrendorff: Genomics Program, College of Public Health, University of South Florida, Tampa, FL, United States of America.
  5. Chengqi Wang: Genomics Program, College of Public Health, University of South Florida, Tampa, FL, United States of America.
  6. Arlina Shen: University of California Berkeley, Berkeley, CA, United States of America.
  7. Janelle Donglasan: Genomics Program, College of Public Health, University of South Florida, Tampa, FL, United States of America.
  8. Sarah Burgan: Genomics Program, College of Public Health, University of South Florida, Tampa, FL, United States of America.
  9. Zachary Graham: Genomics Program, College of Public Health, University of South Florida, Tampa, FL, United States of America.
  10. Monica Uddin: Genomics Program, College of Public Health, University of South Florida, Tampa, FL, United States of America.
  11. Derek Wildman: Genomics Program, College of Public Health, University of South Florida, Tampa, FL, United States of America.
  12. Annie Qu: Department of Statistics, University of California Irvine, Irvine, CA, United States of America.

Abstract

mRNA-seq data analysis is a powerful approach for inferring information about biological systems of interest. Specifically, sequenced RNA fragments are aligned to genomic reference sequences, and the number of fragments corresponding to each gene is counted for each condition. A gene is identified as differentially expressed (DE) if the difference in its counts between conditions is statistically significant. Several statistical methods have been developed to detect DE genes from RNA-seq data. However, existing methods can lose power to identify DE genes because of overdispersion and limited sample size, where overdispersion refers to the empirical phenomenon that the variance of read counts is larger than their mean. We propose a new differential expression analysis procedure, differentially expressed heterogeneous overdispersion genes testing (DEHOGT), based on heterogeneous overdispersion modeling and a post-hoc inference procedure. DEHOGT integrates sample information from all conditions and provides more flexible and adaptive overdispersion modeling for RNA-seq read counts. DEHOGT adopts a gene-wise estimation scheme to enhance the power to detect DE genes when the number of replicates is limited, provided that the number of conditions is large. On synthetic RNA-seq read count data, DEHOGT outperforms two popular existing methods, DESeq2 and edgeR, in detecting DE genes. We also apply the proposed method to a test dataset of RNA-seq data from microglial cells, where DEHOGT tends to detect more differentially expressed genes potentially related to microglial cells under different stress hormone treatments.
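
To make the notion of overdispersion concrete, the following minimal sketch compares the sample mean and variance of read counts for two genes and computes a simple method-of-moments dispersion value. It assumes the quadratic mean-variance relation Var = mu + alpha * mu^2 that is commonly used with negative binomial count models; the abstract does not state which relation DEHOGT uses, so this is an illustration of the definition only and not the DEHOGT estimator. The function name and the toy counts are hypothetical.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments dispersion, assuming Var = mu + alpha * mu**2.

    Returns 0.0 when the mean is zero or the sample variance does not
    exceed the mean (no evidence of overdispersion).
    """
    mu = mean(counts)
    var = variance(counts)  # unbiased sample variance (n - 1 denominator)
    if mu == 0 or var <= mu:
        return 0.0
    return (var - mu) / mu**2


# Hypothetical read counts for two genes across six samples.
toy_counts = {
    "gene_A": [10, 12, 9, 11, 10, 8],  # variance does not exceed the mean
    "gene_B": [5, 40, 2, 61, 13, 29],  # variance far above the mean
}

for gene, counts in toy_counts.items():
    print(gene, mean(counts), variance(counts), moment_dispersion(counts))
```

A gene such as the hypothetical gene_B, whose variance greatly exceeds its mean, is the kind of case in which a Poisson-type assumption would understate the variability of the counts.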

Grants

  1. R01 MD011728/NIMHD NIH HHS

MeSH Terms

Gene Expression Profiling
Animals
Sequence Analysis, RNA
Humans
RNA-Seq
Algorithms
Mice
RNA, Messenger

Chemicals

RNA, Messenger
